Iterative Summarization using LLMs

After being busy for the first part of the year, I finally have a bit of time to work on this blog. After a lot of thinking about how to best fit it into my schedule, I've decided to attempt to write shorter posts. Although I do get a lot of satisfaction from writing long posts, it's not practical because of the time commitment. Better to break things up into smaller parts and "ship" often rather than perfect each post. This also allows me to experiment with smaller-scoped topics, which hopefully will keep me more motivated as well. Speaking of which...

This post is about answering a random thought I had the other day: what would happen if I kept passing an LLM's output back to itself? I ran a few experiments trying to get the LLM to iteratively summarize or rephrase a piece of text, and the results are... pretty much what you would expect. But if you don't know what to expect, then read on and find out what happened!
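To make the idea concrete, here is a minimal sketch of the feedback loop. It is not the code used in the experiments: `iterate_text` and `toy_summarizer` are hypothetical names, and the toy function only stands in for a real LLM call (any function that maps a string to a string could be plugged in instead).

```python
from typing import Callable, List


def iterate_text(text: str, step: Callable[[str], str], max_rounds: int = 10) -> List[str]:
    """Repeatedly feed the output of `step` back in as its next input.

    Returns every version of the text, starting with the original.
    Stops early if the text stops changing (a fixed point).
    """
    history = [text]
    for _ in range(max_rounds):
        next_text = step(history[-1])
        if next_text == history[-1]:
            break  # nothing changed, so further rounds would repeat forever
        history.append(next_text)
    return history


def toy_summarizer(text: str) -> str:
    """Hypothetical stand-in for an LLM: keep the first half of the words (at least 3)."""
    words = text.split()
    keep = max(3, len(words) // 2)
    return " ".join(words[:keep])


if __name__ == "__main__":
    original = "one two three four five six seven eight nine ten eleven twelve"
    for round_number, version in enumerate(iterate_text(original, toy_summarizer)):
        print(round_number, version)
```

Swapping `toy_summarizer` for a real model call turns this into the experiment described above. The early exit matters because a model that returns its input unchanged would otherwise just repeat itself for the remaining rounds.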

Read more…

A Look at The First Place Solution of a Dermatology Classification Kaggle Competition

One interesting thing I often think about is the gap between academic and real-world solutions. In general, academic solutions play in the realm of idealized problem spaces, removing themselves from needing to care about the messiness of the real world. Kaggle competitions are a (small) step in the right direction towards dealing with messiness, usually providing a true blind test set (vs. overused benchmarks) and opening up a few degrees of freedom in terms of the techniques that can be used, which usually leads competitors to eschew novelty in favour of more robust methods. To this end, I thought it would be useful to take a look at a more realistic problem (via a Kaggle competition) and understand the practical details that result in a superior solution.

This post will cover the first place solution [1] to the SIIM-ISIC Melanoma Classification [0] challenge. In addition to using tried-and-true architectures (mostly EfficientNets), the winners use some interesting tactics to formulate the problem, process the data, and train/validate the model. I'll cover background on the ML techniques, competition and data, architectural details, problem formulation, and implementation. I've also run some experiments to better understand the benefits of certain choices they made. Enjoy!

Read more…

LLM Fun: Building a Q&A Bot of Myself

Unless you've been living under a rock, you've probably heard of large language models (LLMs) such as ChatGPT or Bard. I'm not one for riding a hype train, but I do think LLMs are here to stay and are going to have an impact either as big as mobile as an interface (my current best guess) or perhaps as big as the Internet itself. In either case, it behooves me to do a bit more investigation into this popular trend. At the same time, there are a bunch of other developer technologies that I've been wondering about, like serverless computing, modern dev tools, and LLM-based code assistants, so I thought, why not kill multiple birds with one stone?

This post describes how I built a question-answering bot of myself using LLMs, as well as my experience using the relevant developer tools such as ChatGPT, GitHub Copilot, Cloudflare Workers, and a couple of other related ones. I start out with my motivation for doing this project, then give some brief background on the technologies, a description of how I built everything (including some evaluation of LLM outputs), and finally some commentary. This post is a lot less heavy on the math compared to my previous ones, but it still has some good stuff, so read on!

Read more…

Bayesian Learning via Stochastic Gradient Langevin Dynamics and Bayes by Backprop

After a long digression, I'm finally back to one of the main lines of research that I wanted to write about. The two main ideas in this post are not that recent but have been quite impactful (one of the papers recently won an ICML Test of Time award). They address two topics that are near and dear to my heart: Bayesian learning and scalability. Dare I even ask who wouldn't be interested in the intersection of these topics?

This post is about two techniques to perform scalable Bayesian inference. They both address the problem using stochastic gradient descent (SGD) but in very different ways. One leverages the observation that SGD plus some noise converges to sampling from the Bayesian posterior [Welling2011], while the other generalizes the "reparameterization trick" from variational autoencoders to enable non-Gaussian posterior approximations [Blundell2015]. Both are easily implemented in the modern deep learning toolkit and thus benefit from the massive scalability of that toolchain. As usual, I will go over the necessary background (or refer you to my previous posts), intuition, some math, and a couple of toy examples that I implemented.
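As a rough illustration of the "SGD plus some noise" idea, here is a minimal sketch of a stochastic gradient Langevin dynamics update on a toy problem: inferring the mean of a unit-variance Gaussian under a zero-mean Gaussian prior. This is not one of the toy examples from the post. The function name `sgld_gaussian_mean`, the prior variance, the batch size, and the constant step size are all assumptions made for illustration ([Welling2011] uses a decaying step size schedule).

```python
import numpy as np


def sgld_gaussian_mean(data, n_steps=20000, batch_size=32, step_size=1e-5,
                       prior_var=100.0, seed=0):
    """Approximate posterior samples of the mean of a unit-variance Gaussian.

    Assumed model: x_i ~ N(theta, 1) with prior theta ~ N(0, prior_var).
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    theta = 0.0
    samples = np.empty(n_steps)
    for t in range(n_steps):
        # Minibatch estimate of the full-data log-likelihood gradient,
        # rescaled by n / batch_size so it is unbiased.
        batch = rng.choice(data, size=batch_size, replace=False)
        grad_log_lik = (n / batch_size) * np.sum(batch - theta)
        grad_log_prior = -theta / prior_var
        # Langevin step: half a step of gradient ascent on the log posterior,
        # plus Gaussian noise whose variance equals the step size.
        noise = rng.normal(0.0, np.sqrt(step_size))
        theta += 0.5 * step_size * (grad_log_prior + grad_log_lik) + noise
        samples[t] = theta
    return samples


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(2.0, 1.0, size=1000)
    draws = sgld_gaussian_mean(data)[5000:]  # discard early iterations as burn-in
    print("sample mean of data:", data.mean())
    print("mean of SGLD draws: ", draws.mean())
    print("std of SGLD draws:  ", draws.std())
```

Without the `noise` term this is plain minibatch gradient ascent on the log posterior, which would settle near the mode. The injected noise is what makes the iterates wander around the posterior instead of collapsing to a point estimate.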

Read more…

An Introduction to Stochastic Calculus

Through a couple of different avenues I wandered, yet again, down a rabbit hole leading to the topic of this post. The first avenue was through my main focus on a particular machine learning topic that utilized some concepts from physics, which naturally led me to stochastic calculus. The second avenue was through some projects at work in the quantitative finance space, which is one of the main applications of stochastic calculus. Naively, I thought I could write a brief post on it that would satisfy my curiosity -- that didn't work out at all! The result is this extra long post.

This post is about stochastic calculus, an extension of regular calculus to stochastic processes. It's not immediately obvious, but the rigour needed to properly understand some of the key ideas requires going back to the measure-theoretic definition of probability, so that's where I start in the background. From there I quickly move on to stochastic processes, the Wiener process, a particular flavour of stochastic calculus called Itô calculus, and finally end with a couple of applications. As usual, I try to include a mix of intuition, rigour where it helps intuition, and some simple examples. It's a deep and wide topic, so I hope you enjoy my digest of it.
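For a small taste of how Itô calculus differs from regular calculus, here is a minimal numerical sketch (not taken from the post; the function names and parameters are hypothetical). It simulates a discretized Wiener process W on [0, T] and approximates the Itô integral of W with respect to itself using left-endpoint sums. Regular calculus would suggest the answer W_T^2 / 2, but the Itô result is (W_T^2 - T) / 2, because the squared increments of the path add up to roughly T instead of vanishing.

```python
import numpy as np


def simulate_wiener(n_steps=100_000, horizon=1.0, seed=0):
    """Simulate a Wiener process on [0, horizon] using independent Gaussian increments."""
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    increments = rng.normal(0.0, np.sqrt(dt), size=n_steps)  # each dW ~ N(0, dt)
    path = np.concatenate(([0.0], np.cumsum(increments)))    # W_0 = 0
    return path, increments


def ito_integral_w_dw(path, increments):
    """Left-endpoint (Itô) sum approximating the integral of W dW."""
    return float(np.sum(path[:-1] * increments))


if __name__ == "__main__":
    horizon = 1.0
    path, increments = simulate_wiener(horizon=horizon)
    w_end = path[-1]
    print("Ito sum:               ", ito_integral_w_dw(path, increments))
    print("(W_T^2 - T) / 2:       ", 0.5 * (w_end ** 2 - horizon))
    print("naive W_T^2 / 2:       ", 0.5 * w_end ** 2)
    print("sum of squared steps:  ", float(np.sum(increments ** 2)))
```

Evaluating the integrand at the left endpoint of each interval is what makes this the Itô integral. Using the midpoint instead would give the Stratonovich integral, which does follow the ordinary chain rule.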

Read more…

Hi, I'm Brian Keng. This is the place where I write about all things technical.

Twitter: @bjlkeng


