# The Logic Behind the Maximum Entropy Principle

For a while now, I've really enjoyed diving deep to understand probability and related fundamentals (see here, here, and here). Entropy is a topic that comes up all over the place from physics to information theory, and of course, machine learning. I written about it in various different forms but always taken it as a given as the "expected information". Well I found a few of good explanations about how to "derive" it and thought that I should share.

In this post, I'll be showing a few of derivations of the maximum entropy principle, where entropy appears as part of the definition. These derivations will show why it is a reasonable and natural thing to maximize, and how it is determined from some well thought out reasoning. This post will be more math heavy but hopefully it will give you more insight into this wonderfully surprising topic.

# Iterative Summarization using LLMs

After being busy for the first part of the year, I finally have a bit of time to work on this blog. After a lot of thinking about how to best fit it into my schedule, I've decided to attempt to write shorter posts. Although I do get a lot of satisfaction writing long posts, it's not practical because of the time commitment. Better to break it up into smaller parts to be able to "ship" often rather than perfect each post. This also allows me to experiment with smaller scoped topics, which hopefully will keep more more motivated as well. Speaking of which...

This post is about answering a random thought I had the other day: what would happen if I kept passing an LLM's output back to itself? I ran a few experiments of trying to get the LLM to iteratively summarize or rephrase a piece of text and the results are... pretty much what you would expect. But if you don't know what to expect, then read on and find out what happened!

# A Look at The First Place Solution of a Dermatology Classification Kaggle Competition

One interesting thing I often think about is the gap between academic and real-world solutions. In general academic solutions play in the realm of idealized problem spaces, removing themselves from needing to care about the messiness of the real-world. Kaggle competitions are a (small) step in the right direction towards dealing with messiness, usually providing a true blind test set (vs. overused benchmarks), and opening a few degrees of freedom in terms the techniques that can be used, which usually eschews novelty in favour of more robust methods. To this end, I thought it would be useful to take a look at a more realistic problem (via a Kaggle competition) and understand the practical details that result in a superior solution.

This post will cover the first place solution [1] to the SIIM-ISIC Melanoma Classification [0] challenge. In addition to using tried and true architectures (mostly EfficientNets), they have some interesting tactics they use to formulate the problem, process the data, and train/validate the model. I'll cover background on the ML techniques, competition and data, architectural details, problem formulation, and implementation. I've also run some experiments to better understand the benefits of certain choices they made. Enjoy!

# LLM Fun: Building a Q&A Bot of Myself

Unless you've been living under a rock, you've probably heard of large language models (LLM) such as ChatGPT or Bard. I'm not one for riding a hype train but I do think LLMs are here to stay and either are going to have an impact as big as mobile as an interface (my current best guess) or perhaps something as big as the Internet itself. In either case, it behooves me to do a bit more investigation into this popular trend 1. At the same time, there are a bunch of other developer technologies that I've been wondering about like serverless computing, modern dev tools, and LLM-based code assistants, so I thought why not kill multiple birds with one stone.

This post is going to describe how I built a question and answering bot of myself using LLMs as well as my experience using the relevant developer tools such as ChatGPT, Github Copilot, Cloudflare workers, and a couple of other related ones. I start out with my motivation for doing this project, some brief background on the technologies, a description of how I built everything including some evaluation on LLM outputs, and finally some commentary. This post is a lot less heavy on the math as compared to my previous ones but it still has some good stuff so read on!

# Bayesian Learning via Stochastic Gradient Langevin Dynamics and Bayes by Backprop

After a long digression, I'm finally back to one of the main lines of research that I wanted to write about. The two main ideas in this post are not that recent but have been quite impactful (one of the papers won a recent ICML test of time award). They address two of the topics that are near and dear to my heart: Bayesian learning and scalability. Dare I even ask who wouldn't be interested in the intersection of these topics?

This post is about two techniques to perform scalable Bayesian inference. They both address the problem using stochastic gradient descent (SGD) but in very different ways. One leverages the observation that SGD plus some noise will converge to Bayesian posterior sampling [Welling2011], while the other generalizes the "reparameterization trick" from variational autoencoders to enable non-Gaussian posterior approximations [Blundell2015]. Both are easily implemented in the modern deep learning toolkit thus benefit from the massive scalability of that toolchain. As usual, I will go over the necessary background (or refer you to my previous posts), intuition, some math, and a couple of toy examples that I implemented.