<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Bounded Rationality (Posts about Bayesian)</title><link>http://bjlkeng.github.io/</link><description></description><atom:link href="http://bjlkeng.github.io/categories/bayesian.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><lastBuildDate>Tue, 10 Mar 2026 20:54:58 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Bayesian Learning via Stochastic Gradient Langevin Dynamics and Bayes by Backprop</title><link>http://bjlkeng.github.io/posts/bayesian-learning-via-stochastic-gradient-langevin-dynamics-and-bayes-by-backprop/</link><dc:creator>Brian Keng</dc:creator><description>&lt;div&gt;&lt;p&gt;After a long digression, I'm finally back to one of the main lines of research
that I wanted to write about.  The two main ideas in this post are not that
recent but have been quite impactful (one of the
&lt;a class="reference external" href="https://icml.cc/virtual/2021/test-of-time/11808"&gt;papers&lt;/a&gt; won a recent ICML
test of time award).  They address two of the topics that are near and dear to
my heart: Bayesian learning and scalability.  Dare I even ask who wouldn't be
interested in the intersection of these topics?&lt;/p&gt;
&lt;p&gt;This post is about two techniques to perform scalable Bayesian inference.  They
both address the problem using stochastic gradient descent (SGD) but in very
different ways.  One leverages the observation that SGD plus some noise will
converge to Bayesian posterior sampling &lt;a class="citation-reference" href="http://bjlkeng.github.io/posts/bayesian-learning-via-stochastic-gradient-langevin-dynamics-and-bayes-by-backprop/#welling2011" id="id1"&gt;[Welling2011]&lt;/a&gt;, while the other generalizes the
"reparameterization trick" from variational autoencoders to enable non-Gaussian
posterior approximations &lt;a class="citation-reference" href="http://bjlkeng.github.io/posts/bayesian-learning-via-stochastic-gradient-langevin-dynamics-and-bayes-by-backprop/#blundell2015" id="id2"&gt;[Blundell2015]&lt;/a&gt;.  Both are easily implemented in the modern deep
learning toolkit thus benefit from the massive scalability of that toolchain.
As usual, I will go over the necessary background (or refer you to my previous
posts), intuition, some math, and a couple of toy examples that I implemented.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://bjlkeng.github.io/posts/bayesian-learning-via-stochastic-gradient-langevin-dynamics-and-bayes-by-backprop/"&gt;Read more…&lt;/a&gt; (53 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>Bayes by Backprop</category><category>Bayesian</category><category>elbo</category><category>HMC</category><category>Langevin</category><category>mathjax</category><category>rmsprop</category><category>sgd</category><category>SGLD</category><category>variational inference</category><guid>http://bjlkeng.github.io/posts/bayesian-learning-via-stochastic-gradient-langevin-dynamics-and-bayes-by-backprop/</guid><pubDate>Wed, 08 Feb 2023 23:25:40 GMT</pubDate></item><item><title>Hamiltonian Monte Carlo</title><link>http://bjlkeng.github.io/posts/hamiltonian-monte-carlo/</link><dc:creator>Brian Keng</dc:creator><description>&lt;div&gt;&lt;p&gt;Here's a topic I thought that I would never get around to learning because it was "too hard".
When I first started learning about Bayesian methods, I knew enough that I
should learn a thing or two about MCMC since that's the backbone
of most Bayesian analysis; so I learned something about it
(see my &lt;a class="reference external" href="http://bjlkeng.github.io/posts/markov-chain-monte-carlo-mcmc-and-the-metropolis-hastings-algorithm/"&gt;previous post&lt;/a&gt;).
But I didn't dare attempt to learn about the infamous Hamiltonian Monte Carlo (HMC).
Even though it is among the standard algorithms used in Bayesian inference, it
always seemed too daunting because it required "advanced physics" to
understand.  As usual, things only seem hard because you don't know them yet.
After having some time to digest MCMC methods, getting comfortable learning
more maths (see
&lt;a class="reference external" href="http://bjlkeng.github.io/posts/tensors-tensors-tensors/"&gt;here&lt;/a&gt;,
&lt;a class="reference external" href="http://bjlkeng.github.io/posts/manifolds/"&gt;here&lt;/a&gt;, and
&lt;a class="reference external" href="http://bjlkeng.github.io/posts/hyperbolic-geometry-and-poincare-embeddings/"&gt;here&lt;/a&gt;),
all of a sudden learning "advanced physics" didn't seem so tough (but there
sure was a lot of background needed)!&lt;/p&gt;
&lt;p&gt;This post is the culmination of many different rabbit holes (many much deeper
than I needed to go) where I'm going to attempt to explain HMC in simple and
intuitive terms to a satisfactory degree (that's the tag line of this blog
after all).  I'm going to begin by briefly motivating the topic by reviewing
MCMC and the Metropolis-Hastings algorithm then move on to explaining
Hamiltonian dynamics (i.e., the "advanced physics"), and finally discuss the HMC
algorithm along with some toy experiments I put together.  Most of the material
is based on [1] and [2], which I've found to be great sources for their
respective areas.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://bjlkeng.github.io/posts/hamiltonian-monte-carlo/"&gt;Read more…&lt;/a&gt; (52 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>Bayesian</category><category>Hamiltonian</category><category>mathjax</category><category>MCMC</category><category>Monte Carlo</category><guid>http://bjlkeng.github.io/posts/hamiltonian-monte-carlo/</guid><pubDate>Fri, 24 Dec 2021 00:07:05 GMT</pubDate></item><item><title>Variational Bayes and The Mean-Field Approximation</title><link>http://bjlkeng.github.io/posts/variational-bayes-and-the-mean-field-approximation/</link><dc:creator>Brian Keng</dc:creator><description>&lt;div&gt;&lt;p&gt;This post is going to cover Variational Bayesian methods and, in particular,
the most common one, the mean-field approximation.  This is a topic that I've
been trying to understand for a while now but didn't quite have all the background
that I needed.  After picking up the main ideas from
&lt;a class="reference external" href="http://bjlkeng.github.io/posts/the-calculus-of-variations/"&gt;variational calculus&lt;/a&gt; and
getting more fluent in manipulating probability statements like
in my &lt;a class="reference external" href="http://bjlkeng.github.io/posts/the-expectation-maximization-algorithm/"&gt;EM&lt;/a&gt; post,
this variational Bayes stuff seems a lot easier.&lt;/p&gt;
&lt;p&gt;Variational Bayesian methods are a set of techniques to approximate posterior
distributions in &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Bayesian_inference"&gt;Bayesian Inference&lt;/a&gt;.
If this sounds a bit terse, keep reading!  I hope to provide some intuition
so that the big ideas are easy to understand (which they are), but of course we
can't do that well unless we have a healthy dose of mathematics.  For some of the
background concepts, I'll try to refer you to good sources (including my own),
which I find is the main blocker to understanding this subject (admittedly, the
math can sometimes be a bit cryptic too).  Enjoy!&lt;/p&gt;
&lt;p&gt;&lt;a href="http://bjlkeng.github.io/posts/variational-bayes-and-the-mean-field-approximation/"&gt;Read more…&lt;/a&gt; (24 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>Bayesian</category><category>Kullback-Leibler</category><category>mathjax</category><category>mean-field</category><category>variational calculus</category><guid>http://bjlkeng.github.io/posts/variational-bayes-and-the-mean-field-approximation/</guid><pubDate>Mon, 03 Apr 2017 13:02:46 GMT</pubDate></item><item><title>A Probabilistic Interpretation of Regularization</title><link>http://bjlkeng.github.io/posts/probabilistic-interpretation-of-regularization/</link><dc:creator>Brian Keng</dc:creator><description>&lt;div&gt;&lt;p&gt;This post is going to look at a probabilistic (Bayesian) interpretation of
regularization.  We'll take a look at both L1 and L2 regularization in the
context of ordinary linear regression.  The discussion will start off
with a quick introduction to regularization, followed by a back-to-basics
explanation starting with the maximum likelihood estimate (MLE), then on to the
maximum a posteriori estimate (MAP), and finally playing around with priors to
end up with L1 and L2 regularization.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://bjlkeng.github.io/posts/probabilistic-interpretation-of-regularization/"&gt;Read more…&lt;/a&gt; (9 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>Bayesian</category><category>mathjax</category><category>probability</category><category>regularization</category><guid>http://bjlkeng.github.io/posts/probabilistic-interpretation-of-regularization/</guid><pubDate>Mon, 29 Aug 2016 12:52:33 GMT</pubDate></item><item><title>A Probabilistic View of Linear Regression</title><link>http://bjlkeng.github.io/posts/a-probabilistic-view-of-regression/</link><dc:creator>Brian Keng</dc:creator><description>&lt;div&gt;&lt;p&gt;One thing that I always disliked about introductory material to linear
regression is how randomness is explained.  The explanations always
seemed unintuitive because, as I have frequently seen it, they appear as an
after thought rather than the central focus of the model.
In this post, I'm going to try to
take another approach to building an ordinary linear regression model starting
from a probabilistic point of view (which is pretty much just a Bayesian view).
After the general idea is established, I'll modify the model a bit and end up
with a Poisson regression using the exact same principles showing how
generalized linear models aren't any more complicated.  Hopefully, this will
help explain the "randomness" in linear regression in a more intuitive way.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://bjlkeng.github.io/posts/a-probabilistic-view-of-regression/"&gt;Read more…&lt;/a&gt; (12 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>Bayesian</category><category>logistic</category><category>mathjax</category><category>Poisson</category><category>probability</category><category>regression</category><guid>http://bjlkeng.github.io/posts/a-probabilistic-view-of-regression/</guid><pubDate>Sun, 15 May 2016 00:43:05 GMT</pubDate></item><item><title>Normal Approximation to the Posterior Distribution</title><link>http://bjlkeng.github.io/posts/normal-approximations-to-the-posterior-distribution/</link><dc:creator>Brian Keng</dc:creator><description>&lt;div class="cell border-box-sizing text_cell rendered"&gt;&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;In this post, I'm going to write about how the ever versatile normal distribution can be used to approximate a Bayesian posterior distribution.  Unlike some other normal approximations, this is &lt;em&gt;not&lt;/em&gt; a direct application of the central limit theorem.  The result has a straight forward proof using Laplace's Method whose main ideas I will attempt to present.  I'll also simulate a simple scenario to see how it works in practice.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://bjlkeng.github.io/posts/normal-approximations-to-the-posterior-distribution/"&gt;Read more…&lt;/a&gt; (14 min remaining to read)&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;</description><category>Bayesian</category><category>normal distribution</category><category>posterior</category><category>prior</category><category>probability</category><category>sampling</category><guid>http://bjlkeng.github.io/posts/normal-approximations-to-the-posterior-distribution/</guid><pubDate>Sat, 02 Apr 2016 19:22:54 GMT</pubDate></item></channel></rss>