Bounded Rationality (Posts about variational calculus)http://bjlkeng.github.io/enTue, 04 Jun 2024 00:49:17 GMTNikola (getnikola.com)http://blogs.law.harvard.edu/tech/rssImportance Sampling and Estimating Marginal Likelihood in Variational Autoencodershttp://bjlkeng.github.io/posts/importance-sampling-and-estimating-marginal-likelihood-in-variational-autoencoders/Brian Keng<div><p>It took a while but I'm back! This post is kind of a digression (which seems
to happen a lot) along my journey of learning more about probabilistic
generative models. There's so much in ML that you can't help learning a lot
of random things along the way. That's why it's interesting, right?</p>
<p>Today's topic is <em>importance sampling</em>. It's a really old idea that you may
have learned in a statistics class (I didn't) but that somehow turns out to be
useful in deep learning; what's old is new again, right? It's relevant here because when
we have a large latent variable model (e.g. a variational
autoencoder), we want to be able to efficiently estimate the marginal likelihood
given data. The marginal likelihood is kind of taken for granted in the
experiments of some VAE papers when comparing different models. I was curious
how it was actually computed and it took me down this rabbit hole. Turns out
it's actually pretty interesting! As usual, I'll have a mix of background
material, examples, math and code to build some intuition around this topic.
Enjoy!</p>
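<p>As a preview of the core idea, here is a minimal sketch (a hypothetical one-dimensional toy model, not the post's code): we estimate a marginal likelihood <em>p(x)</em> by averaging importance weights of samples drawn from a proposal <em>q(z|x)</em>.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def log_gauss(x, mu, sigma):
    # Log-density of a univariate Gaussian N(mu, sigma^2)
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

# Toy latent-variable model: z ~ N(0, 1), x | z ~ N(z, 0.5^2)
x = 1.0
# The exact marginal is available in this toy case: x ~ N(0, 1 + 0.25)
exact = np.exp(log_gauss(x, 0.0, np.sqrt(1.25)))

# Proposal q(z | x): for this toy model we can use the true posterior,
# which makes every importance weight equal to p(x) (the ideal case)
q_mu, q_sigma = x / 1.25, np.sqrt(0.25 / 1.25)

K = 100_000
z = rng.normal(q_mu, q_sigma, size=K)
# Importance weights: w_k = p(x | z_k) p(z_k) / q(z_k | x)
log_w = log_gauss(x, z, 0.5) + log_gauss(z, 0.0, 1.0) - log_gauss(z, q_mu, q_sigma)
estimate = np.exp(log_w).mean()  # Monte Carlo estimate of p(x)
```

<p>With a poorer proposal (e.g. the prior), the estimator is still unbiased but its variance can blow up; that trade-off is a big part of what makes this interesting for VAEs, where <em>q(z|x)</em> is the learned encoder.</p>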
<p><a href="http://bjlkeng.github.io/posts/importance-sampling-and-estimating-marginal-likelihood-in-variational-autoencoders/">Read more…</a> (22 min remaining to read)</p></div>autoencodersautoregressiveCIFAR10generative modelsimportance samplingmathjaxMNISTMonte Carlovariational calculushttp://bjlkeng.github.io/posts/importance-sampling-and-estimating-marginal-likelihood-in-variational-autoencoders/Wed, 06 Feb 2019 12:20:11 GMTVariational Autoencoders with Inverse Autoregressive Flowshttp://bjlkeng.github.io/posts/variational-autoencoders-with-inverse-autoregressive-flows/Brian Keng<div><p>In this post, I'm going to be describing a really cool idea about how
to improve variational autoencoders using inverse autoregressive
flows. The main idea is that we can generate more powerful posterior
distributions compared to a more basic isotropic Gaussian by applying a
series of invertible transformations. This, in theory, will allow
your variational autoencoder to fit better by concentrating the
stochastic samples around a closer approximation to the true
posterior. The math works out quite nicely, even though the results are kind of
marginal <a class="footnote-reference brackets" href="http://bjlkeng.github.io/posts/variational-autoencoders-with-inverse-autoregressive-flows/#id3" id="id1">1</a>. As usual, I'll go through some intuition, some math,
and have an implementation with a few experiments I ran. Enjoy!</p>
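<p>To make the idea concrete, here's a tiny sketch (hand-rolled hypothetical parameters, not the post's MADE-based networks) of one affine IAF step: each dimension is scaled and shifted using only <em>earlier</em> dimensions, so the transformation is invertible and its Jacobian is triangular, making the log-determinant a cheap sum.</p>

```python
import numpy as np

rng = np.random.default_rng(1)
D = 3

# Sample from the basic isotropic-Gaussian posterior q(z0 | x)
eps = rng.normal(size=D)
z = eps  # mu0 = 0, sigma0 = 1 for simplicity
log_q = -0.5 * np.sum(np.log(2 * np.pi) + eps**2)  # log q(z0 | x)

def iaf_step(z, Wm, Ws):
    """One affine inverse autoregressive flow step.

    Wm, Ws are strictly lower-triangular, so the shift m_i and log-scale
    of dimension i depend only on z_{<i}; the Jacobian of z -> s * z + m
    is then triangular with diagonal s.
    """
    m = Wm @ z
    s = np.exp(Ws @ z)  # exp keeps the scales positive (hence invertible)
    return s * z + m, np.sum(np.log(s))  # transformed sample, log |det Jacobian|

Wm = np.tril(rng.normal(size=(D, D)), k=-1) * 0.1
Ws = np.tril(rng.normal(size=(D, D)), k=-1) * 0.1
z, log_det = iaf_step(z, Wm, Ws)
log_q -= log_det  # density of the transformed sample under the flow posterior
```

<p>Note that inverting the step has to be done dimension by dimension, which is exactly why this direction of autoregressive flow makes sampling fast.</p>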
<p><a href="http://bjlkeng.github.io/posts/variational-autoencoders-with-inverse-autoregressive-flows/">Read more…</a> (18 min remaining to read)</p></div>autoencodersautoregressiveCIFAR10generative modelsKullback-LeiblerMADEmathjaxMNISTvariational calculushttp://bjlkeng.github.io/posts/variational-autoencoders-with-inverse-autoregressive-flows/Tue, 19 Dec 2017 13:47:38 GMTSemi-supervised Learning with Variational Autoencodershttp://bjlkeng.github.io/posts/semi-supervised-learning-with-variational-autoencoders/Brian Keng<div><p>In this post, I'll be continuing on this variational autoencoder (VAE) line of
exploration
(previous posts: <a class="reference external" href="http://bjlkeng.github.io/posts/variational-autoencoders/">here</a> and
<a class="reference external" href="http://bjlkeng.github.io/posts/a-variational-autoencoder-on-the-svnh-dataset/">here</a>) by
writing about how to use variational autoencoders to do semi-supervised
learning. In particular, I'll be explaining the technique used in
"Semi-supervised Learning with Deep Generative Models" by Kingma et al.
I'll be digging into the math (hopefully being more explicit than the paper),
giving a bit more background on the variational lower bound, as well as
my usual attempt at giving some more intuition.
I've also put some notebooks on Github that compare the VAE methods
with others such as PCA, CNNs, and pre-trained models. Enjoy!</p>
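<p>For orientation, the headline objective from my reading of Kingma et al. combines a variational bound <span class="math">\(\mathcal{L}(x, y)\)</span> on labeled pairs, a bound <span class="math">\(\mathcal{U}(x)\)</span> on unlabeled points (which marginalizes the classifier <span class="math">\(q_\phi(y|x)\)</span> over the unknown label), and an explicit classification term:</p>

```latex
\mathcal{J}^{\alpha} = \sum_{(x,y)\ \text{labeled}} \mathcal{L}(x, y)
                     + \sum_{x\ \text{unlabeled}} \mathcal{U}(x)
                     + \alpha \, \mathbb{E}_{\tilde{p}_l(x,y)}\left[ -\log q_\phi(y \mid x) \right]
```

<p>The post works through where each of these pieces comes from.</p>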
<p><a href="http://bjlkeng.github.io/posts/semi-supervised-learning-with-variational-autoencoders/">Read more…</a> (25 min remaining to read)</p></div>autoencodersCIFAR10CNNgenerative modelsinceptionKullback-LeiblermathjaxPCAsemi-supervised learningvariational calculushttp://bjlkeng.github.io/posts/semi-supervised-learning-with-variational-autoencoders/Mon, 11 Sep 2017 12:40:47 GMTA Variational Autoencoder on the SVHN datasethttp://bjlkeng.github.io/posts/a-variational-autoencoder-on-the-svnh-dataset/Brian Keng<div><p>In this post, I'm going to share some notes on implementing a variational
autoencoder (VAE) on the
<a class="reference external" href="http://ufldl.stanford.edu/housenumbers/">Street View House Numbers</a>
(SVHN) dataset. My last post on
<a class="reference external" href="http://bjlkeng.github.io/posts/variational-autoencoders/">variational autoencoders</a>
showed a simple example on the MNIST dataset but because it was so simple I
thought I might have missed some of the subtler points of VAEs -- boy was I
right! The fact that I'm not really a computer vision guy nor a deep learning
guy didn't help either. Through this exercise, I picked up some of the basics
of the "craft" of computer vision/deep learning; there are a lot of subtle
points that are easy to gloss over if you're just reading someone else's
tutorial. I'll share with you some of the details in the math (that I
initially got wrong) and also some of the implementation notes along with a
notebook that I used to train the VAE. Please check out my previous post
on <a class="reference external" href="http://bjlkeng.github.io/posts/variational-autoencoders/">variational autoencoders</a> to
get some background.</p>
<p><em>Update 2017-08-09: I actually found a bug in my original code where I was
only using a small subset of the data! I fixed it up in the notebooks and
I've added some inline comments below to say what I've changed. For the most
part, things have stayed the same but the generated images are a bit blurry
because the dataset isn't so easy anymore.</em></p>
<p><a href="http://bjlkeng.github.io/posts/a-variational-autoencoder-on-the-svnh-dataset/">Read more…</a> (19 min remaining to read)</p></div>autoencodersgenerative modelsKullback-Leiblermathjaxsvhnvariational calculushttp://bjlkeng.github.io/posts/a-variational-autoencoder-on-the-svnh-dataset/Thu, 13 Jul 2017 12:13:03 GMTVariational Autoencodershttp://bjlkeng.github.io/posts/variational-autoencoders/Brian Keng<div><p>This post is going to talk about an incredibly interesting unsupervised
learning method in machine learning called variational autoencoders. Its main
claim to fame is in building generative models of complex distributions like
handwritten digits, faces, and image segments among others. The really cool
thing about this topic is that it has firm roots in probability but uses a
function approximator (i.e. neural networks) to approximate an otherwise
intractable problem. As usual, I'll try to start with some background and
motivation, include a healthy dose of math, and along the way try to convey
some of the intuition of why it works. I've also annotated a
<a class="reference external" href="https://github.com/bjlkeng/sandbox/blob/master/notebooks/variational-autoencoder.ipynb">basic example</a>
so you can see how the math relates to an actual implementation. I based much
of this post on Carl Doersch's <a class="reference external" href="https://arxiv.org/abs/1606.05908">tutorial</a>,
which has a great explanation on this whole topic, so make sure you check that
out too.</p>
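<p>The central object of the post, reproduced here for reference, is the variational lower bound (ELBO) on the marginal log-likelihood: a reconstruction term the decoder can maximize, minus a KL term keeping the encoder close to the prior.</p>

```latex
\log p(x) \;\geq\; \mathbb{E}_{q(z|x)}\big[\log p(x|z)\big]
                 \;-\; D_{KL}\big(q(z|x)\,\|\,p(z)\big)
```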
<p><a href="http://bjlkeng.github.io/posts/variational-autoencoders/">Read more…</a> (25 min remaining to read)</p></div>autoencodersgenerative modelsKullback-Leiblermathjaxvariational calculushttp://bjlkeng.github.io/posts/variational-autoencoders/Tue, 30 May 2017 12:19:36 GMTVariational Bayes and The Mean-Field Approximationhttp://bjlkeng.github.io/posts/variational-bayes-and-the-mean-field-approximation/Brian Keng<div><p>This post is going to cover Variational Bayesian methods and, in particular,
the most common one, the mean-field approximation. This is a topic that I've
been trying to understand for a while now but didn't quite have all the background
that I needed. After picking up the main ideas from
<a class="reference external" href="http://bjlkeng.github.io/posts/the-calculus-of-variations/">variational calculus</a> and
getting more fluent in manipulating probability statements like
in my <a class="reference external" href="http://bjlkeng.github.io/posts/the-expectation-maximization-algorithm/">EM</a> post,
this variational Bayes stuff seems a lot easier.</p>
<p>Variational Bayesian methods are a set of techniques to approximate posterior
distributions in <a class="reference external" href="https://en.wikipedia.org/wiki/Bayesian_inference">Bayesian Inference</a>.
If this sounds a bit terse, keep reading! I hope to provide some intuition
so that the big ideas are easy to understand (which they are), but of course we
can't do that well unless we have a healthy dose of mathematics. For some of the
background concepts, I'll try to refer you to good sources (including my own),
since missing background is, I find, the main blocker to understanding this
subject (admittedly, the math can sometimes be a bit cryptic too). Enjoy!</p>
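<p>As a taste of where this is headed: the mean-field approximation restricts the approximate posterior to a fully factorized form, and the optimal factor then has a well-known closed form (stated here for reference; the post derives it properly):</p>

```latex
q(\mathbf{z}) = \prod_{i} q_i(z_i), \qquad
\log q_j^*(z_j) = \mathbb{E}_{i \neq j}\big[ \log p(\mathbf{x}, \mathbf{z}) \big] + \text{const.}
```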
<p><a href="http://bjlkeng.github.io/posts/variational-bayes-and-the-mean-field-approximation/">Read more…</a> (24 min remaining to read)</p></div>BayesianKullback-Leiblermathjaxmean-fieldvariational calculushttp://bjlkeng.github.io/posts/variational-bayes-and-the-mean-field-approximation/Mon, 03 Apr 2017 13:02:46 GMTThe Calculus of Variationshttp://bjlkeng.github.io/posts/the-calculus-of-variations/Brian Keng<div><p>This post is going to describe a specialized type of calculus called
variational calculus.
Analogous to the usual methods of calculus that we learn in university,
this one deals with functions <em>of functions</em> and how to
minimize or maximize them. It's used extensively in physics problems such as
finding the minimum energy path a particle takes under certain conditions. As
you can imagine, it's also used in machine learning/statistics where you
want to find a density that optimizes an objective <a class="footnote-reference brackets" href="http://bjlkeng.github.io/posts/the-calculus-of-variations/#id4" id="id1">1</a>. The explanation I'm
going to use (at least for the first part) is heavily based upon Svetitsky's
<a class="reference external" href="http://julian.tau.ac.il/bqs/functionals/functionals.html">Notes on Functionals</a>, which so far is
the most intuitive explanation I've read. I'll try to follow Svetitsky's
notes to give some intuition on how we arrive at variational calculus from
regular calculus with a bunch of examples along the way. Eventually we'll
get to an application that relates back to probability. I think with the right
intuition and explanation, it's actually not too difficult. Enjoy!</p>
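<p>For reference, the workhorse result the post builds up to is the Euler-Lagrange equation: a functional <span class="math">\(J[f]\)</span> is stationary exactly when its integrand satisfies</p>

```latex
J[f] = \int_{x_1}^{x_2} L\big(x, f(x), f'(x)\big)\,dx, \qquad
\frac{\partial L}{\partial f} - \frac{d}{dx}\frac{\partial L}{\partial f'} = 0
```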
<p><a href="http://bjlkeng.github.io/posts/the-calculus-of-variations/">Read more…</a> (16 min remaining to read)</p></div>differentialsentropylagrange multipliersmathjaxprobabilityvariational calculushttp://bjlkeng.github.io/posts/the-calculus-of-variations/Sun, 26 Feb 2017 15:08:38 GMT