Bounded Rationality (Posts about MNIST)http://bjlkeng.github.io/enSat, 03 Aug 2024 01:42:49 GMTNikola (getnikola.com)http://blogs.law.harvard.edu/tech/rssNormalizing Flows with Real NVPhttp://bjlkeng.github.io/posts/normalizing-flows-with-real-nvp/Brian Keng<div><p>This post has been a long time coming. I originally started working on it several posts back but
hit a roadblock in the implementation and then got distracted with some other ideas, which took
me down various rabbit holes (<a class="reference external" href="http://bjlkeng.github.io/posts/hamiltonian-monte-carlo/">here</a>,
<a class="reference external" href="http://bjlkeng.github.io/posts/lossless-compression-with-asymmetric-numeral-systems/">here</a>, and
<a class="reference external" href="http://bjlkeng.github.io/posts/lossless-compression-with-latent-variable-models-using-bits-back-coding/">here</a>).
It feels good to finally get back on track to some core ML topics.
The other nice thing about not being an academic researcher (not that I'm
really researching anything here) is that there is no pressure to do anything!
If it's just for fun, you can take your time with a topic, veer off track, and
then come back to it later. It's nice having the freedom to do what you want (this applies to
more than just learning about ML too)!</p>
<p>This post is going to talk about a class of deep probabilistic generative
models called normalizing flows. Alongside <a class="reference external" href="http://bjlkeng.github.io/posts/variational-autoencoders/">Variational Autoencoders</a>
and autoregressive models <a class="footnote-reference brackets" href="http://bjlkeng.github.io/posts/normalizing-flows-with-real-nvp/#id3" id="id1">1</a> (e.g. <a class="reference external" href="http://bjlkeng.github.io/posts/pixelcnn/">Pixel CNN</a> and
<a class="reference external" href="http://bjlkeng.github.io/posts/autoregressive-autoencoders/">Autoregressive autoencoders</a>),
normalizing flows have been one of the big ideas in deep probabilistic generative models (I don't count GANs because they are not quite probabilistic).
Specifically, I'll be presenting one of the earlier normalizing flow
techniques named <em>Real NVP</em> (circa 2016).
The formulation is simple but surprisingly effective, which makes it a good
candidate to understand more about normalizing flows.
As usual, I'll go over some background, the method, an implementation
(with commentary on the details), and some experimental results. Let's get into the flow!</p>
<p><a href="http://bjlkeng.github.io/posts/normalizing-flows-with-real-nvp/">Read more…</a> (32 min remaining to read)</p></div>CELEBACIFAR10generative modelsmathjaxMNISTnormalizing flowshttp://bjlkeng.github.io/posts/normalizing-flows-with-real-nvp/Sat, 23 Apr 2022 23:36:05 GMTLossless Compression with Latent Variable Models using Bits-Back Codinghttp://bjlkeng.github.io/posts/lossless-compression-with-latent-variable-models-using-bits-back-coding/Brian Keng<div><p>A lot of modern machine learning is related to this idea of "compression", or
maybe, to use a fancier term, "representations". Taking a high-dimensional space
(e.g. images of 256 x 256 pixels with 3 channels = 196608 dimensions) and somehow compressing it into
a representation of 1000 or so dimensions seems like pretty good compression to
me! Unfortunately, it's not a lossless compression (or representation).
Somehow though, it seems intuitive that there must be a way to use what is learned in
these powerful lossy representations to help us better perform <em>lossless</em>
compression, right? Of course there is! (It would be too anticlimactic of a
setup otherwise.)</p>
<p>This post is going to introduce a method to perform lossless compression that
leverages the learned "compression" of a machine learning latent variable
model using the Bits-Back coding algorithm. Depending on how you first think
about it, this <em>seems</em> like it should either be (a) really easy or (b) not possible at
all. The reality is somewhere in between, with an elegant theoretical algorithm
that is brought down by the realities of discretization and imperfect learning
by the model. In today's post, I'll skim over some preliminaries (mostly
referring you to previous posts), go over the main Bits-Back coding algorithm
in detail, and discuss some of the implementation details and experiments that
I did while trying to write a toy version of the algorithm.</p>
<p><a href="http://bjlkeng.github.io/posts/lossless-compression-with-latent-variable-models-using-bits-back-coding/">Read more…</a> (25 min remaining to read)</p></div>asymmetric numeral systemsBits-BackcompressionlosslessmathjaxMNISTvariational autoencoderhttp://bjlkeng.github.io/posts/lossless-compression-with-latent-variable-models-using-bits-back-coding/Tue, 06 Jul 2021 16:00:00 GMTImportance Sampling and Estimating Marginal Likelihood in Variational Autoencodershttp://bjlkeng.github.io/posts/importance-sampling-and-estimating-marginal-likelihood-in-variational-autoencoders/Brian Keng<div><p>It took a while but I'm back! This post is kind of a digression (which seems
to happen a lot) along my journey of learning more about probabilistic
generative models. There's so much in ML that you can't help learning a lot
of random things along the way. That's why it's interesting, right?</p>
<p>Today's topic is <em>importance sampling</em>. It's a really old idea that you may
have learned in a statistics class (I didn't) but somehow it is still useful in deep learning;
what's old is new, right? It is relevant to the discussion because when
we have a large latent variable model (e.g. a variational
autoencoder), we want to be able to efficiently estimate the marginal likelihood
given data. The marginal likelihood is kind of taken for granted in the
experiments of some VAE papers when comparing different models. I was curious
how it was actually computed and it took me down this rabbit hole. Turns out
it's actually pretty interesting! As usual, I'll have a mix of background
material, examples, math and code to build some intuition around this topic.
Enjoy!</p>
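<p>To make the idea concrete, here is a minimal sketch of importance sampling for a marginal likelihood. It is not the code from the post: the toy model (z ~ N(0, 1), x | z ~ N(z, 1)), the proposal q(z) = N(x/2, 1), and the helper names <code>log_normal</code> and <code>estimate_log_marginal</code> are all assumptions chosen so that the exact answer, p(x) = N(x; 0, 2), is available for comparison.</p>

```python
import numpy as np


def log_normal(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)


def estimate_log_marginal(x, num_samples=10000, seed=0):
    """Importance-sampling estimate of log p(x) for the toy model.

    Assumed model (illustrative only): z ~ N(0, 1), x | z ~ N(z, 1).
    Assumed proposal: q(z) = N(x / 2, 1), centred on the true posterior
    mean but deliberately wider than the true posterior N(x / 2, 1 / 2).
    """
    rng = np.random.default_rng(seed)
    q_mean, q_var = x / 2.0, 1.0
    z = rng.normal(q_mean, np.sqrt(q_var), size=num_samples)

    # log importance weights: log p(z) + log p(x | z) - log q(z)
    log_w = (
        log_normal(z, 0.0, 1.0)
        + log_normal(x, z, 1.0)
        - log_normal(z, q_mean, q_var)
    )

    # log of the average weight, computed stably (log-mean-exp)
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))


x = 1.5
estimate = estimate_log_marginal(x)
exact = log_normal(x, 0.0, 2.0)  # marginal of the toy model is N(0, 2)
print(f"estimate: {estimate:.4f}, exact: {exact:.4f}")
```

<p>In a VAE the same recipe would presumably use the encoder as the proposal and the decoder for p(x | z); the closer the proposal is to the true posterior, the lower the variance of the estimate.</p>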
<p><a href="http://bjlkeng.github.io/posts/importance-sampling-and-estimating-marginal-likelihood-in-variational-autoencoders/">Read more…</a> (22 min remaining to read)</p></div>autoencodersautoregressiveCIFAR10generative modelsimportance samplingmathjaxMNISTMonte Carlovariational calculushttp://bjlkeng.github.io/posts/importance-sampling-and-estimating-marginal-likelihood-in-variational-autoencoders/Wed, 06 Feb 2019 12:20:11 GMTVariational Autoencoders with Inverse Autoregressive Flowshttp://bjlkeng.github.io/posts/variational-autoencoders-with-inverse-autoregressive-flows/Brian Keng<div><p>In this post, I'm going to be describing a really cool idea about how
to improve variational autoencoders using inverse autoregressive
flows. The main idea is that we can generate more powerful posterior
distributions compared to a more basic isotropic Gaussian by applying a
series of invertible transformations. This, in theory, will allow
your variational autoencoder to fit better by concentrating the
stochastic samples around a closer approximation to the true
posterior. The math works out so nicely while the results are kind of
marginal <a class="footnote-reference brackets" href="http://bjlkeng.github.io/posts/variational-autoencoders-with-inverse-autoregressive-flows/#id3" id="id1">1</a>. As usual, I'll go through some intuition, some math,
and have an implementation with a few experiments I ran. Enjoy!</p>
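<p>As a rough illustration of what "applying a series of invertible transformations" buys you, the sketch below tracks the log density of a sample as it passes through a chain of transforms. It uses plain scalar affine maps rather than inverse autoregressive flows, so it only shows the change-of-variables bookkeeping; the function names and the particular scales and shifts are assumptions made up for this example.</p>

```python
import numpy as np


def log_standard_normal(z):
    """Log density of N(0, 1) evaluated elementwise at z."""
    return -0.5 * (np.log(2 * np.pi) + z ** 2)


def apply_affine_flows(z0, flows):
    """Push base samples z0 ~ N(0, 1) through a chain of affine maps.

    Each flow is a (scale, shift) pair with scale != 0, i.e. z -> scale * z + shift.
    By the change-of-variables formula, every step subtracts log|scale|
    (the log absolute Jacobian determinant) from the running log density.
    """
    z = z0
    log_q = log_standard_normal(z0)
    for scale, shift in flows:
        z = scale * z + shift
        log_q = log_q - np.log(np.abs(scale))
    return z, log_q


# Hypothetical chain: composes to z_K = 3 * z_0 - 1.5, i.e. z_K ~ N(-1.5, 3^2)
flows = [(2.0, 1.0), (1.5, -3.0)]
z0 = np.array([-1.0, 0.0, 0.7])
zK, log_q = apply_affine_flows(z0, flows)
print(zK, log_q)
```

<p>An actual inverse autoregressive flow replaces the fixed scale and shift with outputs of an autoregressive network, which keeps the Jacobian triangular so the log determinant is still just a sum of log scales.</p>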
<p><a href="http://bjlkeng.github.io/posts/variational-autoencoders-with-inverse-autoregressive-flows/">Read more…</a> (18 min remaining to read)</p></div>autoencodersautoregressiveCIFAR10generative modelsKullback-LeiblerMADEmathjaxMNISTvariational calculushttp://bjlkeng.github.io/posts/variational-autoencoders-with-inverse-autoregressive-flows/Tue, 19 Dec 2017 13:47:38 GMTAutoregressive Autoencodershttp://bjlkeng.github.io/posts/autoregressive-autoencoders/Brian Keng<div><p>You might think that I'd be bored with autoencoders by now but I still
find them extremely interesting! In this post, I'm going to be explaining
a cute little idea that I came across in the paper <a class="reference external" href="https://arxiv.org/pdf/1502.03509.pdf">MADE: Masked Autoencoder
for Distribution Estimation</a>.
Traditional autoencoders are great because they can perform unsupervised
learning by mapping an input to a latent representation. However, one
drawback is that they don't have a solid probabilistic basis
(of course there are other variants of autoencoders that do, see previous posts
<a class="reference external" href="http://bjlkeng.github.io/posts/variational-autoencoders/">here</a>,
<a class="reference external" href="http://bjlkeng.github.io/posts/a-variational-autoencoder-on-the-svnh-dataset/">here</a>, and
<a class="reference external" href="http://bjlkeng.github.io/posts/semi-supervised-learning-with-variational-autoencoders/">here</a>).
By using what the authors define as the <em>autoregressive property</em>, we can
transform the traditional autoencoder approach into a fully probabilistic model
with very little modification! As usual, I'll provide some intuition, math and
an implementation.</p>
<p><a href="http://bjlkeng.github.io/posts/autoregressive-autoencoders/">Read more…</a> (17 min remaining to read)</p></div>autoencodersautoregressivegenerative modelsMADEmathjaxMNISThttp://bjlkeng.github.io/posts/autoregressive-autoencoders/Sat, 14 Oct 2017 14:02:15 GMT