It took a while but I'm back! This post is kind of a digression (which seems to happen a lot) along my journey of learning more about probabilistic generative models. There's so much in ML that you can't help learning a lot of random things along the way. That's why it's interesting, right?
Today's topic is importance sampling. It's a really old idea that you may have learned in a statistics class (I didn't) but somehow is useful in deep learning, what's old is new right? How this is relevant to the discussion is that when we have a large latent variable model (e.g. a variational autoencoder), we want to be able to efficiently estimate the marginal likelihood given data. The marginal likelihood is kind of taken for granted in the experiments of some VAE papers when comparing different models. I was curious how it was actually computed and it took me down this rabbit hole. Turns out it's actually pretty interesting! As usual, I'll have a mix of background material, examples, math and code to build some intuition around this topic. Enjoy!