Bounded Rationality (Posts about probability)

An Introduction to Stochastic Calculus

Brian Keng — Mon, 12 Sep 2022 01:05:55 GMT

Through a couple of different avenues I wandered, yet again, down a rabbit hole leading to the topic of this post. The first avenue was through my main focus on a particular machine learning topic that utilized some concepts from physics, which naturally led me to stochastic calculus. The second avenue was through some projects at work in the quantitative finance space, which is one of the main applications of stochastic calculus. Naively, I thought I could write a brief post on it that would satisfy my curiosity -- that didn't work out at all! The result is this extra long post.

This post is about stochastic calculus, an extension of regular calculus to stochastic processes. It's not immediately obvious but the rigour needed to properly understand some of the key ideas requires going back to the measure theoretic definition of probability theory, so that's where I start in the background. From there I quickly move on to stochastic processes, the Wiener process, a particular flavour of stochastic calculus called Itô calculus, and finally end with a couple of applications. As usual, I try to include a mix of intuition, rigour where it helps intuition, and some simple examples. It's a deep and wide topic so I hope you enjoy my digest of it.

The Calculus of Variations

Brian Keng — Sun, 26 Feb 2017 15:08:38 GMT

This post is going to describe a specialized type of calculus called variational calculus. Analogous to the usual methods of calculus that we learn in university, this one deals with functions of functions and how to minimize or maximize them. It's used extensively in physics problems such as finding the minimum energy path a particle takes under certain conditions. As you can also imagine, it's also used in machine learning/statistics where you want to find a density that optimizes an objective 1. The explanation I'm going to use (at least for the first part) is heavily based upon Svetitsky's Notes on Functionals, which so far is the most intuitive explanation I've read. I'll try to follow Svetitsky's notes to give some intuition on how we arrive at variational calculus from regular calculus with a bunch of examples along the way. Eventually we'll get to an application that relates back to probability. I think with the right intuition and explanation, it's actually not too difficult, enjoy!

Maximum Entropy Distributions

Brian Keng — Fri, 27 Jan 2017 14:05:00 GMT

This post will talk about a method to find the probability distribution that best fits your given state of knowledge. Using the principle of maximum entropy and some testable information (e.g. the mean), you can find the distribution that makes the fewest assumptions about your data (the one with maximal information entropy). As you may have guessed, this is used often in Bayesian inference to determine prior distributions and also (at least implicitly) in natural language processing applications with maximum entropy (MaxEnt) classifiers (i.e. a multinomial logistic regression). As usual, I'll go through some intuition, some math, and some examples. Hope you find this topic as interesting as I do!

A Probabilistic Interpretation of Regularization

Brian Keng — Mon, 29 Aug 2016 12:52:33 GMT

This post is going to look at a probabilistic (Bayesian) interpretation of regularization. We'll take a look at both L1 and L2 regularization in the context of ordinary linear regression. The discussion will start off with a quick introduction to regularization, followed by a back-to-basics explanation starting with the maximum likelihood estimate (MLE), then on to the maximum a posteriori estimate (MAP), and finally playing around with priors to end up with L1 and L2 regularization.

A Probabilistic View of Linear Regression

Brian Keng — Sun, 15 May 2016 00:43:05 GMT

One thing that I always disliked about introductory material to linear regression is how randomness is explained. The explanations always seemed unintuitive because, as I have frequently seen it, they appear as an after thought rather than the central focus of the model. In this post, I'm going to try to take another approach to building an ordinary linear regression model starting from a probabilistic point of view (which is pretty much just a Bayesian view). After the general idea is established, I'll modify the model a bit and end up with a Poisson regression using the exact same principles showing how generalized linear models aren't any more complicated. Hopefully, this will help explain the "randomness" in linear regression in a more intuitive way.

Normal Approximation to the Posterior Distribution

Brian Keng — Sat, 02 Apr 2016 19:22:54 GMT

In this post, I'm going to write about how the ever versatile normal distribution can be used to approximate a Bayesian posterior distribution. Unlike some other normal approximations, this is not a direct application of the central limit theorem. The result has a straight forward proof using Laplace's Method whose main ideas I will attempt to present. I'll also simulate a simple scenario to see how it works in practice.

Elementary Statistics for Direct Marketing

Brian Keng — Sun, 28 Feb 2016 01:40:41 GMT

This post is going to look at some elementary statistics for direct marketing. Most of the techniques are direct applications of topics learned in a first year statistics course hence the "elementary". I'll start off by covering some background and terminology on the direct marketing and then introduce some of the statistical inference techniques that are commonly used. As usual, I'll mix in some theory where appropriate to build some intuition.

Markov Chain Monte Carlo Methods, Rejection Sampling and the Metropolis-Hastings Algorithm

Brian Keng — Sun, 13 Dec 2015 20:05:56 GMT

In this post, I'm going to continue on the same theme from the last post: random sampling. We're going to look at two methods for sampling a distribution: rejection sampling and Markov Chain Monte Carlo Methods (MCMC) using the Metropolis Hastings algorithm. As usual, I'll be providing a mix of intuitive explanations, theory and some examples with code. Hopefully, this will help explain a relatively straight-forward topic that is frequently presented in a complex way.

Sampling from a Normal Distribution

Brian Keng — Sun, 29 Nov 2015 02:57:02 GMT

One of the most common probability distributions is the normal (or Gaussian) distribution. Many natural phenomena can be modeled using a normal distribution. It's also of great importance due to its relation to the Central Limit Theorem.

In this post, we'll be reviewing the normal distribution and looking at how to draw samples from it using two methods. The first method using the central limit theorem, and the second method using the Box-Muller transform. As usual, some brief coverage of the mathematics and code will be included to help drive intuition.

Optimal Betting Strategies and The Kelly Criterion

Brian Keng — Sun, 15 Nov 2015 21:13:31 GMT

My last post was about some common mistakes when betting or gambling, even with a basic understanding of probability. This post is going to talk about the other side: optimal betting strategies using some very interesting results from some very famous mathematicians in the 50s and 60s. I'll spend a bit of time introducing some new concepts (at least to me), setting up the problem and digging into some of the math. We'll be looking at it from the lens of our simplest probability problem: the coin flip. A note: I will not be covering the part that shows you how to make a fortune -- that's an exercise best left to the reader.