<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Bounded Rationality (Posts about probability)</title><link>http://bjlkeng.github.io/</link><description></description><atom:link href="http://bjlkeng.github.io/categories/probability.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><lastBuildDate>Tue, 10 Mar 2026 20:54:59 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>An Introduction to Stochastic Calculus</title><link>http://bjlkeng.github.io/posts/an-introduction-to-stochastic-calculus/</link><dc:creator>Brian Keng</dc:creator><description>&lt;div&gt;&lt;p&gt;Through a couple of different avenues I wandered, yet again, down a rabbit hole
leading to the topic of this post.  The first avenue was through my main focus
on a particular machine learning topic that utilized some concepts from
physics, which naturally led me to stochastic calculus.  The second avenue was
through some projects at work in the quantitative finance space, which is one
of the main applications of stochastic calculus.  Naively, I thought I could
write a brief post on it that would satisfy my curiosity -- that didn't work
out at all! The result is this extra-long post.&lt;/p&gt;
&lt;p&gt;This post is about stochastic calculus, an extension of regular calculus to
stochastic processes.  It's not immediately obvious,
but the rigour needed to properly understand some of the key ideas requires
going back to the measure-theoretic definition of probability, so
that's where I start in the background. From there I quickly move on to
stochastic processes, the Wiener process, a particular flavour of stochastic
calculus called Itô calculus, and finally end with a couple of applications.
As usual, I try to include a mix of intuition, rigour where it helps intuition,
and some simple examples.  It's a deep and wide topic so I hope you enjoy my
digest of it.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://bjlkeng.github.io/posts/an-introduction-to-stochastic-calculus/"&gt;Read more…&lt;/a&gt; (72 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>Black-Scholes-Merton</category><category>Brownian motion</category><category>Langevin</category><category>mathjax</category><category>measure theory</category><category>probability</category><category>sigma algebra</category><category>stochastic calculus</category><category>Weiner process</category><category>white noise</category><guid>http://bjlkeng.github.io/posts/an-introduction-to-stochastic-calculus/</guid><pubDate>Mon, 12 Sep 2022 01:05:55 GMT</pubDate></item><item><title>The Calculus of Variations</title><link>http://bjlkeng.github.io/posts/the-calculus-of-variations/</link><dc:creator>Brian Keng</dc:creator><description>&lt;div&gt;&lt;p&gt;This post is going to describe a specialized type of calculus called
variational calculus.
Analogous to the usual methods of calculus that we learn in university,
this one deals with functions &lt;em&gt;of functions&lt;/em&gt; and how to
minimize or maximize them.  It's used extensively in physics problems such as
finding the minimum energy path a particle takes under certain conditions.  As
you can imagine, it's also used in machine learning/statistics where you
want to find a density that optimizes an objective &lt;a class="footnote-reference brackets" href="http://bjlkeng.github.io/posts/the-calculus-of-variations/#id4" id="id1"&gt;1&lt;/a&gt;.  The explanation I'm
going to use (at least for the first part) is heavily based upon Svetitsky's
&lt;a class="reference external" href="http://julian.tau.ac.il/bqs/functionals/functionals.html"&gt;Notes on Functionals&lt;/a&gt;, which so far is
the most intuitive explanation I've read.  I'll try to follow Svetitsky's
notes to give some intuition on how we arrive at variational calculus from
regular calculus with a bunch of examples along the way.  Eventually we'll
get to an application that relates back to probability.  I think with the right
intuition and explanation, it's actually not too difficult. Enjoy!&lt;/p&gt;
&lt;p&gt;&lt;a href="http://bjlkeng.github.io/posts/the-calculus-of-variations/"&gt;Read more…&lt;/a&gt; (16 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>differentials</category><category>entropy</category><category>lagrange multipliers</category><category>mathjax</category><category>probability</category><category>variational calculus</category><guid>http://bjlkeng.github.io/posts/the-calculus-of-variations/</guid><pubDate>Sun, 26 Feb 2017 15:08:38 GMT</pubDate></item><item><title>Maximum Entropy Distributions</title><link>http://bjlkeng.github.io/posts/maximum-entropy-distributions/</link><dc:creator>Brian Keng</dc:creator><description>&lt;div&gt;&lt;p&gt;This post will talk about a method to find the probability distribution that best
fits your given state of knowledge.  Using the principle of maximum
entropy and some testable information (e.g. the mean), you can find the
distribution that makes the fewest assumptions about your data (the one with maximal
information entropy).  As you may have guessed, this is used often in Bayesian
inference to determine prior distributions and also (at least implicitly) in
natural language processing applications with maximum entropy (MaxEnt)
classifiers (i.e. a multinomial logistic regression).  As usual, I'll go through
some intuition, some math, and some examples.  Hope you find this topic as
interesting as I do!&lt;/p&gt;
&lt;p&gt;&lt;a href="http://bjlkeng.github.io/posts/maximum-entropy-distributions/"&gt;Read more…&lt;/a&gt; (11 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>entropy</category><category>mathjax</category><category>probability</category><guid>http://bjlkeng.github.io/posts/maximum-entropy-distributions/</guid><pubDate>Fri, 27 Jan 2017 14:05:00 GMT</pubDate></item><item><title>A Probabilistic Interpretation of Regularization</title><link>http://bjlkeng.github.io/posts/probabilistic-interpretation-of-regularization/</link><dc:creator>Brian Keng</dc:creator><description>&lt;div&gt;&lt;p&gt;This post is going to look at a probabilistic (Bayesian) interpretation of
regularization.  We'll take a look at both L1 and L2 regularization in the
context of ordinary linear regression.  The discussion will start off
with a quick introduction to regularization, followed by a back-to-basics
explanation starting with the maximum likelihood estimate (MLE), then on to the
maximum a posteriori estimate (MAP), and finally playing around with priors to
end up with L1 and L2 regularization.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://bjlkeng.github.io/posts/probabilistic-interpretation-of-regularization/"&gt;Read more…&lt;/a&gt; (9 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>Bayesian</category><category>mathjax</category><category>probability</category><category>regularization</category><guid>http://bjlkeng.github.io/posts/probabilistic-interpretation-of-regularization/</guid><pubDate>Mon, 29 Aug 2016 12:52:33 GMT</pubDate></item><item><title>A Probabilistic View of Linear Regression</title><link>http://bjlkeng.github.io/posts/a-probabilistic-view-of-regression/</link><dc:creator>Brian Keng</dc:creator><description>&lt;div&gt;&lt;p&gt;One thing that I always disliked about introductory material to linear
regression is how randomness is explained.  The explanations always
seemed unintuitive because, as I have frequently seen it, they appear as an
afterthought rather than the central focus of the model.
In this post, I'm going to try to
take another approach to building an ordinary linear regression model starting
from a probabilistic point of view (which is pretty much just a Bayesian view).
After the general idea is established, I'll modify the model a bit and end up
with a Poisson regression using the exact same principles, showing how
generalized linear models aren't any more complicated.  Hopefully, this will
help explain the "randomness" in linear regression in a more intuitive way.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://bjlkeng.github.io/posts/a-probabilistic-view-of-regression/"&gt;Read more…&lt;/a&gt; (12 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>Bayesian</category><category>logistic</category><category>mathjax</category><category>Poisson</category><category>probability</category><category>regression</category><guid>http://bjlkeng.github.io/posts/a-probabilistic-view-of-regression/</guid><pubDate>Sun, 15 May 2016 00:43:05 GMT</pubDate></item><item><title>Normal Approximation to the Posterior Distribution</title><link>http://bjlkeng.github.io/posts/normal-approximations-to-the-posterior-distribution/</link><dc:creator>Brian Keng</dc:creator><description>&lt;div class="cell border-box-sizing text_cell rendered"&gt;&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;In this post, I'm going to write about how the ever-versatile normal distribution can be used to approximate a Bayesian posterior distribution.  Unlike some other normal approximations, this is &lt;em&gt;not&lt;/em&gt; a direct application of the central limit theorem.  The result has a straightforward proof using Laplace's Method, whose main ideas I will attempt to present.  I'll also simulate a simple scenario to see how it works in practice.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://bjlkeng.github.io/posts/normal-approximations-to-the-posterior-distribution/"&gt;Read more…&lt;/a&gt; (14 min remaining to read)&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;</description><category>Bayesian</category><category>normal distribution</category><category>posterior</category><category>prior</category><category>probability</category><category>sampling</category><guid>http://bjlkeng.github.io/posts/normal-approximations-to-the-posterior-distribution/</guid><pubDate>Sat, 02 Apr 2016 19:22:54 GMT</pubDate></item><item><title>Elementary Statistics for Direct Marketing</title><link>http://bjlkeng.github.io/posts/normal-difference-distribution/</link><dc:creator>Brian Keng</dc:creator><description>&lt;div&gt;&lt;p&gt;This post is going to look at some elementary statistics for direct marketing.
Most of the techniques are direct applications of topics learned in a
first-year statistics course, hence the "elementary".  I'll start off by covering some
background and terminology on direct marketing and then introduce some of
the statistical inference techniques that are commonly used.  As usual, I'll
mix in some theory where appropriate to build some intuition.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://bjlkeng.github.io/posts/normal-difference-distribution/"&gt;Read more…&lt;/a&gt; (20 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>direct marketing</category><category>mathjax</category><category>normal</category><category>probability</category><category>sample size</category><guid>http://bjlkeng.github.io/posts/normal-difference-distribution/</guid><pubDate>Sun, 28 Feb 2016 01:40:41 GMT</pubDate></item><item><title>Markov Chain Monte Carlo Methods, Rejection Sampling and the Metropolis-Hastings Algorithm</title><link>http://bjlkeng.github.io/posts/markov-chain-monte-carlo-mcmc-and-the-metropolis-hastings-algorithm/</link><dc:creator>Brian Keng</dc:creator><description>&lt;div class="cell border-box-sizing text_cell rendered"&gt;&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;In this post, I'm going to continue on the same theme from the last post: &lt;a href="http://bjlkeng.github.io/posts/sampling-from-a-normal-distribution/"&gt;random sampling&lt;/a&gt;.  We're going to look at two methods for sampling from a distribution: rejection sampling and Markov Chain Monte Carlo (MCMC) methods using the Metropolis-Hastings algorithm.  As usual, I'll be providing a mix of intuitive explanations, theory and some examples with code.  Hopefully, this will help explain a relatively straightforward topic that is frequently presented in a complex way.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://bjlkeng.github.io/posts/markov-chain-monte-carlo-mcmc-and-the-metropolis-hastings-algorithm/"&gt;Read more…&lt;/a&gt; (20 min remaining to read)&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;</description><category>Markov Chain</category><category>MCMC</category><category>Metropolis-Hastings</category><category>Monte Carlo</category><category>probability</category><category>rejection sampling</category><category>sampling</category><guid>http://bjlkeng.github.io/posts/markov-chain-monte-carlo-mcmc-and-the-metropolis-hastings-algorithm/</guid><pubDate>Sun, 13 Dec 2015 20:05:56 GMT</pubDate></item><item><title>Sampling from a Normal Distribution</title><link>http://bjlkeng.github.io/posts/sampling-from-a-normal-distribution/</link><dc:creator>Brian Keng</dc:creator><description>&lt;div class="cell border-box-sizing text_cell rendered"&gt;&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;One of the most common probability distributions is the normal (or Gaussian) distribution.  Many natural phenomena can be modeled using a normal distribution.  It's also of great importance due to its relation to the &lt;a href="https://en.wikipedia.org/wiki/Central_limit_theorem"&gt;Central Limit Theorem&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In this post, we'll be reviewing the normal distribution and looking at how to draw samples from it using two methods.  The first method uses the central limit theorem, and the second uses the &lt;a href="https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform"&gt;Box-Muller transform&lt;/a&gt;.  As usual, some brief coverage of the mathematics and code will be included to help drive intuition.
&lt;/p&gt;&lt;p&gt;&lt;a href="http://bjlkeng.github.io/posts/sampling-from-a-normal-distribution/"&gt;Read more…&lt;/a&gt; (13 min remaining to read)&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;</description><category>normal distribution</category><category>probability</category><category>sampling</category><guid>http://bjlkeng.github.io/posts/sampling-from-a-normal-distribution/</guid><pubDate>Sun, 29 Nov 2015 02:57:02 GMT</pubDate></item><item><title>Optimal Betting Strategies and The Kelly Criterion</title><link>http://bjlkeng.github.io/posts/optimal-betting-and-the-kelly-criterion/</link><dc:creator>Brian Keng</dc:creator><description>&lt;div&gt;&lt;p&gt;My last post was about some &lt;a class="reference external" href="http://bjlkeng.github.io/posts/gamblers-fallacy-and-the-law-of-small-numbers/"&gt;common mistakes&lt;/a&gt; when betting
or gambling, even with a basic understanding of probability.  This post is going to
talk about the other side: optimal betting strategies using some very
interesting results from some very famous mathematicians in the 1950s and 1960s.
I'll spend a bit of time introducing some new concepts (at least to me), setting up the
problem and digging into some of the math.  We'll be looking at it through the
lens of our simplest probability problem: the coin flip.  A note: I will not be
covering the part that shows you how to make a fortune -- that's an exercise
best left to the reader.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://bjlkeng.github.io/posts/optimal-betting-and-the-kelly-criterion/"&gt;Read more…&lt;/a&gt; (12 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>betting</category><category>Kelly Criterion</category><category>mathjax</category><category>probability</category><category>Shannon</category><category>Thorp</category><guid>http://bjlkeng.github.io/posts/optimal-betting-and-the-kelly-criterion/</guid><pubDate>Sun, 15 Nov 2015 21:13:31 GMT</pubDate></item></channel></rss>