The Logic Behind the Maximum Entropy Principle
For a while now, I've really enjoyed diving deep to understand probability and related fundamentals (see here, here, and here). Entropy is a topic that comes up all over the place, from physics to information theory and, of course, machine learning. I've written about it in various forms but always taken it as a given as the "expected information". Well, I found a few good explanations of how to "derive" it and thought that I should share.
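To make the "expected information" phrasing concrete before we get to the derivations, here is a minimal sketch of Shannon entropy as the average of the information content, -log2(p), over a distribution (the function name and examples are just illustrative):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: the information content -log2(p)
    of each outcome, averaged (expected) over the distribution."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# A fair coin carries 1 bit of expected information per flip.
print(entropy([0.5, 0.5]))   # 1.0
# A biased coin is more predictable, so its entropy is lower.
print(entropy([0.9, 0.1]))   # ~0.469
```

Note that outcomes with zero probability are skipped, matching the usual convention that 0 * log(0) = 0.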
In this post, I'll show a few derivations of the maximum entropy principle, where entropy appears as part of the definition. These derivations show why entropy is a reasonable and natural quantity to maximize, and how it emerges from some well-thought-out reasoning. This post is more math heavy than most, but hopefully it will give you more insight into this wonderfully surprising topic.