<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Bounded Rationality (Posts about residual networks)</title><link>http://bjlkeng.github.io/</link><description></description><atom:link href="http://bjlkeng.github.io/categories/residual-networks.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><lastBuildDate>Tue, 10 Mar 2026 20:54:59 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Label Refinery: A Softer Approach</title><link>http://bjlkeng.github.io/posts/label-refinery/</link><dc:creator>Brian Keng</dc:creator><description>&lt;div&gt;&lt;p&gt;This post is going to be about a really simple idea that is surprisingly effective
from a paper by Bagherinezhad et al. called &lt;a class="reference external" href="https://arxiv.org/abs/1805.02641"&gt;Label Refinery: Improving ImageNet
Classification through Label Progression&lt;/a&gt;.
The title pretty much says it all but I'll also discuss some intuition and show
some experiments on the CIFAR10 and SVHN datasets.  The idea is both simple and
surprising, my favourite kind of idea!  Let's take a look.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://bjlkeng.github.io/posts/label-refinery/"&gt;Read more…&lt;/a&gt; (10 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>CIFAR10</category><category>label refinery</category><category>mathjax</category><category>residual networks</category><category>svhn</category><guid>http://bjlkeng.github.io/posts/label-refinery/</guid><pubDate>Tue, 04 Sep 2018 11:26:02 GMT</pubDate></item><item><title>Universal ResNet: The One-Neuron Approximator</title><link>http://bjlkeng.github.io/posts/universal-resnet-the-one-neuron-approximator/</link><dc:creator>Brian Keng</dc:creator><description>&lt;div&gt;&lt;p&gt;&lt;em&gt;"In theory, theory and practice are the same. In practice, they are not."&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I read a very interesting paper titled &lt;em&gt;ResNet with one-neuron hidden layers is
a Universal Approximator&lt;/em&gt; by Lin and Jegelka [1].
The paper describes a simplified Residual Network as a universal approximator,
giving some theoretical backing to the wildly successful ResNet architecture.
In this post, I'm going to talk about this paper and a few of the related
universal approximation theorems for neural networks.
Instead of going through all the theoretical stuff, I'm simply going introduce
some theorems and play around with some toy datasets to see if we can get close
to the theoretical limits.&lt;/p&gt;
&lt;p&gt;(You might also want to checkout my previous post where I played around with
ResNets: &lt;a class="reference external" href="http://bjlkeng.github.io/posts/residual-networks/"&gt;Residual Networks&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;a href="http://bjlkeng.github.io/posts/universal-resnet-the-one-neuron-approximator/"&gt;Read more…&lt;/a&gt; (11 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>hidden layers</category><category>mathjax</category><category>neural networks</category><category>residual networks</category><category>ResNet</category><category>universal approximator</category><guid>http://bjlkeng.github.io/posts/universal-resnet-the-one-neuron-approximator/</guid><pubDate>Fri, 03 Aug 2018 12:03:28 GMT</pubDate></item><item><title>Residual Networks</title><link>http://bjlkeng.github.io/posts/residual-networks/</link><dc:creator>Brian Keng</dc:creator><description>&lt;div&gt;&lt;p&gt;Taking a small break from some of the heavier math, I thought I'd write a post
(aka learn more about) a very popular neural network architecture called
Residual Networks aka ResNet.  This architecture is being very widely used
because it's so simple yet so powerful at the same time.  The architecture's
performance is due its ability to add hundreds of layers (talk about deep
learning!) without degrading performance or adding difficulty to training.  I
really like these types of robust advances where it doesn't require fiddling
with all sorts of hyper-parameters to make it work.  Anyways, I'll introduce
the idea and show an implementation of ResNet on a few runs of a variational
autoencoder that I put together on the CIFAR10 dataset.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://bjlkeng.github.io/posts/residual-networks/"&gt;Read more…&lt;/a&gt; (9 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>autoencoders</category><category>CIFAR10</category><category>mathjax</category><category>residual networks</category><category>ResNet</category><guid>http://bjlkeng.github.io/posts/residual-networks/</guid><pubDate>Sun, 18 Feb 2018 18:55:13 GMT</pubDate></item></channel></rss>