Restricted Boltzmann Machine
Introduction
The Boltzmann machine is an energy-based model that comes from statistical physics. The machine (network) simulates annealing, the metallurgical process of heating and controlled cooling that alters a material's physical properties by letting it reach thermal equilibrium; hence the name simulated annealing, a probabilistic technique to approximate the global optimum of a function. The Boltzmann machine is named after the famous Austrian physicist Ludwig Boltzmann and was further developed and popularized by Hinton and LeCun.
Boltzmann machine
A Boltzmann machine is an unsupervised neural network in which each node is connected to every other node. It is not a usual neural net: a usual neural net can have fully connected layers, but no intra-layer connections, and its connections are feed-forward, while in a Boltzmann machine the connections are bidirectional. The nodes of a Boltzmann machine are classified as visible nodes and hidden nodes: the visible nodes are the ones that can be measured, and the hidden nodes are the ones that will not be measured. Both types of nodes belong to one single system.
Apart from that, a Boltzmann machine is also not deterministic; it is stochastic, or generative. It is a system that learns from data to become a representation of it. With that, it can detect abnormalities in the system, or serve as a dimensionality-reduction technique in the spirit of PCA.
The state of the whole network is measured by a global energy. The units are binary and stochastic. Here is the energy of the network:

$$E = -\Big(\sum_{i<j} w_{ij}\, s_i\, s_j + \sum_i \theta_i\, s_i\Big)$$

with $w_{ij}$ the connection weight between units $i$ and $j$, $s_i \in \{0, 1\}$ the state of unit $i$, and $\theta_i$ the bias of unit $i$.
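As a concrete check, the energy can be computed directly with NumPy. The weight matrix `W`, biases `theta`, and the toy 3-unit network below are illustrative assumptions, not values from the text:

```python
import numpy as np

def network_energy(s, W, theta):
    """Energy of a Boltzmann machine with binary states s,
    symmetric weights W (zero diagonal), and biases theta."""
    # sum over i<j of w_ij * s_i * s_j equals 0.5 * s^T W s when W is symmetric
    return -(0.5 * s @ W @ s + theta @ s)

# hypothetical toy network with 3 units
W = np.array([[ 0.0, 1.0, -2.0],
              [ 1.0, 0.0,  0.5],
              [-2.0, 0.5,  0.0]])
theta = np.array([0.1, -0.2, 0.3])
s = np.array([1, 0, 1])
e = network_energy(s, W, theta)
```

Lower-energy states are the ones the network settles into more often at equilibrium.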
The network runs by repeatedly choosing a unit and resetting its state. Running long enough at a given temperature, the probability of a global state of the network depends only on that state's energy, according to a Boltzmann distribution, and not on the initial state. We then say the machine is at thermal equilibrium: the probability distribution over global states has converged.

Technically, each unit of the Boltzmann machine is a binary unit (it can be in state 0 or state 1), so a global state of a network of $N$ units lies in the state space $\{0, 1\}^N$.
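The "choose a unit and reset its state" step can be sketched as below, assuming the standard energy-gap update where unit $i$ turns on with probability $\sigma(\Delta E_i / T)$ and $\Delta E_i = \sum_j w_{ij} s_j + \theta_i$; the function name and toy values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def update_unit(s, W, theta, i, T=1.0):
    """One stochastic update: unit i turns on with probability
    sigmoid(delta_e / T), where delta_e is the energy gap of flipping it on."""
    delta_e = W[i] @ s + theta[i]              # energy gap for unit i
    p_on = 1.0 / (1.0 + np.exp(-delta_e / T))  # higher temperature T -> more random
    s[i] = 1 if rng.random() < p_on else 0
    return s

# toy 3-unit network (illustrative values)
s = np.array([1, 0, 1])
W = np.array([[ 0.0, 1.0, -2.0],
              [ 1.0, 0.0,  0.5],
              [-2.0, 0.5,  0.0]])
s = update_unit(s, W, np.zeros(3), i=1)
```

Repeating this sweep at a fixed temperature is exactly the process that drives the network toward thermal equilibrium.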
Restricted Boltzmann machine (RBM)
The restricted Boltzmann machine is a Boltzmann machine with the restriction that there are no intra-layer connections. The visible nodes form the input layer, and the hidden nodes form the hidden layer. In image processing, each input unit corresponds to a pixel of the image. There is no output layer.
Each visible node takes a low-level feature from an item in the dataset to learn. The weighted input is fed into an activation function, which then produces the node's output, also called the strength of the signal passing through it. In this forward pass, the restricted Boltzmann machine looks like a standard feed-forward, fully connected, deterministic neural network, but its units remain stochastic.
An RBM can also be considered a probabilistic graphical model, or a stochastic neural network. In other words, a restricted Boltzmann machine is an undirected bipartite graphical model with connections only between visible nodes $v$ and hidden nodes $h$. It corresponds to the joint probability distribution:

$$p(v, h) = \frac{1}{Z} e^{-E(v, h)}, \qquad E(v, h) = -b^\top v - c^\top h - v^\top W h$$

where $W$ is the weight matrix, $b$ and $c$ are the visible and hidden biases, and $Z = \sum_{v, h} e^{-E(v, h)}$ is the normalizing constant (the partition function).
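Because the graph is bipartite, the conditionals $p(h \mid v)$ and $p(v \mid h)$ factorize over units, which makes sampling cheap. A minimal NumPy sketch, assuming the standard sigmoid conditionals of a binary RBM (function names and the toy shapes are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def sample_h_given_v(v, W, c):
    """p(h_j = 1 | v) = sigmoid(c_j + v^T W[:, j]); factorizes because
    there are no hidden-hidden connections."""
    p = sigmoid(c + v @ W)
    return (rng.random(p.shape) < p).astype(float), p

def sample_v_given_h(h, W, b):
    """p(v_i = 1 | h) = sigmoid(b_i + W[i, :] h); factorizes because
    there are no visible-visible connections."""
    p = sigmoid(b + W @ h)
    return (rng.random(p.shape) < p).astype(float), p

# one Gibbs sweep on a toy 4-visible / 3-hidden RBM
W = 0.1 * rng.standard_normal((4, 3))
v = np.array([1.0, 0.0, 1.0, 0.0])
h, _ = sample_h_given_v(v, W, np.zeros(3))
v_new, _ = sample_v_given_h(h, W, np.zeros(4))
```

Alternating these two samplers is one step of Gibbs sampling from the joint distribution above.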
Then we can perform stochastic gradient descent on the negative data log-likelihood, with a stopping criterion; the gradient with respect to a weight takes the form of a difference of correlations, $\langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}$.
Deep belief network (DBN)
When we stack restricted Boltzmann machines together, we have a deep belief network. The DBN extracts features from the input by maximizing the likelihood of the dataset. A deep belief network is a model of the form:

$$p(x, h^1, \ldots, h^\ell) = p(h^{\ell-1}, h^\ell)\left(\prod_{k=1}^{\ell-2} p(h^k \mid h^{k+1})\right) p(x \mid h^1)$$

with $x$ the visible units and $h^1, \ldots, h^\ell$ the hidden layers. The top two layers, $p(h^{\ell-1}, h^\ell)$, form an RBM. Each lower conditional $p(h^k \mid h^{k+1})$ is a directed sigmoid belief layer that factorizes over its units.
We can train a deep belief network with greedy layer-wise training, proposed by Hinton.
Greedy layer-wise training
Learning is difficult for densely connected, directed belief nets with many hidden layers, since it is difficult to infer the conditional distribution of the hidden activities given a data vector. But we can train the layers sequentially, starting from the input layer: train the first RBM on the data, then freeze it and train the next RBM on the hidden activations produced by the first, and so on. Since each newly added layer is trained to model the distribution of the layer below it, adding a layer (under suitable conditions) improves a variational lower bound on the log-likelihood of the data.
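The layer-wise procedure can be sketched as follows. `train_rbm` here is a deliberately minimal one-step contrastive-divergence trainer, and all hyperparameters, names, and the toy data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.1):
    """Minimal one-step contrastive-divergence trainer (sketch)."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b, c = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        for v0 in data:
            ph0 = sigmoid(c + v0 @ W)                     # positive phase
            h0 = (rng.random(n_hidden) < ph0).astype(float)
            v1 = sigmoid(b + W @ h0)                      # mean-field reconstruction
            ph1 = sigmoid(c + v1 @ W)                     # negative phase
            W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
            b += lr * (v0 - v1)
            c += lr * (ph0 - ph1)
    return W, b, c

def greedy_pretrain(data, layer_sizes):
    """Train RBMs one layer at a time; each layer's hidden
    probabilities become the next layer's training data."""
    layers, x = [], data
    for n_hidden in layer_sizes:
        W, b, c = train_rbm(x, n_hidden)
        layers.append((W, b, c))
        x = sigmoid(c + x @ W)   # propagate the data upward
    return layers

# toy binary dataset: 6 samples, 5 visible units; stack of 4 then 2 hidden units
data = rng.integers(0, 2, size=(6, 5)).astype(float)
layers = greedy_pretrain(data, [4, 2])
```

Each trained layer is frozen before the next one is trained, which is what makes the procedure "greedy".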
Contrastive divergence is a training technique that approximates the gradient of the log-likelihood based on a short Markov chain (a way to sample from a probabilistic model) started from the last example seen, rather than from a random state. It is an approximate maximum-likelihood learning algorithm.
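A single CD-1 parameter update might look like the following sketch, assuming the binary-RBM energy $E(v,h) = -b^\top v - c^\top h - v^\top W h$; the learning rate and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.1):
    """One contrastive-divergence update with a chain of length 1:
    start the Markov chain at the data vector v0, run a single Gibbs
    sweep, and use the difference of correlations as the gradient."""
    ph0 = sigmoid(c + v0 @ W)                        # positive phase: <v h> under the data
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(b + W @ h0)                        # one Gibbs sweep down...
    ph1 = sigmoid(c + pv1 @ W)                       # ...and back up (negative phase)
    W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    b += lr * (v0 - pv1)
    c += lr * (ph0 - ph1)
    return W, b, c
```

Starting the chain at the data rather than a random state is what makes CD cheap: one sweep is usually enough for a useful, if biased, gradient estimate.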