Monte Carlo and the Metropolis Algorithm#
For general \(H\), it is not easy to compute thermal averages

\[\langle\mathcal{O}\rangle=\frac{1}{Z} \sum_{\mu} \mathcal{O}_{\mu} e^{-\beta H_{\mu}}\]

exactly. Except in special cases (1D, non-interacting, exactly solvable), either approximate (variational/perturbative) or numerical approaches are needed.
Monte Carlo is a general method for computing sums \(\sum_{\mu} f_{\mu}\) where \(\mu\) ranges over a very large space. It is applicable not just to statistical physics but also to biology, finance, complex systems, etc. (cite)
In a general setting, \(\sum_\mu f_{\mu}\) may not have an underlying probabilistic interpretation; for example, we might be trying to evaluate an oscillatory integral over a high-dimensional space \(\mu \in \mathbb{R}^{n}\).
Importance sampling#
In this general setting, the first step is to introduce an “importance” weight \(0<p_{\mu}<1\), \(\sum_{\mu} p_{\mu}=1\). A natural choice is \(p_{\mu}=\frac{\left|f_{\mu}\right|}{\sum_{\nu}\left|f_{\nu}\right|}\), which places higher weight on summands that are large. We can then trivially rewrite

\[\sum_{\mu} f_{\mu}=\sum_{\mu} p_{\mu} \frac{f_{\mu}}{p_{\mu}}=\left\langle \frac{f}{p}\right\rangle_{p}.\]
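As a concrete illustration, here is a minimal Python sketch of this rewrite (the array `f` is a made-up toy example): we sample indices from \(p_{\mu}=|f_{\mu}| / \sum_{\nu}|f_{\nu}|\) and average \(f_{\mu}/p_{\mu}\), which converges to the full sum.

```python
import numpy as np

rng = np.random.default_rng(0)

f = rng.normal(size=10_000)            # toy summands f_mu (either sign)
p = np.abs(f) / np.abs(f).sum()        # importance weights p_mu = |f_mu| / sum |f|

M = 5_000
idx = rng.choice(len(f), size=M, p=p)  # sample mu ~ p_mu
estimate = (f[idx] / p[idx]).mean()    # sum_mu f_mu = < f/p >_p
print("estimate:", estimate, "exact:", f.sum())
```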
In statistical mechanics, the natural choice is the Boltzmann distribution \(p_{\mu}=\frac{e^{-\beta H_{\mu}}}{Z}\), so that \(\langle\mathcal{O}\rangle=\sum_{\mu} p_{\mu} \mathcal{O}_{\mu}\).
Since the entropy \(S\propto N\) is extensive in the number \(N\) of degrees of freedom, which can be very large (e.g. \(N \sim 10^{23}\)), the number \(\Omega=e^{S/k_B}\) of all possible microstates is often huge. It is, therefore, way too slow to explicitly sum \(\sum_{\mu=1}^{\Omega} p_{\mu} \mathcal{O}_{\mu}\).
However, rare (“unimportant”) events with \(p_{\mu} \ll 1\) don’t contribute much, so we shouldn’t spend much time on them. This suggests we can estimate \(\langle \mathcal{O}\rangle_p\) by sampling from \(p_\mu\).
All this means is that we repeatedly roll an \(\Omega\)-sided die weighted by \(p_{\mu}\). This generates a sequence of samples \(\mu_{1}, \mu_{2}, \cdots, \mu_{M}\) with \(M \ll \Omega\). Our estimate is then

\[\langle\mathcal{O}\rangle_{p} \approx \frac{1}{M} \sum_{i=1}^{M} \mathcal{O}_{\mu_{i}},\]
and \(\lim _{M \rightarrow \infty} \frac{1}{M} \sum_{i=1}^{M} \mathcal{O}_{\mu_i}=\langle\mathcal{O}\rangle_p\) is guaranteed by the law of large numbers.
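To make the “weighted die” concrete, here is a minimal Python sketch (the state space, energies, and observable are made-up toy inputs, small enough that the exact answer is available for comparison):

```python
import numpy as np

rng = np.random.default_rng(1)

Omega = 1000                           # number of states (tiny here, huge in practice)
H = rng.normal(size=Omega)             # toy energies H_mu
beta = 1.0
p = np.exp(-beta * H)
p /= p.sum()                           # Boltzmann weights (Z computable for this toy)
O = rng.normal(size=Omega)             # some observable O_mu

exact = np.sum(p * O)                  # the full sum  sum_mu p_mu O_mu

for M in [100, 10_000, 1_000_000]:     # roll the Omega-sided die M times
    samples = rng.choice(Omega, size=M, p=p)
    print(f"M={M:>9}: estimate={O[samples].mean():+.4f}, exact={exact:+.4f}")
```

The deviation from the exact value should shrink like \(1/\sqrt{M}\), as quantified next.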
For finite \(M\), there will be some error in our estimate of \(\langle \mathcal{O}\rangle_p\) because we may have made some atypical rolls of the die. By the central limit theorem, the typical deviation between \(\langle \mathcal{O}\rangle_p\) and \(\frac{1}{M} \sum_{i=1}^{M} \mathcal{O}_{\mu_i}\) is

\[\epsilon=\sqrt{\frac{\text{var}(\mathcal{O})}{M}}, \qquad \text{var}(\mathcal{O})=\left\langle\mathcal{O}^{2}\right\rangle_{p}-\langle\mathcal{O}\rangle_{p}^{2}.\]
So the Monte Carlo accuracy scales as

\[\epsilon \sim \frac{1}{\sqrt{M}}.\]
In most cases of interest, \(\text{var}{(\mathcal{O})}\) stays bounded as the size of the space \(\Omega \rightarrow \infty\). For example, in the Ising model \(\Omega=2^{N}\), but \(\left\langle m^{2}\right\rangle\), where \(m \equiv N^{-1}\sum_i \sigma_i\) is the magnetization per spin, remains bounded independently of \(N\).
To obtain accuracy \(\epsilon\), we therefore need \(M_{\epsilon}=\frac{\text{var}(\mathcal{O})}{\epsilon^{2}}\) samples, which is vastly fewer than the \(\Omega\) terms in the full sum \(\sum_{\mu=1}^{\Omega} p_{\mu} \mathcal{O}_{\mu}\).
However, we encounter a problem. For \(p_{\mu}=\frac{e^{-\beta H_{\mu}}}{Z}\), how do we efficiently roll an \(\Omega\)-sided die? Worse, we don’t actually know the probabilities, because we cannot compute the normalization constant

\[Z=\sum_{\mu=1}^{\Omega} e^{-\beta H_{\mu}},\]
aka the partition function. Getting around this problem is where the genius of Metropolis et al. ’53 (cite) comes in. They discovered a way to sample from \(p_{\mu}=\frac{e^{-\beta H_{\mu}}}{Z}\) without knowing \(Z\)!
The idea is to invent some fictitious dynamics \(\mu_{t} \rightarrow \mu_{t+1}\) so that the sequence \(\left\{\mu_{1}, \mu_{2}, \mu_{3}, \ldots\right\}\) visits state \(\mu\) with frequency \(p_\mu\). Of course, in stat-mech we could let \(r=\{q, p\}\) evolve under \(\vec{F}=m \vec{a}\) coupled to a reservoir, which by the ergodic hypothesis would produce \(p_\mu\). But (as your HW demonstrates) this is expensive/slow. Instead we introduce simpler/fake dynamics in the form of a “Markov chain”.
Markov chains#
A “Markov chain” is a set of transition rates \(0<M_{\mu^{\prime} \mu}<1\) under which the probability distribution evolves as

\[p_{\mu^{\prime}}(t+1)=\sum_{\mu} M_{\mu^{\prime} \mu}\, p_{\mu}(t).\]
Here \(M_{\mu^{\prime} \mu}=P\left(\mu^{\prime} \mid \mu\right)\) is the conditional probability for the transition \(\mu \longrightarrow \mu^{\prime}\) at each time step. Probability conservation implies that the columns sum to one,

\[\sum_{\mu^{\prime}} M_{\mu^{\prime} \mu}=1.\]
(I.e. with probability 1, the process has to jump to one of the states.)
Note that the jump probabilities depend only on the current state \(\mu_t\) and not on the earlier states \(\{\mu_0, \dots, \mu_{t-1}\}\) visited. The process is therefore said to be memoryless, or “Markovian”.
At long times, the distribution equilibrates to

\[\lim _{t \rightarrow \infty} p_{\mu}(t)=\pi_{\mu},\]
where \((M \pi)_{\mu}=\pi_{\mu}\), i.e., the steady-state distribution \(\pi\) is the right eigenvector of \(M\) with eigenvalue \(1\).
We are not going to keep track of the full distribution \(p_{\mu}(t)\) (\(\Omega\) is huge). Instead, we want to sample from it, \(\left\{\mu_{1}, \mu_{2}, \cdots\right\}\).
To implement the Markov process represented by the above master equation, we pick at each time step a new state \(\mu_{t+1}\) given \(\mu_t\) by rolling a die biased according to the conditional probability \(P\left(\mu_{t+1} \mid \mu_{t}\right)=M_{\mu_{t+1}\mu_t}\). As you repeat, you build up \(\left\{\mu_{1}, \mu_{2}, \cdots, \mu_{T}\right\}\), which is distributed as \(\pi_{\mu}\) as \(T \rightarrow \infty\).
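As a sanity check, here is a small Python sketch (the 3-state matrix `M` below is an arbitrary made-up example) showing that iterating the master equation and simulating the chain both converge to the eigenvalue-1 right eigenvector:

```python
import numpy as np

rng = np.random.default_rng(1)

# An arbitrary 3-state transition matrix; columns sum to one (probability conservation).
M = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.6, 0.3],
              [0.2, 0.2, 0.4]])

# Stationary distribution: right eigenvector of M with eigenvalue 1.
vals, vecs = np.linalg.eig(M)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi /= pi.sum()

# (a) Evolve the full distribution p(t+1) = M p(t).
p = np.array([1.0, 0.0, 0.0])              # start in state 0 with certainty
for _ in range(100):
    p = M @ p

# (b) Simulate one long trajectory and histogram the visited states.
state, counts = 0, np.zeros(3)
for _ in range(200_000):
    state = rng.choice(3, p=M[:, state])   # roll the die biased by column mu_t
    counts[state] += 1

print("eigenvector pi:", np.round(pi, 3))
print("iterated p(t) :", np.round(p, 3))
print("visit freqs   :", np.round(counts / counts.sum(), 3))
```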
So the question is now how to pick \(M_{\mu^\prime\mu}\) so that \(\pi_{\mu}=\frac{1}{Z} e^{-\beta H_{\mu}}\).
Suppose \(M\) satisfies

\[M_{\mu^{\prime} \mu}\, \pi_{\mu}=M_{\mu \mu^{\prime}}\, \pi_{\mu^{\prime}}\]

for any two \(\mu, \mu^{\prime}\) (“detailed balance”).
We have

\[\sum_{\mu} M_{\mu^{\prime} \mu}\, \pi_{\mu}=\sum_{\mu} M_{\mu \mu^{\prime}}\, \pi_{\mu^{\prime}}=\pi_{\mu^{\prime}} \sum_{\mu} M_{\mu \mu^{\prime}}=\pi_{\mu^{\prime}},\]

so \(\pi\) is indeed the stationary distribution of \(M\).
So, any arbitrary distribution \(\pi_\mu\) can be generated by a Markov process whose transition rates satisfy the detailed balance condition corresponding to \(\pi_\mu\),

\[\frac{M_{\mu^{\prime} \mu}}{M_{\mu \mu^{\prime}}}=\frac{\pi_{\mu^{\prime}}}{\pi_{\mu}}=e^{-\beta\left(H_{\mu^{\prime}}-H_{\mu}\right)}.\]
The key simplification is that \(Z\) cancels in these transition-rate ratios, so the RHS is easily computable.
There are many choices which satisfy detailed balance; the simplest is the Metropolis rule.
Side remark: Markov processes are often used to approximate the actual dynamics of a physical system, for example in the Brownian motion of a particle or polymer. In these cases, detailed balance must be satisfied because of the time reversibility of the microscopic laws of physics, from which it follows that the forward and backward dynamics look the same in equilibrium.
Metropolis Rule#
Let’s focus on a spin system \(\mu=\{\sigma_{1},\sigma_{2}, \cdots, \sigma_{N}\}\) for notational simplicity (the original paper focuses on \(\mu=\left\{\vec{q}_{i}, \vec{p}_{i}\right\}\)).
(1) Randomly pick one of the \(N\) spins, \(\sigma_{i}\).
(2) Consider a “trial” configuration \(\{\sigma^{\prime}\}\) obtained by taking the current \(\{\sigma\}_{t}\) and flipping only \(\sigma_{i} \rightarrow-\sigma_{i}\).
E.g., for \(\{\sigma\}_{t}=\uparrow \uparrow \downarrow \uparrow\) and \(i=2\), the trial configuration is \(\{\sigma^{\prime}\}=\uparrow \downarrow \downarrow \uparrow\).
(3) Compute the energy change

\[\Delta E=E_{\{\sigma^{\prime}\}}-E_{\{\sigma\}_{t}}.\]
Note that because \(\mathcal{H}\) is local, this can be done by looking only at the spins near \(\sigma_{i}\); we don’t need to recompute the full energy! (A code sketch follows these steps.)
(4) Generate a random number \(0<q<1\).
If \(q<e^{-\beta \Delta E}\): accept the move, \(\{\sigma\}_{t+1}=\{\sigma^{\prime}\}\).

Otherwise, if \(q>e^{-\beta \Delta E}\): reject the move, \(\{\sigma\}_{t+1}=\{\sigma\}_{t}\).
Note that if \(\Delta E<0\), then \(e^{-\beta \Delta E}>1\), so we always accept the move.
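For concreteness, here is a minimal sketch of the local update in step (3) for the ferromagnetic 2D Ising model \(\mathcal{H}=-J \sum_{\langle i j\rangle} \sigma_{i} \sigma_{j}\) (the helper name `delta_E` and the periodic boundaries are my own choices, not from the original notes): flipping one spin changes the energy by \(\Delta E=2 J \sigma_{ij} \sum_{\mathrm{nn}} \sigma\), which involves only the four neighbors.

```python
import numpy as np

def delta_E(sigma, i, j, J=1.0):
    """Energy change from flipping spin (i, j) on a 2D Ising lattice with
    periodic boundaries; only the 4 nearest neighbors enter."""
    L = sigma.shape[0]
    nn_sum = (sigma[(i + 1) % L, j] + sigma[(i - 1) % L, j]
              + sigma[i, (j + 1) % L] + sigma[i, (j - 1) % L])
    return 2.0 * J * sigma[i, j] * nn_sum
```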
Let’s check: is \(\frac{M_{\mu^{\prime} \mu}}{M_{\mu \mu^{\prime}}}=e^{-\beta\left(E_{\mu^{\prime}}-E_{\mu}\right)}\)?
If \(\Delta E=E_{\mu^{\prime}}-E_{\mu}<0\), then the forward move is always accepted, while the reverse move (with energy change \(-\Delta E>0\)) is accepted with probability \(e^{-\beta\left(E_{\mu}-E_{\mu^{\prime}}\right)}\). With \(1/N\) the probability of proposing this particular spin flip (the same in both directions),

\[\frac{M_{\mu^{\prime} \mu}}{M_{\mu \mu^{\prime}}}=\frac{\frac{1}{N} \cdot 1}{\frac{1}{N}\, e^{-\beta\left(E_{\mu}-E_{\mu^{\prime}}\right)}}=e^{-\beta\left(E_{\mu^{\prime}}-E_{\mu}\right)}.\]
If \(\Delta E>0\), the roles are reversed:

\[\frac{M_{\mu^{\prime} \mu}}{M_{\mu \mu^{\prime}}}=\frac{\frac{1}{N}\, e^{-\beta\left(E_{\mu^{\prime}}-E_{\mu}\right)}}{\frac{1}{N} \cdot 1}=e^{-\beta\left(E_{\mu^{\prime}}-E_{\mu}\right)}.\]
Thus, \(M_{\mu'\mu}\) indeed satisfies detailed balance w.r.t. \(\pi_\mu\).
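This is easy to verify numerically. A quick check, assuming nothing beyond the rule above (the five toy energies are made up; the \(1/N\) proposal probabilities cancel from the ratio, so it suffices to compare acceptance probabilities):

```python
import numpy as np

rng = np.random.default_rng(2)
beta = 0.7
E = rng.normal(size=5)                 # made-up energies of a 5-state toy system

def accept(E_from, E_to):
    """Metropolis acceptance probability min(1, e^{-beta * dE})."""
    return min(1.0, np.exp(-beta * (E_to - E_from)))

for mu in range(5):
    for nu in range(mu + 1, 5):
        lhs = accept(E[mu], E[nu]) / accept(E[nu], E[mu])
        rhs = np.exp(-beta * (E[nu] - E[mu]))   # detailed balance RHS
        assert np.isclose(lhs, rhs)
print("detailed balance holds for all pairs")
```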
We throw this into a for-loop, generating thousands of samples. As we go along, we accumulate measurements

\[\langle\mathcal{O}\rangle \approx \frac{1}{T} \sum_{t=1}^{T} \mathcal{O}_{\mu_{t}}\]

for any relevant observables.
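Putting the pieces together, here is a minimal runnable sketch of the full Metropolis loop for the 2D Ising model, reusing the `delta_E` helper defined above (the lattice size, temperature, and sweep count are arbitrary illustrative choices, not values from the original notes):

```python
import numpy as np

rng = np.random.default_rng(3)

L, beta = 16, 0.3                      # illustrative lattice size and inverse temperature
N = L * L
sigma = rng.choice([-1, 1], size=(L, L))     # random initial configuration

n_sweeps = 2000
m_samples = []

for sweep in range(n_sweeps):
    for _ in range(N):                           # one sweep = N single-spin attempts
        i, j = rng.integers(L), rng.integers(L)  # (1) pick a random spin
        dE = delta_E(sigma, i, j)                # (3) local energy change
        if rng.random() < np.exp(-beta * dE):    # (4) accept w.p. min(1, e^{-beta dE})
            sigma[i, j] *= -1
    m_samples.append(sigma.mean())               # accumulate magnetization per spin

m_samples = np.array(m_samples)
print("<m^2> estimate:", (m_samples**2).mean())
```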
Numerical Considerations#
(1) “Burn-in”. Given an initial state, a MC chain relaxes to its equilibrium distribution over some time \(\tau\).
\(\tau\) is determined by the eigenspectrum of \(M\) (roughly, \(\tau \sim -1/\ln\left|\lambda_{2}\right|\), where \(\lambda_2\) is the second-largest eigenvalue). It is a good idea to throw out the first \(\sim \tau\) measurements to account for this:

\[\langle\mathcal{O}\rangle \approx \frac{1}{T-\tau} \sum_{t=\tau+1}^{T} \mathcal{O}_{\mu_{t}}.\]
In practice you don’t know the relaxation time a priori, so the burn-in length is determined by various heuristics.
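In code, the burn-in is just a slice. A common heuristic (illustrative, not from the original notes) is to discard a fixed fraction of the chain and confirm the running average has flattened:

```python
# m_samples: per-sweep measurements accumulated in the loop above
burn_in = len(m_samples) // 10          # heuristic: drop the first 10% of the chain
equilibrated = m_samples[burn_in:]
print("<m^2> after burn-in:", (equilibrated**2).mean())
```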
(2) Error Bars + auto-correlation
For a finite number of samples \(M\), the error \(\epsilon\) is given by

\[\epsilon=\sqrt{\frac{\text{var}(\mathcal{O})}{M}}\]

if each sample \(\mu_{t}\) is generated by an independent roll of the \(\Omega\)-sided die. However, if \(\mu_{t}\) is generated by a Markov chain, successive samples are correlated,

\[\left\langle\mathcal{O}_{\mu_{t}} \mathcal{O}_{\mu_{t^{\prime}}}\right\rangle-\langle\mathcal{O}\rangle^{2} \sim e^{-\left|t-t^{\prime}\right| / \tau_{2}},\]

where \(\tau_2\) is the “auto-correlation” time. Very roughly speaking, only every \(\tau_2\)-th sample is independent, which gives

\[\epsilon \approx \sqrt{\frac{\text{var}(\mathcal{O})\, \tau_{2}}{M}}.\]
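A standard way to account for these correlations without measuring \(\tau_2\) directly is a binning (blocking) analysis, sketched below (the bin sizes are illustrative): the naive error bar grows with bin size until the bins are longer than \(\tau_2\), then plateaus at the true error.

```python
import numpy as np

def binned_error(samples, bin_size):
    """Standard error of the mean computed from bin averages. Once
    bin_size >> autocorrelation time, the bins are ~independent and
    this converges to the true error bar."""
    n_bins = len(samples) // bin_size
    bins = samples[:n_bins * bin_size].reshape(n_bins, bin_size).mean(axis=1)
    return bins.std(ddof=1) / np.sqrt(n_bins)

# equilibrated: post-burn-in measurements from above
for b in [1, 2, 4, 8, 16, 32, 64]:
    print(f"bin size {b:>3}: error = {binned_error(equilibrated, b):.5f}")
```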