The AT Adaptive Normal proposal
In this notebook we test out the ATAdaptiveNormal proposal, which uses the adaptation algorithm of Andrieu & Thoms (2008).
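The general idea of that adaptation scheme (this is a sketch following Algorithm 4 of Andrieu & Thoms 2008; the exact implementation in epsie may differ in details such as the decay schedule) is to recursively estimate the target's mean and covariance from the chain history, while tuning a global scale so that the acceptance rate approaches a target \(\alpha^* \approx 0.234\):

\begin{align}
\log\lambda_{n+1} &= \log\lambda_n + \gamma_{n+1}\left(\alpha_n - \alpha^*\right),\\
\mu_{n+1} &= \mu_n + \gamma_{n+1}\left(X_{n+1} - \mu_n\right),\\
\Sigma_{n+1} &= \Sigma_n + \gamma_{n+1}\left[\left(X_{n+1} - \mu_n\right)\left(X_{n+1} - \mu_n\right)^{\mathrm{T}} - \Sigma_n\right],
\end{align}

where \(X_{n+1}\) is the chain's current position, \(\alpha_n\) is the acceptance probability of the last proposed jump, \(\gamma_n\) is a decaying sequence of adaptation step sizes, and jumps on step \(n+1\) are proposed from a normal distribution with covariance \(\lambda_{n+1}\Sigma_{n+1}\).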
%matplotlib notebook
from matplotlib import pyplot
import numpy
import epsie
from epsie import make_betas_ladder
from epsie.samplers import MetropolisHastingsSampler
from epsie.proposals import ATAdaptiveNormal
import multiprocessing
Create the model to sample
from scipy import stats
class Model(object):

    def __init__(self):
        # we'll use a 2D Gaussian for the likelihood distribution
        self.params = ['x', 'y']
        self.mean = [2., 5.]
        self.cov = [[1., 0.9], [0.9, 1.]]
        self.likelihood_dist = stats.multivariate_normal(mean=self.mean,
                                                         cov=self.cov)
        # we'll just use a uniform prior
        self.prior_bounds = {'x': (-20., 20.),
                             'y': (-40., 40.)}
        xmin = self.prior_bounds['x'][0]
        dx = self.prior_bounds['x'][1] - xmin
        ymin = self.prior_bounds['y'][0]
        dy = self.prior_bounds['y'][1] - ymin
        self.prior_dist = {'x': stats.uniform(xmin, dx),
                           'y': stats.uniform(ymin, dy)}

    def prior_rvs(self, size=None, shape=None):
        return {p: self.prior_dist[p].rvs(size=size).reshape(shape)
                for p in self.params}

    def logprior(self, **kwargs):
        return sum([self.prior_dist[p].logpdf(kwargs[p]) for p in self.params])

    def loglikelihood(self, **kwargs):
        return self.likelihood_dist.logpdf([kwargs[p] for p in self.params])

    def __call__(self, **kwargs):
        logp = self.logprior(**kwargs)
        if logp == -numpy.inf:
            logl = None
        else:
            logl = self.loglikelihood(**kwargs)
        return logl, logp
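As a quick check that the model behaves as expected, we can evaluate it at a point inside the prior bounds and at one outside (for which the log prior is \(-\infty\) and the log likelihood comes back as None). The instance name below is just for illustration:
check_model = Model()
# inside the prior bounds: both logl and logp are finite
print(check_model(x=2., y=5.))
# outside the prior bounds: logl is None and logp is -inf
print(check_model(x=100., y=0.))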
Setup and run the sampler
Create a pool of 4 parallel processes, then initialize the sampler with the model we created above, using 4 chains and the ATAdaptiveNormal proposal. For now, assume an adaptation duration of 2048 steps.
model = Model()
nchains = 4
nprocs = 4
pool = multiprocessing.Pool(nprocs)
prior_width = {'x' : 40, 'y': 80}
proposal = ATAdaptiveNormal(model.params, adaptation_duration=2048)
sampler = MetropolisHastingsSampler(model.params, model, nchains, pool=pool, proposals=[proposal])
Now set the starting positions of the chains by drawing random variates from the model’s prior.
sampler.start_position = model.prior_rvs(size=nchains)
Let’s run it!
This will evolve each chain in the collection by 4096 steps. This is parallelized over the pool of processes.
sampler.run(4096)
Extract results
We can get the history of all of the chains using the .positions attribute. This returns a numpy structured array whose fields are the parameter names (in this case, 'x' and 'y'), with shape nchains x niterations:
positions = sampler.positions
print('sampler.positions: {}'.format(type(positions)))
print('with fields: {}'.format(positions.dtype.names))
print('and shape:', positions.shape)
sampler.positions: <class 'numpy.ndarray'>
with fields: ('x', 'y')
and shape: (4, 4096)
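The fields of the structured array can be indexed directly by parameter name; for example, to look at the last few positions of the first chain:
# positions of chain 0 over the last 5 iterations
print(positions['x'][0, -5:])
print(positions['y'][0, -5:])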
This (or any structured array returned by epsie) can be turned into a dictionary of arrays, where the keys are the parameter names, using epsie.array2dict:
positions = epsie.array2dict(sampler.positions)
print('sampler.positions: {} with keys/values:'.format(type(positions)))
for param in sorted(positions):
print('"{}": {} with shape {}'.format(param, type(positions[param]), positions[param].shape))
sampler.positions: <class 'dict'> with keys/values:
"x": <class 'numpy.ndarray'> with shape (4, 4096)
"y": <class 'numpy.ndarray'> with shape (4, 4096)
We can also access the history of log likelihoods and log priors using sampler.stats, as well as the acceptance ratios and which jumps were accepted with sampler.acceptance:
stats = sampler.stats
print('sampler.stats: {}'.format(type(stats)))
print('with fields: {}'.format(stats.dtype.names))
print('and shape:', stats.shape)
sampler.stats: <class 'numpy.ndarray'>
with fields: ('logl', 'logp')
and shape: (4, 4096)
acceptance = sampler.acceptance
print('sampler.acceptance: {}'.format(type(acceptance)))
print('with fields: {}'.format(acceptance.dtype.names))
print('and shape:', acceptance.shape)
sampler.acceptance: <class 'numpy.ndarray'>
with fields: ('acceptance_ratio', 'accepted')
and shape: (4, 4096)
If the model returns “blobs” (i.e., a dictionary returned along with the logl and logp), then we can also access those using sampler.blobs. Similar to positions, this would also be a dictionary of arrays with keys given by the names in the dictionary the model returned. However, because our model above returns no blobs, in this case we just get None:
print(sampler.blobs)
None
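For illustration (this class is hypothetical, not part of the example above), a model that provides blobs would return the extra dictionary alongside the logl and logp, as described above:
class ModelWithBlobs(Model):
    """Hypothetical variation of the model that also returns blobs."""

    def __call__(self, **kwargs):
        logl, logp = super(ModelWithBlobs, self).__call__(**kwargs)
        # anything in this dictionary would be stored by the sampler and
        # made available via sampler.blobs
        return logl, logp, {'loglr': logl}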
The individual chains can be accessed using the .chains attribute:
sampler.chains
[<epsie.chain.chain.Chain at 0x12ceb2bd0>,
<epsie.chain.chain.Chain at 0x12ceb26d0>,
<epsie.chain.chain.Chain at 0x111dea990>,
<epsie.chain.chain.Chain at 0x111dea190>]
Print the chain covariance matrices that are used when proposing new jumps. Note that the individual elements are much larger than those of the likelihood's covariance matrix. That is because the proposal covariance matrix is rescaled to achieve the target acceptance ratio of roughly \(23\%\). In this example, that corresponds to proposing bolder jumps.
for i, chain in enumerate(sampler.chains):
    print('Chain {}'.format(i))
    print(chain.proposal_dist.proposals[0].cov)
Chain 0
[[7.69045825 7.0172752 ]
[7.0172752 7.44016005]]
Chain 1
[[6.25782701 5.78408719]
[5.78408719 6.71258485]]
Chain 2
[[5.5139045 5.19395551]
[5.19395551 5.88254028]]
Chain 3
[[5.89521416 5.40735526]
[5.40735526 6.16999921]]
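We can also check how close the adaptation got to that target by computing the fraction of accepted jumps after the adaptation phase (i.e., after the first 2048 steps); the variable name below is just for illustration:
# fraction of accepted jumps per chain, excluding the adaptation phase
post_adaptation = sampler.acceptance['accepted'][:, 2048:]
print(post_adaptation.mean(axis=1))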
Plot the rolling mean acceptance ratio
def moving_average(a, n=100):
    ret = numpy.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n
sampler.acceptance['accepted'].shape
(4, 4096)
pyplot.figure(dpi=120)
for i in range(nchains):
    arr = sampler.acceptance['acceptance_ratio'][i, :]
    arr[arr > 1] = 1.
    pyplot.plot(moving_average(arr, n=100))
pyplot.show()
Plot the posterior
Let’s create a scatter plot of the posterior. For this, we’ll need to throw out some earlier samples for the burn-in period; we’ll just assume the first half of each chain was burn-in. We also need to calculate the autocorrelation length (ACL) of the chains in order to get independent samples.
def calculate_acf(data):
    """Calculates the autocorrelation of some data"""
    # zero the mean
    data = data - data.mean()
    # zero-pad to 2 * nearest power of 2
    newlen = int(2**(1+numpy.ceil(numpy.log2(len(data)))))
    x = numpy.zeros(newlen)
    x[:len(data)] = data[:]
    # correlate
    acf = numpy.correlate(x, x, mode='full')
    # drop corrupted region
    acf = acf[len(acf)//2:]
    # normalize
    acf /= acf[0]
    return acf
def calculate_acl(data):
    """Calculates the autocorrelation length of some data.

    Algorithm used is from:
    N. Madras and A. D. Sokal, J. Stat. Phys. 50, 109 (1988).
    """
    # calculate the acf
    acf = calculate_acf(data)
    # now the ACL: Following from Sokal, this is estimated
    # as the first point where M*tau[k] <= k, where
    # tau = 2*cumsum(acf) - 1, and M is a tuneable parameter,
    # generally chosen to be = 5 (which we use here)
    m = 5
    cacf = 2.*numpy.cumsum(acf) - 1.
    win = m * cacf <= numpy.arange(len(cacf))
    if win.any():
        acl = int(numpy.ceil(cacf[numpy.where(win)[0][0]]))
    else:
        # data is too short to estimate the ACL, just choose
        # the length of the data
        acl = len(data)
    return acl
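As a quick sanity check of the estimator, the ACL of uncorrelated (white) noise should come out close to 1:
# independent draws are uncorrelated, so the estimated ACL should be ~1
print(calculate_acl(numpy.random.normal(size=4096)))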
Since the chains are completely independent of each other, we can calculate the ACL separately for each chain. However, if you’d like to be more conservative, you can also just take the max over all of the chains (a sketch of that approach is shown after the per-chain calculation below).
# get the samples; recall that this is a structured array whose
# fields are the parameters, each with shape nchains x niterations
samples = sampler.positions
stats = sampler.stats['logl'] + sampler.stats['logp']
# as we said above, we'll assume the first half
# of the chain was burn in
burnin_iter = sampler.niterations // 2
# set up arrays to store the ACL of each chain and
# the thinned chains
acls = numpy.zeros(nchains, dtype=int)
thinned_arrays = {'x': [], 'y': []}
thinned_stats = []
# cycle over the chains, calculating the ACLs and thinning them
for ii in range(nchains):
    # get the second half of the chains
    sx = samples['x'][ii, burnin_iter:]
    sy = samples['y'][ii, burnin_iter:]
    sz = stats[ii, burnin_iter:]
    # compute the acl for each parameter
    aclx = calculate_acl(sx)
    acly = calculate_acl(sy)
    acl = max(aclx, acly)
    acls[ii] = acl
    # note that we'll thin the arrays starting from the
    # end to get the latest results
    thinned_arrays['x'].append(sx[::-1][::acl][::-1])
    thinned_arrays['y'].append(sy[::-1][::acl][::-1])
    thinned_stats.extend(sz[::-1][::acl][::-1])
# the ACL of each chain:
print(acls)
[13 9 10 11]
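If you would rather take the more conservative approach mentioned above, you could thin every chain by the maximum ACL instead; a minimal sketch (variable names are illustrative):
# thin all of the chains by the largest ACL (more conservative)
max_acl = int(acls.max())
conservative_x = numpy.concatenate(
    [samples['x'][ii, burnin_iter:][::-1][::max_acl][::-1]
     for ii in range(nchains)])
conservative_y = numpy.concatenate(
    [samples['y'][ii, burnin_iter:][::-1][::max_acl][::-1]
     for ii in range(nchains)])
print("Number of independent samples with a common ACL:", conservative_x.size)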
# create a flattened posterior array
posterior = {'x': numpy.concatenate(thinned_arrays['x']),
'y': numpy.concatenate(thinned_arrays['y'])}
print("Number of independent samples:", posterior['x'].size)
Number of independent samples: 778
# histogram them
fig, ax = pyplot.subplots()
ax.hist(posterior['x'], bins='auto', histtype='step', label='x')
ax.hist(posterior['y'], bins='auto', histtype='step', label='y')
ax.legend()
fig.show()
Make a plot of the posterior
# fig, ax = pyplot.subplots()
pyplot.figure(dpi=120)
pyplot.scatter(posterior['x'], posterior['y'], c=thinned_stats, s=5)
pyplot.colorbar()
pyplot.show()
Let’s check the means and covariance of our estimated posterior. These should be \(\bar{x} \approx 2\), \(\sigma^2_{x} \approx 1\) and \(\bar{y} \approx 5\), \(\sigma^2_{y} \approx 1\), with \(\sigma_{xy} \approx 0.9\):
arr = numpy.array([posterior[p] for p in posterior.keys()])
print('x mean: {:.4f}, y mean: {:.4f}'.format(arr.mean(axis=-1)[0], arr.mean(axis=-1)[1]))
print('estimated covariance matrix:')
print(numpy.cov(arr))
x mean: 2.0111, y mean: 5.0123
estimated covariance matrix:
[[0.97356281 0.86118957]
[0.86118957 0.94511098]]