This presentation does not reflect any work or material from Autodesk. Also, I am not a core contributor to the PyMC3 project, but a user.
Also, as we continue & recover from the great Panini, please consider helping or contributing to social causes and nonprofits, or sponsoring Women & BIPOC applicants at your company. #PayItForward
But before going into PyMC3, we will briefly cover Bayesian methodology & Bayesian Modeling.
Particularly:
Bayesian Statistics is a method of statistical inference in which Bayes' theorem is used to update the probability for a hypothesis as more evidence or information becomes available. The result of these calculations is a probability distribution for the quantity of interest, rather than a single point estimate.
Building on Bayesian Statistics, we can use probabilistic programming to create Bayesian (probabilistic) models for problems where we need to estimate probabilities from the resulting probability distributions.
A few core principles of probability:
Bayes' Theorem
$$p(\theta \mid y) = \tfrac{p(y \mid \theta)\,p(\theta)}{p(y)}$$
where
- $p(\theta)$ is the prior: our belief about $\theta$ before seeing any data
- $p(y \mid \theta)$ is the likelihood: how probable the observed data $y$ is for a given $\theta$
- $p(y)$ is the evidence (marginal likelihood): a normalizing constant
- $p(\theta \mid y)$ is the posterior: our updated belief about $\theta$ after seeing the data
Let's say we have a coin-toss scenario with no prior understanding of what the expected probability is.
We want to estimate the probability that a coin toss comes up heads.
There are several scenarios, each with a total number of trials $N$:
N = [0, 1, 2, 3, 4, 8, 16, 32, 50, 150]
For these $N$ trials, the true (data-generating) probability of heads is $\theta_{real} = 0.35$, i.e., a 35% chance of being heads.
For each scenario we also have the likelihood $p(y \mid \theta)$ of the observed data.
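If we describe our prior belief about $\theta$ with a $\mathrm{Beta}(\alpha, \beta)$ distribution (a uniform $\mathrm{Beta}(1, 1)$ is a common choice for this example; the exact prior behind the plots below is an assumption here), then the posterior after observing $y$ heads in $N$ tosses has a closed form:

$$p(\theta \mid y) = \mathrm{Beta}(\alpha + y,\; \beta + N - y)$$

This conjugacy is what makes the posteriors below computable analytically, before bringing in PyMC3.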
Before leveraging PyMC3, we use scipy & matplotlib to calculate & visualize, respectively, the distribution of the probability that a coin toss comes up heads in each scenario.
Data: [0, 1, 1, 1, 1, 4, 6, 9, 13, 48] 𝜃_real: 0.35
[Figure: "Coin Toss Scenarios for N Trials", a grid of subplots with one distribution per value of N]
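As a rough sketch of how such a grid could be computed and plotted with scipy & matplotlib, assuming the uniform Beta(1, 1) prior mentioned above and the heads counts from the printed data (the original figure's styling will differ):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Scenarios: number of tosses and the observed number of heads in each
N = [0, 1, 2, 3, 4, 8, 16, 32, 50, 150]
heads = [0, 1, 1, 1, 1, 4, 6, 9, 13, 48]
theta_real = 0.35

theta_grid = np.linspace(0, 1, 200)
fig, axes = plt.subplots(2, 5, figsize=(12, 5), sharex=True, sharey=True)

for ax, n, y in zip(axes.ravel(), N, heads):
    # Beta(1 + heads, 1 + tails) is the closed-form posterior under a Beta(1, 1) prior
    posterior = stats.beta.pdf(theta_grid, 1 + y, 1 + n - y)
    ax.plot(theta_grid, posterior)
    ax.axvline(theta_real, linestyle='--', color='k')  # true bias for reference
    ax.set_title(f"N = {n}, heads = {y}")

fig.suptitle('Coin Toss Scenarios for N Trials')
plt.tight_layout()
plt.show()
```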
For the first subplot, with 0 trials, we have not observed any data yet, so the resulting distribution is simply the prior.
Already, this is enough to get a working grasp of the probabilities involved. To learn more about Bayesian Statistics, we could continue.
Or, as textbook authors say,
Just kidding, we would like to proceed with an actual implementation using the PyMC3 library.
Bayes' Theorem
$$p(\theta \mid y) = \tfrac{p(y \mid \theta)\,p(\theta)}{p(y)}$$
where, as before, $p(\theta)$ is the prior, $p(y \mid \theta)$ the likelihood, $p(y)$ the evidence, and $p(\theta \mid y)$ the posterior.
Similar to the growth of Machine Learning & Artificial Intelligence, Probabilistic Programming (the practical implementation of Bayesian Modeling) saw a combination of innovation & limitations due to the following:
However, as data technology has become more advanced in recent years, we can now take the opportunity to implement Probabilistic Programming.
PyMC3 is a Python package for Bayesian statistical modeling and Probabilistic Machine Learning. It is one of several Probabilistic Programming Languages (PPLs), and it focuses on advanced Markov chain Monte Carlo (MCMC) and variational fitting algorithms, among other features.
Some features of PyMC3 are the following:
Built on top of Theano
Note: As of 2021, the Theano fork maintained by the PyMC developers is now "Aesara". See "The Future of PyMC3, or: Theano is Dead, Long Live Theano" for historical background & next steps.
Using the coin-toss example, let's say we flip a coin 4 times.
We have some intuition that the probability of the coin landing heads is around 0.35.
We would like to estimate the probability of the coin landing heads, given our prior assumption and these 4 attempts.
from scipy import stats

trials = 4
𝜃_real = 0.35  # true probability of heads used to simulate the data
data = stats.bernoulli.rvs(p=𝜃_real, size=trials)
print(f"Trials: {trials} Bias (𝜃): {𝜃_real} Data: {data}")
Trials: 4 Bias (𝜃): 0.35 Data: [1 0 0 0]
To go about this, we leverage PyMC3 with the following procedure:
import pymc3 as pm

with pm.Model() as coin_flip_model:
    𝜃 = pm.Beta('𝜃', alpha=1, beta=1)          # Prior
    y = pm.Bernoulli('y', p=𝜃, observed=data)  # Likelihood
We now perform inference to approximate the posterior distribution. The options for this inference are either MCMC sampling (e.g., pm.sample, which we use below) or variational inference (e.g., pm.fit):
with coin_flip_model:
    trace = pm.sample(2_000, random_seed=123, return_inferencedata=True)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
/Users/CloudChaoszero/opt/anaconda3/envs/general/lib/python3.7/site-packages/aesara/graph/fg.py:501: UserWarning: Variable Elemwise{Switch}.0 cannot be replaced; it isn't in the FunctionGraph
f"Variable {var} cannot be replaced; it isn't in the FunctionGraph"
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [𝜃]
100.00% [6000/6000 00:04<00:00 Sampling 2 chains, 0 divergences]
Sampling 2 chains for 1_000 tune and 2_000 draw iterations (2_000 + 4_000 draws total) took 19 seconds.
or....
with pm.Model() as coin_flip_model:
    𝜃 = pm.Beta('𝜃', alpha=1, beta=1)          # Prior
    y = pm.Bernoulli('y', p=𝜃, observed=data)  # Likelihood
    trace = pm.sample(2_000, random_seed=123, return_inferencedata=True)  # Draw samples
We just did the following:
- Created a model context, coin_flip_model, containing a Beta prior 𝜃 and a Bernoulli likelihood y conditioned on our observed data
- Used the pm.sample function to create a trace with 2,000 drawn samples from the posterior distribution

Now, for PyMC3 versions 3.9+, we just saw the parameter return_inferencedata=True, which returns an arviz.InferenceData object instead of the default MultiTrace object.
trace = pm.sample(draws=20_000, return_inferencedata=True)
An InferenceData object has many advantages compared to a MultiTrace: for example, it can be saved to and loaded from a file, and it can also carry additional (meta)data such as date/version, or posterior predictive distributions.
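For instance, a minimal sketch of saving and reloading the trace as a NetCDF file (the filename is just illustrative):

```python
import arviz as az

# Persist the InferenceData (posterior, sample stats, observed data, metadata) to disk
trace.to_netcdf("coin_flip_trace.nc")

# Later, load it back without re-running the sampler
loaded_trace = az.from_netcdf("coin_flip_trace.nc")
```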
trace
The trace is an arviz.InferenceData object with the following groups:

    > posterior       <xarray.Dataset> (chain: 2, draw: 2000)               data variable: 𝜃
    > log_likelihood  <xarray.Dataset> (chain: 2, draw: 2000, y_dim_0: 4)   data variable: y
    > sample_stats    <xarray.Dataset> (chain: 2, draw: 2000)               data variables: energy, tree_depth, diverging, step_size, acceptance_rate, ...
    > observed_data   <xarray.Dataset> (y_dim_0: 4)                         data variable: y = [1, 0, 0, 0]
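Each of these groups is an xarray Dataset, so values can be pulled out of the trace directly; a short sketch of that access, using the variable name 𝜃 from the model above:

```python
# Posterior samples of 𝜃, shaped (chain, draw)
theta_samples = trace.posterior["𝜃"]
print(theta_samples.mean().item())                    # posterior mean of 𝜃

# Sampler diagnostics live in the sample_stats group
print(trace.sample_stats["diverging"].sum().item())   # total number of divergent transitions

# The data the model was conditioned on
print(trace.observed_data["y"].values)                # [1 0 0 0]
```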
Using the InferenceData object, we can leverage ArviZ to visualize our results.
ArviZ is a Python package for exploratory analysis of Bayesian models. It includes functions for posterior analysis, data storage, sample diagnostics, model checking, and comparison. The goal is to provide backend-agnostic tools for diagnostics and visualizations of Bayesian inference in Python.
We can plot the "trace" object using az.plot_trace
import arviz as az
az.plot_trace(trace)
[Figure: az.plot_trace output, two panels titled '𝜃' showing the posterior density and the sampled values per chain]
Or maybe we want a summary table for the trace, similar to Pandas' pd.DataFrame().describe():
az.summary(trace)
|   | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| 𝜃 | 0.33 | 0.181 | 0.016 | 0.642 | 0.005 | 0.003 | 1563.0 | 2202.0 | 1.0 |
Maybe we want to observe the aforementioned information in a different form, via ArviZ's plot_posterior on the trace object:
az.plot_posterior(trace)
[Figure: az.plot_posterior output, a single panel titled '𝜃' showing the posterior density with its mean and 94% HDI]
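plot_posterior also accepts a reference value and an HDI probability, so, as one possible extension, we could overlay the true bias used to simulate the data:

```python
# Overlay the true bias (0.35) as a reference value and use a 90% HDI
az.plot_posterior(trace, ref_val=0.35, hdi_prob=0.9)
```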
So, we went through the following:
However, there is so much more to learn about PyMC3! In the meantime, you can explore it further via their documentation...
or wait for another exciting BayPIGgies Meetup! (S/O to Jeff Fischer & team)