Introduction to PyMC3

Raul Maldonado, April 2021 BayPiggies

A little bit about me

  • Data Analyst @ Autodesk, Teaching Fellow @ Delta Analytics
  • You can find me at RaulingAverage.dev
  • Enjoy Coffee, Learning, and Running... near the beach

Side remarks

This presentation does not represent any work or material from Autodesk. Also, I am not a core contributor to the PyMC3 project, just a user.

Also, as we continue & recover from the great Panini, please consider helping or contributing to social causes and nonprofits, or sponsoring Women & BIPOC applicants at your company. #PayItForward

What will we be talking about today?
PyMC3!

But before going into PyMC3, we will briefly cover Bayesian methodology & Bayesian Modeling.

Particularly:

  • What is Bayesian Modeling?
  • Brief Introduction to Probability
  • Getting Started with PyMC3

What is Bayesian Modeling?

[Image: The Bay (SF)]

Where is Bae?

  • Frequentist statistics is a method of statistical inference in which conclusions are drawn from data by emphasizing the frequency or proportion of the data. The resulting calculation is a single value: a point estimate.
  • Bayesian statistics is a method of statistical inference in which Bayes' theorem is used to update the probability of a hypothesis as more evidence or information becomes available. The resulting calculation is a probability distribution for some event.

    Leveraging Bayesian statistics, we can use probabilistic programming to create Bayesian (probabilistic) models that solve problems requiring estimated probabilities, delivered as probability distributions.

Brief Introduction to Probability

A few core principles of Probabilities:

  • are in the interval [0, 1]
  • follow the product rule $p(A,B) = p(A|B)p(B)$
    • $p(A, B)$ represents the joint probability of A and B
    • $p(A|B)$ represents the conditional probability of A given B
  • are independent when $p(A, B) = p(A)p(B)$, or equivalently $p(A|B) = p(A)$ (see the quick check below)
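
As a quick numeric check of the product rule and independence, here is a minimal sketch using two fair dice (a hypothetical example, not from the slides):

from fractions import Fraction

# B: the first die shows a 6; A: the second die shows a 6 (fair, independent dice)
p_B = Fraction(1, 6)
p_A_given_B = Fraction(1, 6)        # knowing B tells us nothing about A
p_A_and_B = p_A_given_B * p_B       # product rule: p(A, B) = p(A|B) p(B)
print(p_A_and_B)                    # 1/36
# Since p(A|B) == p(A), A and B are independent: p(A, B) = p(A) p(B)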

Bayes' Theorem

$$p(\theta | y) = \tfrac{p(y|\theta)p(\theta)}{p(y)}$$

where

  • $p(\theta)$: Prior (hypothesis)
  • $p(y | \theta)$: Likelihood (data)
  • $p(\theta |y)$: Posterior
  • $p(y)$: Marginal Likelihood (evidence)
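
As a quick worked instance with made-up numbers: suppose the bias can only be $\theta = 0.5$ (fair) or $\theta = 0.8$ (heads-heavy), each with prior probability 0.5, and we observe a single heads. Then

$$p(\theta = 0.5 \mid y = \text{heads}) = \frac{0.5 \times 0.5}{0.5 \times 0.5 + 0.8 \times 0.5} = \frac{0.25}{0.65} \approx 0.38$$

so a single heads already shifts belief slightly toward the heads-heavy coin.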

Coin-toss Example

Let's say we have a coin-toss scenario with no prior understanding of the expected probability of heads.

We want to estimate the probability that a toss lands heads.

There are several scenarios with a total number of trials $N$.

N = [0, 1, 2, 3, 4, 8, 16, 32, 50, 150]

For these $N$ trials, the data are simulated with a true bias $\theta_{real} = 0.35$, i.e., a 35% chance of heads.

From each scenario's data we evaluate the likelihood $p(y|\theta)$ as well.

Not yet leveraging PyMC3, we use scipy & matplotlib to calculate & visualize the posterior distributions of the probability of a coin toss being heads.
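
The plotting code was omitted in this export; below is a minimal sketch of how such a figure could be produced, assuming three Beta priors matching the descriptions further down (the exact prior parameters, and hence the simulated data, are assumptions):

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

N = [0, 1, 2, 3, 4, 8, 16, 32, 50, 150]
𝜃_real = 0.35
np.random.seed(123)
heads = stats.binom.rvs(n=N, p=𝜃_real)   # simulated number of heads per scenario
print("Data: ", list(heads))
print("𝜃_real: ", 𝜃_real)

x = np.linspace(0, 1, 200)
# (alpha, beta) for a uniform, a gaussian-like, and a skewed Beta prior
priors = [(1, 1), (20, 20), (1, 4)]

fig, axes = plt.subplots(2, 5, figsize=(12, 5), sharex=True, sharey=True)
for ax, n, y in zip(axes.ravel(), N, heads):
    for a, b in priors:
        # Beta is conjugate to the binomial likelihood, so each
        # posterior is Beta(a + heads, b + tails) in closed form
        ax.plot(x, stats.beta.pdf(x, a + y, b + n - y))
    ax.set_title(f"N = {n}")
fig.suptitle('Coin Toss Scenarios for N Trials')
plt.show()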

Data:  [0, 1, 1, 1, 1, 4, 6, 9, 13, 48]
𝜃_real:  0.35

[Figure: "Coin Toss Scenarios for N Trials", one subplot of posterior curves per value of N]

For the first subplot of 0 trials:

  • The uniform (blue) prior represents all possible values for the bias being equally probable a priori.
  • The gaussian-like (orange) prior is centered and concentrated around 0.5
    • So this prior is compatible with information indicating that the coin has more or less about the same chance of landing heads or tails.
    • We could also say this prior is compatible with the belief that most coins are fair.
  • The skewed (green) prior puts most of its weight on tails-heavy outcomes

Already, this was enough to get a first gloss of the probabilities involved. To learn more about Bayesian statistics, we could continue down this path.

Or, as textbook authors say,

"we leave that up to the reader"

[Image: "Pain" meme]

Just kidding, we would like to proceed with an actual implementation using the PyMC3 library.

The Main Takeaway - Bayes' Theorem

Bayes' Theorem

$$p(\theta | y) = \tfrac{p(y|\theta)p(\theta)}{p(y)}$$

where

  • $p(\theta)$: Prior
  • $p(y | \theta)$: Likelihood
  • $p(\theta |y)$: Posterior
  • $p(y)$: Marginal Likelihood (evidence)

Getting Started with PyMC3

Why hasn't Bayesian Modeling been an industry-wide recommendation?

Similar to the growth of Machine Learning & Artificial Intelligence, Probabilistic Programming (the practical implementation of Bayesian Modeling) saw a mix of innovation & limitations, because it has been:

  • computationally challenging
  • conceptually challenging to work with
  • dependent on algorithmic advancements such as the Metropolis sampler and, more recently, the No-U-Turn Sampler, which have only lately propelled the field forward

However, as data technology has advanced in recent years, we can now leverage the opportunity to implement Probabilistic Programming.

What is PyMC3?


PyMC3 is a Python package for Bayesian statistical modeling and probabilistic machine learning. It is one of several Probabilistic Programming Languages (PPLs), focusing on advanced Markov chain Monte Carlo (MCMC) and variational fitting algorithms, among other things.

Some features of PyMC3 are the following:

  • Intuitive model specification syntax
  • Leverages several (sampling) algorithms:
    • Hamiltonian Monte Carlo (HMC)
    • No-U-Turn Sampler (NUTS)
    • Variational Inference
  • Built on top of Theano, whose features include:

    • Dynamic C compilation for optimized computation
    • Automatic differentiation

      Note: As of 2021, Theano is now "Aesara". See "The Future of PyMC3, or: Theano is Dead, Long Live Theano" for historical background & next steps.

  • And more!

PyMC3 Example

Using the coin-toss example, let's say we flip a coin 4 times.

We have some intuition that the coin's bias is 0.35, i.e., some idea of what the probability of heads should be.

We would like to see the probability of heads given our prior assumption and the 4 attempts.

from scipy import stats

trials = 4
𝜃_real = 0.35
data = stats.bernoulli.rvs(p=𝜃_real, size=trials)

Trials:  4
Bias (𝜃):  0.35
Data:  [1 0 0 0]

To go about this, we use PyMC3 in the following procedure:

  1. Instantiate (model) Object
  2. Inference
    • Calculate output distribution
  3. Predict & Evaluate

1. Instantiate Object

import pymc3 as pm

with pm.Model() as coin_flip_model:
    𝜃 = pm.Beta('𝜃', alpha=1, beta=1)
    y = pm.Bernoulli('y', p = 𝜃, observed=data)
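
As an optional sanity check, PyMC3 models expose the random variables they register; a minimal sketch (note that a bounded variable like 𝜃 may appear under a transformed name, e.g. a log-odds version):

# Sketch: inspect the free and observed random variables the model now holds
print(coin_flip_model.basic_RVs)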

2. Inference

We now perform inference to approximate the posterior distribution.

The options for this inference are either:

  • sampling
  • variational inference

with coin_flip_model:
    trace = pm.sample(2_000, random_seed=123, return_inferencedata=True)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
/Users/CloudChaoszero/opt/anaconda3/envs/general/lib/python3.7/site-packages/aesara/graph/fg.py:501: UserWarning: Variable Elemwise{Switch}.0 cannot be replaced; it isn't in the FunctionGraph
  f"Variable {var} cannot be replaced; it isn't in the FunctionGraph"
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [𝜃]

 100.00% [6000/6000 00:04<00:00 Sampling 2 chains, 0 divergences]
Sampling 2 chains for 1_000 tune and 2_000 draw iterations (2_000 + 4_000 draws total) took 19 seconds.
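
The list above mentioned variational inference as the other option. A minimal sketch of that route, assuming PyMC3's pm.fit with its default ADVI method (not part of the original slides):

with coin_flip_model:
    approx = pm.fit(n=10_000, method="advi")  # fit a variational approximation (ADVI)
    vi_trace = approx.sample(2_000)           # draw samples from the fitted approximation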

or.... putting the model definition and sampling in one block:

with pm.Model() as coin_flip_model:
    𝜃 = pm.Beta('𝜃', alpha=1, beta=1) # Prior
    y = pm.Bernoulli('y', p = 𝜃, observed=data) # Likelihood
    trace = pm.sample(2_000, random_seed=123, return_inferencedata=True) #Draw Samples

We just did the following:

  • Instantiate a model called coin_flip_model
  • Set Prior (theta) as a Beta Distribution
  • Set the Likelihood (y) as a Bernoulli Distribution
  • Lastly, use PyMC3's pm.sample function to create a trace with 2,000 drawn samples from the posterior distribution

Now, for PyMC3 versions 3.9+, passing the parameter return_inferencedata=True makes pm.sample return an arviz.InferenceData object instead of the default MultiTrace object.

trace = pm.sample(draws=20_000, return_inferencedata=True)

InferenceData has many advantages over a MultiTrace: for example, it can be saved to and loaded from a file, and it can carry additional (meta)data such as the creation date/version or posterior predictive distributions.
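
For instance, a minimal sketch of that save/load round trip using ArviZ's NetCDF support (the filename here is hypothetical):

import arviz as az

trace.to_netcdf("coin_flip_trace.nc")           # persist the InferenceData to disk
loaded = az.from_netcdf("coin_flip_trace.nc")   # read it back in a later session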

trace
arviz.InferenceData
    • posterior: <xarray.Dataset>
      Dimensions:  (chain: 2, draw: 2000)
      Coordinates:
        * chain    (chain) int64 0 1
        * draw     (draw) int64 0 1 2 3 4 5 6 7 ... 1993 1994 1995 1996 1997 1998 1999
      Data variables:
          𝜃        (chain, draw) float64 0.09391 0.1793 0.211 ... 0.2719 0.2625 0.1611
      Attributes:
          created_at:                 2021-04-22T07:56:16.445509
          arviz_version:              0.11.2
          inference_library:          pymc3
          inference_library_version:  3.11.1
          sampling_time:              11.031096935272217
          tuning_steps:               1000

    • log_likelihood: <xarray.Dataset>
      Dimensions:  (chain: 2, draw: 2000, y_dim_0: 4)
      Coordinates:
        * chain    (chain) int64 0 1
        * draw     (draw) int64 0 1 2 3 4 5 6 7 ... 1993 1994 1995 1996 1997 1998 1999
        * y_dim_0  (y_dim_0) int64 0 1 2 3
      Data variables:
          y        (chain, draw, y_dim_0) float64 -2.365 -0.09862 ... -0.1757 -0.1757
      Attributes:
          created_at:                 2021-04-22T07:56:16.683197
          arviz_version:              0.11.2
          inference_library:          pymc3
          inference_library_version:  3.11.1

    • sample_stats: <xarray.Dataset>
      Dimensions:             (chain: 2, draw: 2000)
      Coordinates:
        * chain               (chain) int64 0 1
        * draw                (draw) int64 0 1 2 3 4 5 ... 1995 1996 1997 1998 1999
      Data variables: (12/13)
          energy              (chain, draw) float64 5.204 4.924 5.009 ... 3.915 4.358
          perf_counter_diff   (chain, draw) float64 0.0006529 0.0002448 ... 0.0002258
          tree_depth          (chain, draw) int64 2 1 2 2 2 1 2 2 ... 1 2 2 1 1 2 1 1
          process_time_diff   (chain, draw) float64 0.000639 0.000246 ... 0.000227
          diverging           (chain, draw) bool False False False ... False False
          energy_error        (chain, draw) float64 0.4592 -0.2174 ... 0.007773 0.1648
          ...                  ...
          n_steps             (chain, draw) float64 3.0 1.0 3.0 3.0 ... 3.0 1.0 1.0
          step_size           (chain, draw) float64 1.5 1.5 1.5 ... 1.012 1.012 1.012
          max_energy_error    (chain, draw) float64 0.7756 -0.2174 ... 0.007773 0.1648
          step_size_bar       (chain, draw) float64 1.283 1.283 1.283 ... 1.297 1.297
          perf_counter_start  (chain, draw) float64 28.29 28.3 28.3 ... 30.78 30.78
          acceptance_rate     (chain, draw) float64 0.6318 1.0 0.8711 ... 0.9923 0.848
      Attributes:
          created_at:                 2021-04-22T07:56:16.450183
          arviz_version:              0.11.2
          inference_library:          pymc3
          inference_library_version:  3.11.1
          sampling_time:              11.031096935272217
          tuning_steps:               1000

    • observed_data: <xarray.Dataset>
      Dimensions:  (y_dim_0: 4)
      Coordinates:
        * y_dim_0  (y_dim_0) int64 0 1 2 3
      Data variables:
          y        (y_dim_0) int32 1 0 0 0
      Attributes:
          created_at:                 2021-04-22T07:56:16.684283
          arviz_version:              0.11.2
          inference_library:          pymc3
          inference_library_version:  3.11.1

3. Predict & Evaluate

Visualizing with ArviZ


Using the InferenceData object, we can leverage ArviZ to visualize our results.

ArviZ is a Python package for exploratory analysis of Bayesian models. It includes functions for posterior analysis, data storage, sample diagnostics, model checking, and comparison.

The goal is to provide backend-agnostic tools for diagnostics and visualizations of Bayesian inference in Python.

We can plot the "trace" object using az.plot_trace

import arviz as az
az.plot_trace(trace)
array([[<AxesSubplot:title={'center':'𝜃'}>,
        <AxesSubplot:title={'center':'𝜃'}>]], dtype=object)

[Figure: az.plot_trace output for 𝜃, one row with two panels]

Or maybe we want a summary table for the trace, similar to pandas' pd.DataFrame().describe():

az.summary(trace)
      mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
𝜃     0.33  0.181   0.016    0.642      0.005    0.003    1563.0    2202.0    1.0
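
As a sanity check (a sketch, not from the slides): with a Beta(1, 1) prior and the observed data [1 0 0 0], conjugacy gives the posterior analytically as Beta(2, 4), whose mean matches the table above:

from scipy import stats

posterior = stats.beta(1 + 1, 1 + 3)    # Beta(prior_alpha + heads, prior_beta + tails)
print(posterior.mean())                 # 0.333..., matching az.summary's mean of 0.33
print(posterior.interval(0.94))         # equal-tailed 94% interval; close to, but not
                                        # identical to, the hdi_3%/hdi_97% HDI columns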

Maybe we want to observe the same information in a different form, via ArviZ's plot_posterior on the trace object:

az.plot_posterior(trace)
<AxesSubplot:title={'center':'𝜃'}>

A summary

So, we went through the following:

  • What is Bayesian Modeling?
  • Brief Introduction to Probability
  • Getting Started with PyMC3

However, there is so much more to learn about PyMC3! In the meantime, you can learn more about PyMC3 via its documentation...

or wait for another exciting BayPIGgies Meetup! (S/O to Jeff Fischer & team)

With that being said

Thank you,


  • RaulingAverage.dev
  • Twitter: @RaulingAverage
  • Notebook @ CloudChaoszero/Presentations/BayPiggies/Survival-Analysis

Resources

  • Introduction to PyMC3 by Tung N. Nguyen
  • PyMC3 Documentation
  • Arviz Documentation
  • For consulting services: there is a consultancy group called PyMC3 Labs that helps answer tough business questions requiring Bayesian modeling.