Probabilistic Graphical Models in R and python

# Probabilistic Graphical Models in <code>R</code> and <code>python</code>
## IV International Seminar on Statistics with R
### Bruna Wundervald
### May, 2019

---

# Outline

- Introduction
- Probabilistic (graphical) models 
  - What are PGMs?
  - Example
- Fitting a PGM
  - In `R`
  - In `python` 
- Resources

---

# Introduction

- Most tasks require a person or a system to *reason*

- We construct *models* of the system which we would like to reason about
  - Encode our knowledge about the system in a computer-readable form

- Reasoning about what is *probable*

---

##  Probabilistic Graphical Models

---

##  Probabilistic Graphical Models

## (?)

---

#  `$\underbrace{\text{Models}}_{}$`

- The models represent our understanding of events;  
- Declarative representation:
  - Each model has its own unique assumptions. 
- Can be deterministic (physical models) or involve random elements (statistical models)

---

# `$\underbrace{\text{Probabilistic}}_{}$`

- The models can not cover all elements related to the observed data:
  - Variability due to model limitation
  
- Probability accomodates the **uncertainty**:
  - Noisy observations 
  - Model assumptions

- The world is **inherently** stochastic: 
  - **Everything that involves probability is a model**
  
---

#  `$\underbrace{\text{Graphical}}_{}$`

- Graphs are a computer science tool used in many different areas; 
- Represent complex systems that have many variables; 
- The systems can be a result of the models we assume:
  - thousands or millions of parameters. 
  
<img src="img/graph.png" width="30%" height="30%" style="display: block; margin: auto;" />

--->  Representation, inference, and learning: critical components in intelligent systems
---

## Reference books

.pull-left[
<img src="img/book.png" width="90%" style="display: block; margin: auto;" />
] .pull-right[

<img src="img/bnlearn.png" width="90%" style="display: block; margin: auto;" />
]

---

## Example

- We want to investigate the usage patterns of different means of transport

---

## Example

.pull-left[
<img src="img/net.png" width="80%" style="display: block; margin: auto;" />
] .pull-right[

- **Age (A)**: young for individuals < 30 years old, adult for individuals
between 30 and 60 years, and old for people > 60.
- **Sex (S)**: male or female.
- **Education (E)**: up to high school or university degree.
- **Occupation (O)**: employee or self-employed.
- **Residence (R)**: the size of the city, as small or big.
- **Travel (T)**: the means of transport favored by the individual (car, train or other).
]

The **nature of the variables** suggests how they may be related to each other.

---

## Semantics

- A Bayesian Network is a directed acyclic graph (DAG) G whose nodes represent random variables `$X_1,\dots,X_p$`
- For each node `$X_i$`, there exists a conditional probability distribution `$P(X_i |Par_G(X_i))$` 
- The CPDs will denote the dependencies existing in the graph

- By the chain rule of probability, we can always represent a joint distribution
as follows, in any order:

`$$P(x_1:V) = p(x_1) p(x_2 | x_1) p(x_3 | x_2, x_1) \dots p(x_V | x_{1:V})$$`

being `$V$` the total number of variables. For the example, we have:

$$P(A, S, E, O, R, T) = P(A) P(S) P(E| A, S) P(O | E)  P(R | E) P(T | O, R) $$
$$ P(A, S, E, O, R, T) = P(A) P(S) P(E| A, S) P(O, R | E) P(T | O, R)$$

---

## Some CPDs (using the data)

.pull-left[
$$ P(A) = $$
<table class="table table-condensed table-hover" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:right;"> adult </th>
   <th style="text-align:right;"> old </th>
   <th style="text-align:right;"> young </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 0.472 </td>
   <td style="text-align:right;width: 4cm; "> 0.208 </td>
   <td style="text-align:right;"> 0.32 </td>
  </tr>
</tbody>
</table>

$$ P(S) = $$
<table class="table table-condensed table-hover" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:right;"> F </th>
   <th style="text-align:right;"> M </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 0.402 </td>
   <td style="text-align:right;width: 4cm; "> 0.598 </td>
  </tr>
</tbody>
</table>

].pull-right[

`$$P(E | S,  A) =$$`

<table class="table table-condensed table-hover" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> S </th>
   <th style="text-align:left;"> A </th>
   <th style="text-align:right;"> high </th>
   <th style="text-align:right;"> uni </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> F </td>
   <td style="text-align:left;width: 4cm; "> adult </td>
   <td style="text-align:right;"> 0.639 </td>
   <td style="text-align:right;"> 0.361 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> F </td>
   <td style="text-align:left;width: 4cm; "> old </td>
   <td style="text-align:right;"> 0.846 </td>
   <td style="text-align:right;"> 0.154 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> F </td>
   <td style="text-align:left;width: 4cm; "> young </td>
   <td style="text-align:right;"> 0.538 </td>
   <td style="text-align:right;"> 0.462 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> M </td>
   <td style="text-align:left;width: 4cm; "> adult </td>
   <td style="text-align:right;"> 0.719 </td>
   <td style="text-align:right;"> 0.281 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> M </td>
   <td style="text-align:left;width: 4cm; "> old </td>
   <td style="text-align:right;"> 0.892 </td>
   <td style="text-align:right;"> 0.108 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> M </td>
   <td style="text-align:left;width: 4cm; "> young </td>
   <td style="text-align:right;"> 0.811 </td>
   <td style="text-align:right;"> 0.189 </td>
  </tr>
</tbody>
</table>
]

---

## Training the BN in `R` 
- `bnlearn` package 
- We build the network structure first and then pass the data into the fit function

```r
library(bnlearn)
# Building the graph by hand 
survey.dag = empty.graph(nodes = c("A", "S", "E", "O", "R", "T"))
survey.dag = set.arc(survey.dag, from = "A", to = "E")
survey.dag = set.arc(survey.dag, from = "S", to = "E")
survey.dag = set.arc(survey.dag, from = "E", to = "O")
survey.dag = set.arc(survey.dag, from = "E", to = "R")
survey.dag = set.arc(survey.dag, from = "O", to = "T")
survey.dag = set.arc(survey.dag, from = "R", to = "T")
```

---

## Training the BN in `R`

```r
# OR, by model declaration syntax
survey.dag = model2network("[A][S][E|A:S][O|E][R|E][T|O:R]")

skeleton(survey.dag)
```

```
## 
##   Random/Generated Bayesian network
## 
##   model:
##     [undirected graph]
##   nodes:                                 6 
##   arcs:                                  6 
##     undirected arcs:                     6 
##     directed arcs:                       0 
##   average markov blanket size:           2.00 
##   average neighbourhood size:            2.00 
##   average branching factor:              0.00 
## 
##   generation algorithm:                  Empty
```

---

## Training the BN in `R`

The survey train data:
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> A </th>
   <th style="text-align:left;"> R </th>
   <th style="text-align:left;"> E </th>
   <th style="text-align:left;"> O </th>
   <th style="text-align:left;"> S </th>
   <th style="text-align:left;"> T </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> adult </td>
   <td style="text-align:left;"> big </td>
   <td style="text-align:left;"> high </td>
   <td style="text-align:left;"> emp </td>
   <td style="text-align:left;"> F </td>
   <td style="text-align:left;"> car </td>
  </tr>
  <tr>
   <td style="text-align:left;"> adult </td>
   <td style="text-align:left;"> small </td>
   <td style="text-align:left;"> uni </td>
   <td style="text-align:left;"> emp </td>
   <td style="text-align:left;"> M </td>
   <td style="text-align:left;"> car </td>
  </tr>
  <tr>
   <td style="text-align:left;"> adult </td>
   <td style="text-align:left;"> big </td>
   <td style="text-align:left;"> uni </td>
   <td style="text-align:left;"> emp </td>
   <td style="text-align:left;"> F </td>
   <td style="text-align:left;"> train </td>
  </tr>
</tbody>
</table>

Using the network structure:

```r
bn.mod <- bn.fit(survey.dag, data = train)
```

```
## 
##   Parameters of node A (multinomial distribution)
## 
## Conditional probability table:
##      adult       old     young 
## 0.4791667 0.2083333 0.3125000
```
 
---

```
## 
##   Parameters of node S (multinomial distribution)
## 
## Conditional probability table:
##          F         M 
## 0.4074074 0.5925926
```

```
## 
##   Parameters of node E (multinomial distribution)
## 
## Conditional probability table:
##  
## , , S = F
## 
##       A
## E          adult       old     young
##   high 0.6627907 0.8571429 0.5454545
##   uni  0.3372093 0.1428571 0.4545455
## 
## , , S = M
## 
##       A
## E          adult       old     young
##   high 0.7272727 0.8727273 0.8250000
##   uni  0.2727273 0.1272727 0.1750000
```

---

```
## 
##   Parameters of node O (multinomial distribution)
## 
## Conditional probability table:
##  
##       E
## O            high        uni
##   emp  0.98746082 0.92920354
##   self 0.01253918 0.07079646
```

```
## 
##   Parameters of node R (multinomial distribution)
## 
## Conditional probability table:
##  
##        E
## R            high       uni
##   big   0.7147335 0.8672566
##   small 0.2852665 0.1327434
```

---

```
## 
##   Parameters of node T (multinomial distribution)
## 
## Conditional probability table:
##  
## , , R = big
## 
##        O
## T              emp       self
##   car   0.56645570 0.60000000
##   other 0.21518987 0.20000000
##   train 0.21835443 0.20000000
## 
## , , R = small
## 
##        O
## T              emp       self
##   car   0.56730769 0.50000000
##   other 0.06730769 0.50000000
##   train 0.36538462 0.00000000
```

---
## BN predictions for the test set

```r
pred <- predict(bn.mod, "T", test) 
scales::percent(sum(pred == test$T)/nrow(test)) # Accuracy
```

```
## [1] "66.2%"
```

## Random Forest comparison

```r
rf <- randomForest::randomForest(T ~ . , data = train)
scales::percent(sum(predict(rf, test) ==  test$T)/nrow(test)) # Accuracy
```

```
## [1] "64.7%"
```

---

## Training the BN in `python` 
 
 - `pgmpy` package

```python
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator, BayesianEstimator
```

```python

model = BayesianModel([('A', 'E'), ('S', 'E'), ('E', 'O'), ('E', 'R'),('R', 'T'), ('O', 'T')])
model.fit(train, estimator = MaximumLikelihoodEstimator)
print(model.get_cpds('A'))
```

```
## ╒══════════╤══════════╕
## │ A(adult) │ 0.479167 │
## ├──────────┼──────────┤
## │ A(old)   │ 0.208333 │
## ├──────────┼──────────┤
## │ A(young) │ 0.3125   │
## ╘══════════╧══════════╛
```

---

```python
print(model.get_cpds('T'))
```

```
## ╒══════════╤═════════════════════╤═════════════════════╤═════════╤══════════╕
## │ O        │ O(emp)              │ O(emp)              │ O(self) │ O(self)  │
## ├──────────┼─────────────────────┼─────────────────────┼─────────┼──────────┤
## │ R        │ R(big)              │ R(small)            │ R(big)  │ R(small) │
## ├──────────┼─────────────────────┼─────────────────────┼─────────┼──────────┤
## │ T(car)   │ 0.5664556962025317  │ 0.5673076923076923  │ 0.6     │ 0.5      │
## ├──────────┼─────────────────────┼─────────────────────┼─────────┼──────────┤
## │ T(other) │ 0.21518987341772153 │ 0.0673076923076923  │ 0.2     │ 0.5      │
## ├──────────┼─────────────────────┼─────────────────────┼─────────┼──────────┤
## │ T(train) │ 0.21835443037974683 │ 0.36538461538461536 │ 0.2     │ 0.0      │
## ╘══════════╧═════════════════════╧═════════════════════╧═════════╧══════════╛
```

```python

## BN predictions for the test set
test_response = test['T'] 
pred = model.predict(test.drop(labels = 'T',  axis=1))

df = pd.DataFrame({'pred' : pred['T'], 'test' : test_response} )

'{:.1%}'.format(df[df.pred == df.test].shape[0]/test.shape[0])
```

```
## '66.2%'
```

---

## BN details
  - Way faster algorithm
    - The simplest version just uses the CPDs
  - It can accommodate millions of variables and parameters

##  Why do we need PGMs?

- Every probabilistic model can be represented as a graph. That includes:
  - Linear and generalized linear models 
  - Probabilistic clustering 
  - Probabilistic PCA
  - Topic Modelling
  - Hierarchical Models 
  - Latent Variable Models in general
  - Markov Chains
  - Bayesian Neural Networks
  - `$\infty$`

---

## Some real-life examples

- **Medical diagnosis**: which characteristics of a patient influence in the presence of disease?
- **Spam filter**: how likely is an email to be spam given its content?

- **Google Page Rank**: what are the more relevant pages in a Google search?

- **Recommendation systems (Netflix, Amazon, Facebook)**: which products the clients are more likely to consume?

- **Presidential polls**: in who are the voters more likely to vote for?

---

# Resources

> https://github.com/search?p=2&q=probabilistic+models&type=Repositories

.pull-left[
<img src="img/github.png" width="70%" style="display: block; margin: auto;" />
] .pull-right[

- `bnlearn`: http://www.bnlearn.com/book-crc/
  - data & tutorials  
- CRAN task `GR`: gRaphical Models in R
  - Representation & Modelling
  - https://cran.r-project.org/web/views/gR.html
  
- `pgmpy` page: http://pgmpy.org/

- many, **many** more packages for both `R` and `python`
]

---
class: center, middle

## Acknowledgments

This work was supported by a Science Foundation Ireland Career Development Award grant number: 17/CDA/4695

---

# References

<p><cite>Koller, D. and N. Friedman
(2009).
<em>Probabilistic graphical models: principles and techniques</em>.
MIT press.</cite></p>

<p><cite>Murphy, K. P.
(2012).
<em>Machine learning: a probabilistic perspective</em>.
MIT press.</cite></p>

<p><cite>Scutari, M.
(2009).
&ldquo;Learning Bayesian networks with the bnlearn R package&rdquo;.
In: <em>arXiv preprint arXiv:0908.3817</em>.</cite></p>

---
class: bottom, center, inverse

<font size="30">Thanks! </font>

<b>

<color="FFFFFF">  https://github.com/brunaw </color>