class: center, middle, inverse, title-slide # Probabilistic Graphical Models in R and python
## IV International Seminar on Statistics with R ### Bruna Wundervald ### May, 2019 --- class: middle # Outline - Introduction - Probabilistic (graphical) models - What are PGMs? - Example - Fitting a PGM - In `R` - In `python` - Resources --- # Introduction - Most tasks require a person or a system to *reason* - We construct *models* of the system that we would like to reason about - Encode our knowledge about the system in a computer-readable form - Reasoning about what is *probable* <img src="img/calvin.jpg" width="30%" height="30%" style="display: block; margin: auto;" /> --- class: center, middle, inverse ## Probabilistic Graphical Models --- class: center, middle, inverse ## Probabilistic Graphical Models ## (?) --- # `\(\underbrace{\text{Models}}_{}\)` - The models represent our understanding of events; - Declarative representation: - Each model has its own unique assumptions. - Can be deterministic (physical models) or involve random elements (statistical models) --- # `\(\underbrace{\text{Probabilistic}}_{}\)` - The models cannot cover all elements related to the observed data: - Variability due to model limitations - Probability accommodates the **uncertainty**: - Noisy observations - Model assumptions - The world is **inherently** stochastic: - **Everything that involves probability is a model** --- # `\(\underbrace{\text{Graphical}}_{}\)` - Graphs are a computer science tool used in many different areas; - Represent complex systems that have many variables; - The systems can be a result of the models we assume: - thousands or millions of parameters. 
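As a minimal sketch (not part of the original deck), a graph over a handful of variables can be stored as a plain adjacency list, and a topological sort verifies that it is acyclic. The arcs below anticipate the survey network used later in these slides; the function name `topological_order` is my own:

```python
# Sketch: a DAG as an adjacency list (parent -> list of children).
# These arcs are the ones used for the transport-survey example later on.
dag = {
    "A": ["E"], "S": ["E"],
    "E": ["O", "R"],
    "O": ["T"], "R": ["T"],
    "T": [],
}

def topological_order(graph):
    """Kahn's algorithm: return a topological order, or raise if cyclic."""
    indegree = {node: 0 for node in graph}
    for children in graph.values():
        for child in children:
            indegree[child] += 1
    frontier = [n for n, d in indegree.items() if d == 0]
    order = []
    while frontier:
        node = frontier.pop()
        order.append(node)
        for child in graph[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                frontier.append(child)
    if len(order) != len(graph):
        raise ValueError("graph has a cycle, so it is not a DAG")
    return order

print(topological_order(dag))  # parents always appear before their children
```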
<img src="img/graph.png" width="30%" height="30%" style="display: block; margin: auto;" /> --- > Representation, inference, and learning: critical components in intelligent systems --- class: middle, center ## Reference books .pull-left[ <img src="img/book.png" width="90%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="img/bnlearn.png" width="90%" style="display: block; margin: auto;" /> ] --- ## Example - We want to investigate the usage patterns of different means of transport <img src="img/transport.jpg" width="80%" style="display: block; margin: auto;" /> --- ## Example .pull-left[ <img src="img/net.png" width="80%" style="display: block; margin: auto;" /> ] .pull-right[ - **Age (A)**: young for individuals < 30 years old, adult for individuals between 30 and 60 years, and old for people > 60. - **Sex (S)**: male or female. - **Education (E)**: up to high school or university degree. - **Occupation (O)**: employee or self-employed. - **Residence (R)**: the size of the city, small or big. - **Travel (T)**: the means of transport favored by the individual (car, train, or other). ] The **nature of the variables** suggests how they may be related to each other. --- ## Semantics - A Bayesian network is a directed acyclic graph (DAG) `\(G\)` whose nodes represent random variables `\(X_1,\dots,X_V\)` - For each node `\(X_i\)`, there exists a conditional probability distribution (CPD) `\(P(X_i \mid Par_G(X_i))\)` - The CPDs encode the dependencies present in the graph - By the chain rule of probability, we can always factorize a joint distribution, in any order, as: `$$P(x_{1:V}) = p(x_1) p(x_2 \mid x_1) p(x_3 \mid x_1, x_2) \dots p(x_V \mid x_{1:V-1})$$` where `\(V\)` is the total number of variables. 
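Anticipating the survey example that follows, a quick numeric sketch (mine, not from the original deck): the factorization turns evaluating the joint into multiplying one CPD value per node. The probabilities below are the values the fitted model reports later in these slides, taken here as given:

```python
# Sketch: evaluating the joint for one configuration of the survey network,
# using the fitted CPD values shown later in the deck.
# P(A, S, E, O, R, T) = P(A) P(S) P(E|A,S) P(O|E) P(R|E) P(T|O,R)
p_A_young = 0.3125       # P(A = young)
p_S_M     = 0.5925926    # P(S = M)
p_E_high  = 0.8250000    # P(E = high | A = young, S = M)
p_O_emp   = 0.98746082   # P(O = emp  | E = high)
p_R_big   = 0.7147335    # P(R = big  | E = high)
p_T_car   = 0.56645570   # P(T = car  | O = emp, R = big)

joint = p_A_young * p_S_M * p_E_high * p_O_emp * p_R_big * p_T_car
print(round(joint, 4))  # probability that a random individual is a young,
                        # employed, high-school-educated, big-city male
                        # who commutes by car
```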
For the example, we have: $$P(A, S, E, O, R, T) = P(A) P(S) P(E| A, S) P(O | E) P(R | E) P(T | O, R) $$ $$ P(A, S, E, O, R, T) = P(A) P(S) P(E| A, S) P(O, R | E) P(T | O, R)$$ --- ## Some CPDs (using the data) .pull-left[ $$ P(A) = $$ <table class="table table-condensed table-hover" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> adult </th> <th style="text-align:right;"> old </th> <th style="text-align:right;"> young </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 0.472 </td> <td style="text-align:right;width: 4cm; "> 0.208 </td> <td style="text-align:right;"> 0.32 </td> </tr> </tbody> </table> $$ P(S) = $$ <table class="table table-condensed table-hover" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> F </th> <th style="text-align:right;"> M </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 0.402 </td> <td style="text-align:right;width: 4cm; "> 0.598 </td> </tr> </tbody> </table> ].pull-right[ `$$P(E | S, A) =$$` <table class="table table-condensed table-hover" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> S </th> <th style="text-align:left;"> A </th> <th style="text-align:right;"> high </th> <th style="text-align:right;"> uni </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> F </td> <td style="text-align:left;width: 4cm; "> adult </td> <td style="text-align:right;"> 0.639 </td> <td style="text-align:right;"> 0.361 </td> </tr> <tr> <td style="text-align:left;"> F </td> <td style="text-align:left;width: 4cm; "> old </td> <td style="text-align:right;"> 0.846 </td> <td style="text-align:right;"> 0.154 </td> </tr> <tr> <td style="text-align:left;"> F </td> <td style="text-align:left;width: 4cm; "> young </td> <td style="text-align:right;"> 0.538 </td> <td style="text-align:right;"> 0.462 </td> </tr> <tr> <td 
style="text-align:left;"> M </td> <td style="text-align:left;width: 4cm; "> adult </td> <td style="text-align:right;"> 0.719 </td> <td style="text-align:right;"> 0.281 </td> </tr> <tr> <td style="text-align:left;"> M </td> <td style="text-align:left;width: 4cm; "> old </td> <td style="text-align:right;"> 0.892 </td> <td style="text-align:right;"> 0.108 </td> </tr> <tr> <td style="text-align:left;"> M </td> <td style="text-align:left;width: 4cm; "> young </td> <td style="text-align:right;"> 0.811 </td> <td style="text-align:right;"> 0.189 </td> </tr> </tbody> </table> ] --- ## Training the BN in `R` - `bnlearn` package - We build the network structure first and then pass the data into the fit function ```r library(bnlearn) # Building the graph by hand survey.dag = empty.graph(nodes = c("A", "S", "E", "O", "R", "T")) survey.dag = set.arc(survey.dag, from = "A", to = "E") survey.dag = set.arc(survey.dag, from = "S", to = "E") survey.dag = set.arc(survey.dag, from = "E", to = "O") survey.dag = set.arc(survey.dag, from = "E", to = "R") survey.dag = set.arc(survey.dag, from = "O", to = "T") survey.dag = set.arc(survey.dag, from = "R", to = "T") ``` --- ## Training the BN in `R` ```r # OR, by model declaration syntax survey.dag = model2network("[A][S][E|A:S][O|E][R|E][T|O:R]") skeleton(survey.dag) ``` ``` ## ## Random/Generated Bayesian network ## ## model: ## [undirected graph] ## nodes: 6 ## arcs: 6 ## undirected arcs: 6 ## directed arcs: 0 ## average markov blanket size: 2.00 ## average neighbourhood size: 2.00 ## average branching factor: 0.00 ## ## generation algorithm: Empty ``` --- ## Training the BN in `R` The survey train data: <table> <thead> <tr> <th style="text-align:left;"> A </th> <th style="text-align:left;"> R </th> <th style="text-align:left;"> E </th> <th style="text-align:left;"> O </th> <th style="text-align:left;"> S </th> <th style="text-align:left;"> T </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> adult </td> <td 
style="text-align:left;"> big </td> <td style="text-align:left;"> high </td> <td style="text-align:left;"> emp </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> car </td> </tr> <tr> <td style="text-align:left;"> adult </td> <td style="text-align:left;"> small </td> <td style="text-align:left;"> uni </td> <td style="text-align:left;"> emp </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> car </td> </tr> <tr> <td style="text-align:left;"> adult </td> <td style="text-align:left;"> big </td> <td style="text-align:left;"> uni </td> <td style="text-align:left;"> emp </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> train </td> </tr> </tbody> </table> Using the network structure: ```r bn.mod <- bn.fit(survey.dag, data = train) ``` ``` ## ## Parameters of node A (multinomial distribution) ## ## Conditional probability table: ## adult old young ## 0.4791667 0.2083333 0.3125000 ``` --- ``` ## ## Parameters of node S (multinomial distribution) ## ## Conditional probability table: ## F M ## 0.4074074 0.5925926 ``` ``` ## ## Parameters of node E (multinomial distribution) ## ## Conditional probability table: ## ## , , S = F ## ## A ## E adult old young ## high 0.6627907 0.8571429 0.5454545 ## uni 0.3372093 0.1428571 0.4545455 ## ## , , S = M ## ## A ## E adult old young ## high 0.7272727 0.8727273 0.8250000 ## uni 0.2727273 0.1272727 0.1750000 ``` --- ``` ## ## Parameters of node O (multinomial distribution) ## ## Conditional probability table: ## ## E ## O high uni ## emp 0.98746082 0.92920354 ## self 0.01253918 0.07079646 ``` ``` ## ## Parameters of node R (multinomial distribution) ## ## Conditional probability table: ## ## E ## R high uni ## big 0.7147335 0.8672566 ## small 0.2852665 0.1327434 ``` --- ``` ## ## Parameters of node T (multinomial distribution) ## ## Conditional probability table: ## ## , , R = big ## ## O ## T emp self ## car 0.56645570 0.60000000 ## other 0.21518987 0.20000000 ## train 
0.21835443 0.20000000 ## ## , , R = small ## ## O ## T emp self ## car 0.56730769 0.50000000 ## other 0.06730769 0.50000000 ## train 0.36538462 0.00000000 ``` --- ## BN predictions for the test set ```r pred <- predict(bn.mod, "T", test) scales::percent(sum(pred == test$T)/nrow(test)) # Accuracy ``` ``` ## [1] "66.2%" ``` ## Random Forest comparison ```r rf <- randomForest::randomForest(T ~ . , data = train) scales::percent(sum(predict(rf, test) == test$T)/nrow(test)) # Accuracy ``` ``` ## [1] "64.7%" ``` --- ## Training the BN in `python` - `pgmpy` package ```python from pgmpy.models import BayesianModel from pgmpy.estimators import MaximumLikelihoodEstimator, BayesianEstimator ``` ```python model = BayesianModel([('A', 'E'), ('S', 'E'), ('E', 'O'), ('E', 'R'),('R', 'T'), ('O', 'T')]) model.fit(train, estimator = MaximumLikelihoodEstimator) print(model.get_cpds('A')) ``` ``` ## ╒══════════╤══════════╕ ## │ A(adult) │ 0.479167 │ ## ├──────────┼──────────┤ ## │ A(old) │ 0.208333 │ ## ├──────────┼──────────┤ ## │ A(young) │ 0.3125 │ ## ╘══════════╧══════════╛ ``` --- --- ```python print(model.get_cpds('T')) ``` ``` ## ╒══════════╤═════════════════════╤═════════════════════╤═════════╤══════════╕ ## │ O │ O(emp) │ O(emp) │ O(self) │ O(self) │ ## ├──────────┼─────────────────────┼─────────────────────┼─────────┼──────────┤ ## │ R │ R(big) │ R(small) │ R(big) │ R(small) │ ## ├──────────┼─────────────────────┼─────────────────────┼─────────┼──────────┤ ## │ T(car) │ 0.5664556962025317 │ 0.5673076923076923 │ 0.6 │ 0.5 │ ## ├──────────┼─────────────────────┼─────────────────────┼─────────┼──────────┤ ## │ T(other) │ 0.21518987341772153 │ 0.0673076923076923 │ 0.2 │ 0.5 │ ## ├──────────┼─────────────────────┼─────────────────────┼─────────┼──────────┤ ## │ T(train) │ 0.21835443037974683 │ 0.36538461538461536 │ 0.2 │ 0.0 │ ## ╘══════════╧═════════════════════╧═════════════════════╧═════════╧══════════╛ ``` ```python ## BN predictions for the test set test_response = test['T'] 
pred = model.predict(test.drop(labels = 'T', axis=1)) df = pd.DataFrame({'pred' : pred['T'], 'test' : test_response} ) '{:.1%}'.format(df[df.pred == df.test].shape[0]/test.shape[0]) ``` ``` ## '66.2%' ``` --- ## BN details - Fitting is very fast - The simplest version just uses the CPDs - It can accommodate millions of variables and parameters ## Why do we need PGMs? - Every probabilistic model can be represented as a graph. That includes: - Linear and generalized linear models - Probabilistic clustering - Probabilistic PCA - Topic Modelling - Hierarchical Models - Latent Variable Models in general - Markov Chains - Bayesian Neural Networks - `\(\infty\)` --- class: middle ## Some real-life examples - **Medical diagnosis**: which characteristics of a patient influence the presence of a disease? - **Spam filter**: how likely is an email to be spam, given its content? - **Google PageRank**: which are the most relevant pages in a Google search? - **Recommendation systems (Netflix, Amazon, Facebook)**: which products are customers most likely to consume? - **Presidential polls**: whom are voters most likely to vote for? --- # Resources > https://github.com/search?p=2&q=probabilistic+models&type=Repositories .pull-left[ <img src="img/github.png" width="70%" style="display: block; margin: auto;" /> ] .pull-right[ - `bnlearn`: http://www.bnlearn.com/book-crc/ - data & tutorials - CRAN task view `gR`: gRaphical Models in R - Representation & Modelling - https://cran.r-project.org/web/views/gR.html - `pgmpy` page: http://pgmpy.org/ - many, **many** more packages for both `R` and `python` ] --- class: center, middle ## Acknowledgments This work was supported by a Science Foundation Ireland Career Development Award, grant number 17/CDA/4695 <img src="img/SFI_logo.jpg" width="50%" height="40%" style="display: block; margin: auto;" /> --- # References <p><cite>Koller, D. and N. Friedman (2009). <em>Probabilistic graphical models: principles and techniques</em>. 
MIT Press.</cite></p> <p><cite>Murphy, K. P. (2012). <em>Machine learning: a probabilistic perspective</em>. MIT Press.</cite></p> <p><cite>Scutari, M. (2009). “Learning Bayesian networks with the bnlearn R package”. In: <em>arXiv preprint arXiv:0908.3817</em>.</cite></p> --- class: bottom, center, inverse <font size="30">Thanks!</font> <img src="https://s3.amazonaws.com/kleebtronics-media/img/icons/github-white.png" width="50" height="50" align="middle"> <b><font color="#FFFFFF">https://github.com/brunaw</font></b>