class: title-slide, center, middle # Cluster-based Quotas for Fairness Improvements in Music Recommendation Systems ## Bruna Wundervald, ### National University of Ireland, Maynooth ### March 19th, 2021 <img src="images/player.png" width="300"> --- name: clouds class: center, middle background-image: url(images/sea.jpg) background-size: cover ## .big-text[Hello] ### Bruna Wundervald <img style="border-radius: 50%;" src="https://avatars.githubusercontent.com/u/18500161?s=460&u=34b7f4888b6fe48b3c208beb51c69c146ae050cf&v=4" width="150px"/> [GitHub: @brunaw](https://github.com/brunaw) [Twitter: @bwundervald](https://twitter.com/bwundervald) [Page: http://brunaw.com/](http://brunaw.com/) --- class: middle # Today: <img src="images/fair.png" width="450" style="float:right"> - Algorithm Fairness ??? Here's who I know you are... -- - Music recommendation fairness - Popularity -- - Tackling the popularity issue ??? -- - Results ??? -- ??? --- class: middle, inverse # Algorithm Fairness --- class: middle # Algorithm Fairness - An algorithm is said to be *fair* if its results are independent of certain **variables**: - Ethnicity, gender, disability, etc - Defining fairness might be very <b> context dependent </b> - and it can also be very difficult due to its subjectivity - Has recently become a popular/strong area of research: - More and more privacy & transparency issues - Facebook, Google, etc - Discrimination laws --- class: middle, inverse # Music recommendation fairness --- class: middle # Music recommendation fairness - In (Kowald, Schedl, and Lex, 2020), the authors demonstrate that many recommendation algorithms are biased towards popular artists - In practice, this leads to already popular artists being recommended too many times - Unfairness: - To the users, because they `get recommended artists they probably already know` - To the artists, because newer/less popular artists `get less recommended than they deserve` --- class: middle, inverse # Tackling the popularity issue --- class: middle ## Standard prediction method - After a recommendation system is fit, we select the `Top-N` best recommendations for each user and make that the predictions - Issue: - Leads to high popularity unfairness in some cases > New approach: change the way the predictions are made such that the popularity distribution is taken into account --- class: small, middle ## New prediction method 1. Define a set of users `\(\mathcal{U}\)`, their music consumption history `\(\mathcal{I}_{\mathcal{U}}\)`, the set of artists `\(A_{\mathcal{I}}\)`, and their popularity as `\(\boldsymbol{Pop}_{A}\)` 2. Assume the existence of a hidden variable `\(z_k, k \in \{1,\dots,K\}\)` in the artist popularity, assume it's distribution to be a Gaussian Mixture and estimate its parameters via the EM algorithm: `$$\mathcal{P}=(\boldsymbol{Pop}_{A_{i}} |\boldsymbol{\theta}) = \sum_{k = 1}^{K} \pi_k \mathcal{N}(\boldsymbol{Pop}_{A_{i}} | \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),$$` 3. Adapt the predictions in a way that it respects the proportions of each estimated cluster: - Predict artists from each cluster based on `\(K\)` "quotas" sampled from a `\(\text{Multinomial}(\textbf{m}, \boldsymbol{\theta} \approx (\pi_1,\dots, \pi_{K}))\)`, with m = number of artists to be recommended --- <img src="images/algorithm.png" width="55%" style="display: block; margin: auto;" /> --- class: middle ## Advantages - Is not algorithm dependent - Does not add any complex steps - Easily adapted to different fairness contexts - Stochastic, leading to less bias in the selected artists --- class: middle ## Data - A sample of the LFM-1b dataset (listening events from Last.fm), 3000 users separated into three mainstreaminess groups: - low (1000 users), medium (1000 users) and high (1000 users) - Mainstreaminess: overlap between the listening history of a user and the overall listening history of all Last.fm users - `How popular are the artists the user listens to` - Total: 1.755.361 user-artist interactions and 352.805 artists - 80%/20% train and test separation - Popularity: the fraction of users that interacted with each artist (between 0 and 1) --- class: middle ## Recommendation lgorithms - `User Item Average`: based on the average listening counts for each user - `User KNN`: a K-nearest neighbors based method - `User KNN-Average`: a K-nearest neighbors based method combined with the average listening counts for each user - `Non-Negative Matrix Factorization (NMF)`: factorizes the user-artist interaction matrix in order to approximate each user's preferences > Goal: to make their predictions more 'fair' in popularity terms --- class: middle ## Before (top-n recommendations) % `\(\Delta\)` GAP: how much the popularity of the predictions differs from the expected popularity of the artists in the user profiles of the test set <img src="images/gap_analysis_28.png" width="75%" style="display: block; margin: auto;" /> --- class: middle, inverse # Results --- class: middle ## Results: cluster distribution <img src="images/clusters.png" width="70%" style="display: block; margin: auto;" /> --- class: middle ## Results: % `\(\Delta\)` GAP <img src="images/gap_analysis_modif_22.png" width="85%" style="display: block; margin: auto;" /> --- ## Results: recommendation frequencies .panelset[ .panel[.panel-name[UserItemAvg: Before] <img src="images/sampl_28_UserItemAvg.png" width="60%" style="display: block; margin: auto;" /> ] .panel[.panel-name[UserItemAvg: After] <img src="images/sampl_modif_2_UserItemAvg.png" width="60%" style="display: block; margin: auto;" /> ] ] --- class: middle ## Results: recommendation frequencies .panelset[ .panel[.panel-name[UserKNN: Before] <img src="images/sampl_28_UserKNN.png" width="60%" style="display: block; margin: auto;" /> ] .panel[.panel-name[UserKNN: After] <img src="images/sampl_modif_2_UserKNN.png" width="60%" style="display: block; margin: auto;" /> ] ] --- class: middle ## Results: recommendation frequencies .panelset[ .panel[.panel-name[UserKNNAvg: Before] <img src="images/sampl_28_UserKNNAvg.png" width="60%" style="display: block; margin: auto;" /> ] .panel[.panel-name[UserKNNAvg: After] <img src="images/sampl_modif_2_UserKNNAvg.png" width="60%" style="display: block; margin: auto;" /> ] ] --- class: middle ## Results: recommendation frequencies .panelset[ .panel[.panel-name[NMF: Before] <img src="images/sampl_28_NMF.png" width="60%" style="display: block; margin: auto;" /> ] .panel[.panel-name[NMF: After] <img src="images/sampl_modif_2_NMF.png" width="60%" style="display: block; margin: auto;" /> ] ] --- class: middle ## Results: Mean Absolute Error (MAE) <div class="figure" style="text-align: center"> <img src="images/mae.png" alt="Average MAE results for the 4 music recommendation systems, with percentual differences in parentheses. For almost all cases, there is an accuracy improvement with the new algorithm" width="95%" /> <p class="caption">Average MAE results for the 4 music recommendation systems, with percentual differences in parentheses. For almost all cases, there is an accuracy improvement with the new algorithm</p> </div> --- class: middle # Paper > https://github.com/brunaw/recommendation_fairness <img src="images/paper.png" width="75%" style="display: block; margin: auto;" /> --- class: middle # References <p><cite><a id='bib-pap_unfairness'></a><a href="#cite-pap_unfairness">Kowald, D., M. Schedl, and E. Lex</a> (2020). “The Unfairness of Popularity Bias in Music Recommendation: A Reproducibility Study”. In: <em>European Conference on Information Retrieval</em>. Springer. , pp. 35–42.</cite></p> <p><cite>Wundervald, B. (2021). “Cluster-based quotas for fairness improvements in music recommendation systems”. In: <em>International Journal of Multimedia Information Retrieval</em>, pp. 1–8.</cite></p> --- background-image: url(images/sea.jpg) background-size: cover class: center, middle, inverse ## .big-text[Questions?] --- class: bottom, left, inverse <img style="border-radius: 50%;" src="https://avatars.githubusercontent.com/u/18500161?s=460&u=34b7f4888b6fe48b3c208beb51c69c146ae050cf&v=4" width="150px"/> ## Thank you! ### Find me at... [`GitHub: @brunaw`](https://github.com/brunaw) [`Twitter: @bwundervald`](https://twitter.com/bwundervald) [`Page: http://brunaw.com/`](http://brunaw.com/)