Feature Engineering for Genre Characterization in Brazilian Music

layout: true
 
<div class="my-footer">https://github.com/brunaw/genre_characterization</div>

---
name: bookdown-title

.pull-left[
<div class="column">
<img src="img/MU_logo.png" width="350">
</div>
]

### .fancy[Feature Engineering for Genre Characterization in Brazilian Music]

.large[Bruna Wundervald | Maynooth University | Sep 18, 2020]

---
class: inverse, middle

### .fancy[Summary]

- Introduction
  - Research questions

- Definitions
  - Data
  - Manually Extracted Features
  - Machine Learning Algorithm

- Results

- Conclusions

---
class: inverse, middle, center

### .fancy[Introduction]

---
# Introduction

- Many factors are involved in the configuration of a music genre, such as style, historical context, and harmonic structures (Caldas (2010))

- Defining music genres is not a trivial task, and it is
an important problem in several areas of music studies

>  This work focuses on **verifying the connection between harmonic information and genre specification in Brazilian music** by evaluating feature importance in machine learning models

- Mid-level music features such as chords are a rich source of information about genres
(Cheng, Yang, Lin, Liao, and Chen (2008), 
Corrêa and Rodrigues (2016))

- We use symbolic chord data and manually extracted harmonic features for genre classification

- **The features represent the chord structures in different and meaningful ways**

---
class: inverse, middle, center

### .fancy[Definitions]

---

## Data

> Type: **Symbolic chord sequences for each song**

- The chords were extracted from Cifraclub, 
a collaborative online music-sharing website, using
the `chorrrds` package (Wundervald (2018))
for `R` (R Core Team (2018))

-  Crowd-sourced data is becoming more common in the literature (e.g. Odekerken, Koops, and Volk (2020))

- In total, **8 music genres** were used: Reggae, Pop, Forró, Bossa Nova, Sertanejo, MPB, Rock, and Samba

- **106 different artists** were available on the online platform, for which the chords and keys for **8339 songs** were collected

---

## Manually Extracted Features

- Interpretable summary features computed from the chords, to make
the raw chord sequences more informative

.pull-left[

**First set, triads and simple tetrads:**
- percentage of suspended chords (e.g. Gsus), 
- of chords with the seventh (e.g. C7), 
- of minor chords with the seventh (e.g. Em7, C#m7), 
- of minor (e.g. Em, C#m), 
- of diminished (e.g. Bdim)
- and of augmented (e.g. Baug) chords

**Second set, tetrads:** 
- percentage of chords with the fourth (e.g. D4), 
- the sixth (e.g. E6), 
- the ninth (e.g. G9), 
- with the major seventh (e.g. F7+, Am7+), 
- with a diminished fifth (e.g. C5- or C5b) 
- and with an augmented fifth (e.g. C5+ or C5#)
]

.pull-right[

<div class="figure" style="text-align: center">
<img src="img/feat_example.png" alt="Feature extraction example" width="100%" height="100%" />
Feature extraction example
</div>
]
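
As a rough illustration of how some features in the first set could be computed, the sketch below takes the chord symbols of one song and returns the percentage of chords falling in each category. The function name `chord_type_shares` and the regular expressions are assumptions made for this example; they are not the patterns used by `chorrrds` or in the original analysis. Here "minor" counts every chord with a minor root triad, including minor sevenths, which is also an assumption.

```python
import re

# Hypothetical patterns for some of the chord categories listed above.
# They assume symbols written like "Em7", "C#m", "Gsus4", "Bdim".
ROOT = r"[A-G][#b]?"
PATTERNS = {
    "minor": re.compile(rf"^{ROOT}m(?!aj)"),           # Em, C#m, Em7 (not Cmaj7)
    "seventh": re.compile(rf"^{ROOT}7(?![+M])"),       # C7, but not F7+ or C7M
    "minor_seventh": re.compile(rf"^{ROOT}m7(?![+M])"),  # Em7, C#m7
    "suspended": re.compile(rf"^{ROOT}.*sus"),         # Gsus, Gsus4
    "diminished": re.compile(rf"^{ROOT}.*dim"),        # Bdim
    "augmented": re.compile(rf"^{ROOT}.*aug"),         # Baug
}


def chord_type_shares(chords):
    """Return the percentage of chords in a song matching each pattern."""
    total = len(chords)
    if total == 0:
        return {name: 0.0 for name in PATTERNS}
    return {
        name: 100.0 * sum(bool(pattern.search(c)) for c in chords) / total
        for name, pattern in PATTERNS.items()
    }


# Made-up chord sequence, used only to exercise the function.
song = ["C", "Em7", "Am", "G7", "C", "Gsus4", "Bdim", "C"]
shares = chord_type_shares(song)
for name, value in shares.items():
    print(f"{name}: {value:.1f}%")
```

Keeping the patterns in a single dictionary makes it easy to add the second feature set (fourths, sixths, ninths, and so on) without touching the counting logic.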

---

## Manually Extracted Features

**Third set, main chord transitions:**
- percentage of the first, second, and third most common chord transitions in the song
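
One possible reading of the transition features, sketched under assumptions: a transition is taken to be an ordered pair of consecutive chords, and each feature is the share of all transitions in the song taken by one of the three most frequent pairs. The helper `top_transition_shares` is hypothetical and the chord sequence is made up.

```python
from collections import Counter


def top_transition_shares(chords, k=3):
    """Percentage of all transitions taken by each of the k most common ones."""
    # Assumption: a transition is an ordered pair of consecutive chords.
    transitions = list(zip(chords, chords[1:]))
    if not transitions:
        return []
    counts = Counter(transitions)
    total = len(transitions)
    return [(pair, 100.0 * n / total) for pair, n in counts.most_common(k)]


song = ["C", "G", "Am", "F", "C", "G", "Am", "F", "C", "G"]
result = top_transition_shares(song)
for (origin, target), share in result:
    print(f"{origin} -> {target}: {share:.1f}%")
```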

**Fourth set, miscellany:**
- popularity, 
- total of non-distinct chords, 
- year of album release, 
- indicator of the key of the song being the same as the most common chord, 
- percentage of chords with varying bass (e.g. C/E, C/G, C/Bb), 
- mean distance of the root note to 'C' in the circle of fifths, 
- mean distance of the root note to 'C' in semitones, 
- absolute count of the most common chord

> Supplementary features about the release year and popularity were obtained with the help of the well-known **Spotify API**
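
The two distance features in the fourth set can be sketched as follows. The slide does not say whether the distances are directional, so this example assumes the shortest distance in either direction (0 to 6 steps). The helper names are hypothetical.

```python
import re

# Pitch class of each natural note, in semitones above C.
NATURAL = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def root_pitch_class(chord):
    """Pitch class (0-11) of the chord's root, e.g. 'C#m7' -> 1."""
    match = re.match(r"([A-G])([#b]?)", chord)
    if match is None:
        raise ValueError(f"cannot parse chord root: {chord!r}")
    letter, accidental = match.groups()
    shift = {"#": 1, "b": -1, "": 0}[accidental]
    return (NATURAL[letter] + shift) % 12


def semitone_distance(chord):
    """Shortest distance from the root to C, in semitones."""
    pc = root_pitch_class(chord)
    return min(pc, 12 - pc)


def fifths_distance(chord):
    """Shortest distance from the root to C along the circle of fifths."""
    # Multiplying a pitch class by 7 (mod 12) gives its position on the circle.
    steps = (7 * root_pitch_class(chord)) % 12
    return min(steps, 12 - steps)


def mean_distances(chords):
    """Mean semitone and circle-of-fifths distance for a non-empty chord list."""
    n = len(chords)
    return (
        sum(semitone_distance(c) for c in chords) / n,
        sum(fifths_distance(c) for c in chords) / n,
    )


song = ["C", "G", "Am", "F"]
mean_semitones, mean_fifths = mean_distances(song)
print(mean_semitones, mean_fifths)
```

G and F are both one step from C on the circle of fifths but five semitones away, which is why the two features carry different information.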

---

## Machine Learning Algorithm

> Popular Random Forest (Breiman (2001)) model

-  A tree ensemble in which only
a random subset of `$m$` features is considered as candidates for each split, which helps to decorrelate the trees

- The model equation can be written as 
`$$\hat f(\mathbf{x}) = \sum_{n = 1}^{N_{\text{tree}}} \frac{1}{N_{\text{tree}}} \hat f_n(\mathbf{x}),$$`

where `$\hat f_n$` corresponds to the `$n$`-th estimated tree, out
of a total of `$N_{\text{tree}}$` trees, and `$\mathbf{x}$` is the
feature set
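
To make the averaging in the equation concrete, the sketch below treats each fitted tree as a function that returns a vector of class probabilities and averages these vectors over the ensemble. The three stub trees, the feature names, and `forest_predict` are hypothetical, and the tree-fitting step is omitted entirely.

```python
def forest_predict(trees, x):
    """Average the class-probability vectors of all trees for input x."""
    n_tree = len(trees)
    per_tree = [tree(x) for tree in trees]
    classes = per_tree[0].keys()
    return {c: sum(p[c] for p in per_tree) / n_tree for c in classes}


def vote(is_samba):
    """Hard vote of a stub tree, written as a probability vector."""
    return {"Samba": 1.0, "Rock": 0.0} if is_samba else {"Samba": 0.0, "Rock": 1.0}


# Stub "trees": each one is a single hand-written split (illustration only).
trees = [
    lambda x: vote(x["pct_minor"] > 20),
    lambda x: vote(x["pct_seventh"] > 10),
    lambda x: vote(x["pct_minor"] > 40),
]

x = {"pct_minor": 30.0, "pct_seventh": 15.0}
probs = forest_predict(trees, x)
predicted = max(probs, key=probs.get)
print(probs, predicted)
```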

> **Advantage**: We can easily access the importance (misclassification
reduction) for each feature used in the model
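
As an illustration of what a misclassification reduction measures, the sketch below computes it for a single candidate split: the error rate of predicting the majority class in the parent node, minus the size-weighted error rate in the two child nodes. In a forest, such reductions would be accumulated per feature over all splits and trees. That aggregation and the names used here are assumptions for the example, not necessarily the exact importance measure of the original analysis.

```python
from collections import Counter


def misclassification_rate(labels):
    """Error rate when every item is assigned the node's majority class."""
    if not labels:
        return 0.0
    majority_count = Counter(labels).most_common(1)[0][1]
    return 1.0 - majority_count / len(labels)


def split_reduction(values, labels, threshold):
    """Drop in misclassification rate obtained by splitting at `threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    children = (
        len(left) * misclassification_rate(left)
        + len(right) * misclassification_rate(right)
    ) / n
    return misclassification_rate(labels) - children


# Made-up values of one feature and the genres of six songs.
pct_seventh = [2.0, 5.0, 8.0, 30.0, 35.0, 40.0]
genre = ["Rock", "Rock", "Rock", "Bossa Nova", "Bossa Nova", "Rock"]
reduction = split_reduction(pct_seventh, genre, threshold=20.0)
print(round(reduction, 4))
```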

---
class: inverse, middle, center

### .fancy[Results]

---

# Results

- Four models were fitted in a nested fashion, with each new model adding one more feature set to the previous one

- Table 1 shows that, for all models, there is evidence that the accuracy is significantly higher than the no-information classification rate
  - Adding the feature sets progressively increases the accuracy of the models
  - This shows that all four feature sets are informative for predicting the music genres
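
The comparison against the no-information rate can be sketched as follows. The no-information rate is the share of the largest class, and a one-sided exact binomial test asks how likely the observed number of correct predictions would be if the true accuracy were equal to that rate. The counts below are invented for illustration and are not the results behind Table 1.

```python
from collections import Counter
from math import comb


def no_information_rate(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def binomial_p_value(n_correct, n_total, rate):
    """P(X >= n_correct) for X ~ Binomial(n_total, rate)."""
    return sum(
        comb(n_total, k) * rate**k * (1 - rate) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )


# Hypothetical test set of 100 songs, 60 of which are classified correctly.
labels = ["Sertanejo"] * 40 + ["MPB"] * 30 + ["Samba"] * 30
nir = no_information_rate(labels)
p_value = binomial_p_value(60, 100, nir)
print(f"NIR = {nir:.2f}, p-value = {p_value:.2e}")
```

A small p-value is evidence that the model's accuracy exceeds what could be achieved by always guessing the largest genre.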

---
class: middle

- From Table 2, we can see that there is considerable
confusion between MPB and Bossa Nova, highlighting their 
known harmonic similarities
- The same happens with Forró, Sertanejo,
and Pop, which are music genres with a similar origin and,
in general, more elementary harmonic structures

<table class="huxtable" style="border-collapse: collapse; border: 0px; margin-bottom: 2em; margin-top: 2em; ; margin-left: auto; margin-right: auto; " id="tab:unnamed-chunk-3">
<col><col><col><col><col><col><col><col><col><tr>
<th style="vertical-align: top; text-align: left; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0.4pt 0pt; padding: 6pt 6pt 6pt 0pt; font-weight: bold;">Genre</th><th style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0.4pt 0pt; padding: 6pt 6pt 6pt 6pt; font-weight: bold;">Bossa Nova</th><th style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0.4pt 0pt; padding: 6pt 6pt 6pt 6pt; font-weight: bold;">Forró</th><th style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0.4pt 0pt; padding: 6pt 6pt 6pt 6pt; font-weight: bold;">MPB</th><th style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0.4pt 0pt; padding: 6pt 6pt 6pt 6pt; font-weight: bold;">Pop</th><th style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0.4pt 0pt; padding: 6pt 6pt 6pt 6pt; font-weight: bold;">Reggae</th><th style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0.4pt 0pt; padding: 6pt 6pt 6pt 6pt; font-weight: bold;">Rock</th><th style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0.4pt 0pt; padding: 6pt 6pt 6pt 6pt; font-weight: bold;">Samba</th><th style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0.4pt 0pt; padding: 6pt 0pt 6pt 6pt; font-weight: bold;">Sertanejo</th></tr>
<tr>
<td style="vertical-align: top; text-align: left; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt; padding: 6pt 6pt 6pt 0pt; font-weight: bold;">Bossa Nova</td><td style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt; padding: 6pt 6pt 6pt 6pt; font-weight: bold;">28%</td><td style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">40%</td><td style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">5%</td><td style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">16%</td><td style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt; padding: 6pt 0pt 6pt 6pt; font-weight: normal;">12%</td></tr>
<tr>
<td style="vertical-align: top; text-align: left; white-space: normal; padding: 6pt 6pt 6pt 0pt; font-weight: bold;">Forró</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: bold;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">12%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">12%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">10%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 0pt 6pt 6pt; font-weight: normal;">65%</td></tr>
<tr>
<td style="vertical-align: top; text-align: left; white-space: normal; padding: 6pt 6pt 6pt 0pt; font-weight: bold;">MPB</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">1%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: bold;">59%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">11%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">13%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 0pt 6pt 6pt; font-weight: normal;">15%</td></tr>
<tr>
<td style="vertical-align: top; text-align: left; white-space: normal; padding: 6pt 6pt 6pt 0pt; font-weight: bold;">Pop</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">13%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: bold;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">28%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">15%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 0pt 6pt 6pt; font-weight: normal;">44%</td></tr>
<tr>
<td style="vertical-align: top; text-align: left; white-space: normal; padding: 6pt 6pt 6pt 0pt; font-weight: bold;">Reggae</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">25%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: bold;">8%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">46%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">8%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 0pt 6pt 6pt; font-weight: normal;">12%</td></tr>
<tr>
<td style="vertical-align: top; text-align: left; white-space: normal; padding: 6pt 6pt 6pt 0pt; font-weight: bold;">Rock</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">16%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: bold;">43%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">5%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 0pt 6pt 6pt; font-weight: normal;">35%</td></tr>
<tr>
<td style="vertical-align: top; text-align: left; white-space: normal; padding: 6pt 6pt 6pt 0pt; font-weight: bold;">Samba</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">1%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">20%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">3%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: bold;">66%</td><td style="vertical-align: top; text-align: center; white-space: normal; padding: 6pt 0pt 6pt 6pt; font-weight: normal;">10%</td></tr>
<tr>
<td style="vertical-align: top; text-align: left; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt; padding: 6pt 6pt 6pt 0pt; font-weight: bold;">Sertanejo</td><td style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">2%</td><td style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0%</td><td style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">7%</td><td style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">2%</td><td style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt; padding: 6pt 0pt 6pt 6pt; font-weight: bold;">89%</td></tr>
</table>
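
A table like the one above can be produced by cross-tabulating two sets of labels and normalizing each row to percentages. The sketch assumes that rows hold the true genres and columns the predicted ones, which the slide does not state. The labels below are made up, and `confusion_percentages` is a hypothetical helper.

```python
from collections import Counter


def confusion_percentages(true_labels, predicted_labels):
    """Row-normalized confusion matrix as nested dicts of percentages."""
    counts = Counter(zip(true_labels, predicted_labels))
    row_totals = Counter(true_labels)
    classes = sorted(set(true_labels) | set(predicted_labels))
    # Only classes that occur among the true labels get a row.
    return {
        t: {p: 100.0 * counts[(t, p)] / row_totals[t] for p in classes}
        for t in classes
        if row_totals[t] > 0
    }


true_genre = ["MPB", "MPB", "MPB", "MPB", "Bossa Nova", "Bossa Nova"]
predicted = ["MPB", "MPB", "MPB", "Samba", "MPB", "Bossa Nova"]
table = confusion_percentages(true_genre, predicted)
for genre, row in table.items():
    print(genre, {k: f"{v:.0f}%" for k, v in row.items()})
```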

---
class: middle

.pull-left[
<div class="figure" style="text-align: center">
<img src="img/imp_m3.png" alt="Figure 1. Importance plot for the fourth model with all the features. The top part of the plot is dominated by the harmonic features." width="100%" height="100%" />
Figure 1. Importance plot for the fourth model with all the features. The top part of the plot is dominated by the harmonic features.
</div>
]

.pull-right[

- Figure 1 shows that the first set of features is the most informative one
  - The basic chord information alone already goes a long way towards informing the model about the genres
  
- The external features (year and popularity)
also rank high in the plot, showing that the Spotify data is pertinent as well

- The position of the transition and distance features strengthens the idea that harmonic characteristics are very important for discriminating between music genres
]
---
class: inverse, middle, center

### .fancy[Conclusions]

---

# Conclusions

- Manually engineered harmonic features can be useful to characterize Brazilian music genres

- The **most discriminative** features are:
  - the percentage of chords with the seventh note, 
  - of minor chords with the seventh note, 
  - of minor chords,
  - the year of release of the songs, 
  - the popularity 
  - and the behavior of the most common chord transitions
  
- Our insights can be extended to other music genres that influenced or were influenced by the genres considered here, such as Jazz, Pop, and Rock music

- Next steps of this work include engineering new variables, applying different machine learning algorithms, and further exploring the use of crowd-sourced chord data

Links:
  - [To code and data](https://github.com/brunaw/genre_classification)
  - [To presentation repository](https://github.com/brunaw/genre_characterization)

---
 
# References

<cite><a id='bib-Breiman2001'></a><a href="#cite-Breiman2001">Breiman, L.</a>
(2001).
&ldquo;Random forests&rdquo;.
In: Machine Learning.
ISSN: 08856125.
DOI: <a href="https://doi.org/10.1023/A:1010933404324">10.1023/A:1010933404324</a>.</cite>

<cite><a id='bib-Caldas2010'></a><a href="#cite-Caldas2010">Caldas, W.</a>
(2010).
Iniciação à Música Popular Brasileira.
Vol. 1.</cite>

<cite><a id='bib-Cheng2008'></a><a href="#cite-Cheng2008">Cheng, H., Y. Yang, Y. Lin, et al.</a>
(2008).
&ldquo;Automatic chord recognition for music classification and retrieval&rdquo;.
In: 2008 IEEE International Conference on Multimedia and Expo.
IEEE, pp. 1505&ndash;1508.</cite>

<cite><a id='bib-Correa2016'></a><a href="#cite-Correa2016">Corrêa, D. C. and F. A. Rodrigues</a>
(2016).
A survey on symbolic data-based music genre classification.
DOI: <a href="https://doi.org/10.1016/j.eswa.2016.04.008">10.1016/j.eswa.2016.04.008</a>.</cite>

<cite><a id='bib-odekerken2020decibel'></a><a href="#cite-odekerken2020decibel">Odekerken, D, H. V. Koops, and A. Volk</a>
(2020).
&ldquo;DECIBEL: Improving Audio Chord Estimation for Popular Music by Alignment and Integration of Crowd-Sourced Symbolic Representations&rdquo;.
In: arXiv preprint arXiv:2002.09748.</cite>

<cite><a id='bib-Rsoftware'></a><a href="#cite-Rsoftware">R Core Team</a>
(2018).
R: A Language and Environment for Statistical Computing.
R Foundation for Statistical Computing.
Vienna, Austria.
URL: <a href="https://www.R-project.org/">https://www.R-project.org/</a>.</cite>

<cite><a id='bib-chorrrds'></a><a href="#cite-chorrrds">Wundervald, B.</a>
(2018).
The chorrrds package for extraction of music chords data in R.
URL: <a href="https://github.com/r-music/chorrrds">https://github.com/r-music/chorrrds</a>.</cite>

---

class: middle, center, inverse

Thanks!