genius tutorial

Introducing genius

You want to start analysing song lyrics. Where do you go? There have been music information retrieval papers written on the topic of programmatically extracting lyrics from the web. Dozens of people have gone through the laborious task of scraping song lyrics from websites. Even a recent winner of the Shiny competition scraped lyrics from Genius.com.

I too have been there. Scraping websites is not always the best use of your time. genius is an R package that will enable you to programmatically download song lyrics in a tidy format ready for analysis. To begin using the package, it must first be installed and loaded. In addition to genius, we will need our standard data manipulation tools from the tidyverse.

install.packages("genius")
library(genius)
library(tidyverse)

Single song lyrics

The simplest method of extracting song lyrics is to get a single song at a time. This is done with the genius_lyrics() function. It takes two main arguments: artist and song. These are the quoted names of the artist and the song. Additionally, there is a third argument, info, which determines what extra metadata is returned. The possible values are title, simple, artist, features, and all. I recommend trying them all to see how they work.

In this example we will retrieve the lyrics to People Floating by the up-and-coming musician Renny Conti.

floating <- genius_lyrics("renny conti", "people floating")
floating
## # A tibble: 22 x 3
##    track_title      line lyric                                            
##    <chr>           <int> <chr>                                            
##  1 People Floating     1 He don't know what to write                      
##  2 People Floating     2 She's a dream, she's staying overnight           
##  3 People Floating     3 And they're stoned, getting high                 
##  4 People Floating     4 But the view ain't nothing in her eyes           
##  5 People Floating     5 In a cut, small town                             
##  6 People Floating     6 Lift his eyes to see her running round           
##  7 People Floating     7 And it's sharp, yeah, her shape                  
##  8 People Floating     8 Makes the young man quiver in a constant state of
##  9 People Floating     9 Don't you know the mountain is that way?         
## 10 People Floating    10 With all the people floating miles and miles away
## # … with 12 more rows

Album lyrics

Now that you have the intuition for obtaining lyrics for a single song, we can create a larger dataset with the lyrics of an entire album using genius_album(). Similar to genius_lyrics(), the arguments are artist, album, and info.

The example below retrieves the lyrics for Snail Mail’s album Lush. Try retrieving the lyrics for an album of your own choosing.

lush <- genius_album("Snail Mail", "Lush")
## Joining, by = c("track_title", "track_n", "track_url")
lush
## # A tibble: 265 x 4
##    track_title track_n  line lyric              
##    <chr>         <int> <int> <chr>              
##  1 Intro             1     1 Go                 
##  2 Intro             1     2 Get it all         
##  3 Intro             1     3 Let 'em watch      
##  4 Intro             1     4 Let it fall        
##  5 Intro             1     5 Nameless           
##  6 Intro             1     6 Sweat it out       
##  7 Intro             1     7 They don't love you
##  8 Intro             1     8 Do they?           
##  9 Intro             1     9 Grace              
## 10 Intro             1    10 Born and raised    
## # … with 255 more rows

Adding lyrics to a data frame

Multiple songs

A common use for lyric analysis is to compare the lyrics of one artist to another. To do that, you could retrieve the lyrics for multiple songs and albums separately and then join them together. In my mind this has one major issue: it makes you create multiple objects that take up precious memory. For this reason, the function add_genius() was developed. It lets you start from a tibble with one column for the artist’s name and another for their album or song title. add_genius() will then go through the entire tibble and add song lyrics for the tracks and albums that are available.

Let’s try this with a tibble of three songs.

three_songs <- tribble(
  ~ artist, ~ title,
  "Big Thief", "UFOF",
  "Andrew Bird", "Imitosis",
  "Sylvan Esso", "Slack Jaw"
)

song_lyrics <- three_songs %>% 
  add_genius(artist, title, type = "lyrics")
## Joining, by = c("artist", "title")
song_lyrics %>% 
  count(artist)
## # A tibble: 3 x 2
##   artist          n
##   <chr>       <int>
## 1 Andrew Bird    39
## 2 Big Thief      48
## 3 Sylvan Esso    35
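
Conceptually, this behaves like a row-wise fetch followed by a join back onto the input, which is what the "Joining, by" message hints at. Below is a minimal base R sketch of that pattern, not the package's actual implementation. The function fetch_lyrics() is a hypothetical stand-in that returns canned lines so the code runs without contacting Genius, and song_lyrics_sketch is just an illustrative name.

```r
# Hypothetical stand-in for a per-song lookup; NOT part of genius.
# It returns two canned lines so the sketch runs offline.
fetch_lyrics <- function(artist, title) {
  data.frame(
    artist = artist,
    title  = title,
    line   = 1:2,
    lyric  = paste(title, "placeholder line", 1:2),
    stringsAsFactors = FALSE
  )
}

songs <- data.frame(
  artist = c("Big Thief", "Andrew Bird"),
  title  = c("UFOF", "Imitosis"),
  stringsAsFactors = FALSE
)

# Fetch once per input row, then stack the per-song results.
per_song <- lapply(seq_len(nrow(songs)), function(i) {
  fetch_lyrics(songs$artist[i], songs$title[i])
})
fetched <- do.call(rbind, per_song)

# Join back onto the input using the artist and title columns as keys.
song_lyrics_sketch <- merge(songs, fetched, by = c("artist", "title"))
song_lyrics_sketch
```

The result is a single long data frame with one row per lyric line, so there is only one object to keep around no matter how many songs go in.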

Multiple albums

add_genius() also extends this functionality to albums.

albums <- tribble(
  ~ artist, ~ title,
  "Andrew Bird", "Armchair Apocrypha",
  "Andrew Bird", "Things are really great here sort of"
)

album_lyrics <- albums %>% 
  add_genius(artist, title, type = "album")
## Joining, by = c("track_title", "track_n", "track_url")
## Joining, by = c("track_title", "track_n", "track_url")
## Joining, by = c("artist", "title")
album_lyrics
## # A tibble: 522 x 6
##    artist    title         track_title track_n  line lyric                 
##    <chr>     <chr>         <chr>         <int> <int> <chr>                 
##  1 Andrew B… Armchair Apo… Fiery Crash       1     1 Turnstiles and mezzan…
##  2 Andrew B… Armchair Apo… Fiery Crash       1     2 Jetways and Dramamine…
##  3 Andrew B… Armchair Apo… Fiery Crash       1     3 And x-ray machines    
##  4 Andrew B… Armchair Apo… Fiery Crash       1     4 You were hurling to s…
##  5 Andrew B… Armchair Apo… Fiery Crash       1     5 G-forces twisting you…
##  6 Andrew B… Armchair Apo… Fiery Crash       1     6 Breeding superstition 
##  7 Andrew B… Armchair Apo… Fiery Crash       1     7 A fatal premonition   
##  8 Andrew B… Armchair Apo… Fiery Crash       1     8 You know you got to e…
##  9 Andrew B… Armchair Apo… Fiery Crash       1     9 The fiery crash       
## 10 Andrew B… Armchair Apo… Fiery Crash       1    10 Oh, close your eyes a…
## # … with 512 more rows

It is important to note that the warnings for this function are somewhat informative. When a 404 error occurs, it may be because the song does not exist on Genius, or because the song is actually an instrumental, which is the case here with Andrew Bird.
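
To illustrate why a failed lookup can surface as a warning rather than stopping the whole run, here is a small base R sketch of the general idea. It is an assumption about the pattern, not a description of genius internals. Both fetch_or_fail() and safe_fetch() are hypothetical helpers, and the track names are made up.

```r
# Hypothetical lookup that fails for an instrumental; NOT part of genius.
fetch_or_fail <- function(title) {
  if (title == "Instrumental Track") {
    stop("HTTP error 404")
  }
  data.frame(title = title, line = 1L, lyric = "placeholder line",
             stringsAsFactors = FALSE)
}

# Turn a failed lookup into a warning plus an empty result, so one
# missing track does not abort the remaining lookups.
safe_fetch <- function(title) {
  tryCatch(
    fetch_or_fail(title),
    error = function(e) {
      warning("No lyrics for '", title, "': ", conditionMessage(e),
              call. = FALSE)
      data.frame(title = character(), line = integer(),
                 lyric = character(), stringsAsFactors = FALSE)
    }
  )
}

titles  <- c("Song With Words", "Instrumental Track")
results <- do.call(rbind, lapply(titles, safe_fetch))
results
```

The track that failed simply contributes no rows, and the warning tells you which title to double-check.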

Albums and songs

If you want to mix single songs and albums, you can supply a column with the type value of each row. The example below illustrates this. First, a tibble with artist, track or album title, and type columns is created. Next, the tibble is piped to add_genius() with the unquoted column names for the artist, title, and type columns. This will then iterate over each row and fetch the appropriate song lyrics.

song_album <- tribble(
  ~ artist, ~ title, ~ type,
  "Big Thief", "UFOF", "lyrics",
  "Andrew Bird", "Imitosis", "lyrics",
  "Sylvan Esso", "Slack Jaw", "lyrics",
  "Movements", "Feel Something", "album"
)

mixed_lyrics <- song_album %>% 
  add_genius(artist, title, type)
## Joining, by = c("track_title", "track_n", "track_url")
## Joining, by = c("artist", "title", "type")

Self-similarity

Another feature of genius is the ability to create self-similarity matrices to visualize lyrical patterns within a song. This idea was taken from Colin Morris’ wonderful JavaScript-based Song Sim project. Colin explains the interpretation of a self-similarity matrix in their TEDx talk. An even better description of the interpretation is available in this post.

To use Colin’s example we will look at the structure of Ke$ha’s Tik Tok.

The function calc_self_sim() will create a self-similarity matrix of a given song. The main arguments for this function are the tibble (df) and the column containing the lyrics (lyric_col). Ideally there is one line per observation, as is the default in the output of genius_*(). The tidy output compares each word in the song with every word in the song, itself included. This measures repetition of words and will show us the structure of the lyrics.

tik_tok <- genius_lyrics("Ke$ha", "Tik Tok")

tt_self_sim <- calc_self_sim(tik_tok, lyric, output = "tidy")

tt_self_sim
## # A tibble: 226,576 x 5
##     x_id  y_id identical word_x  word_y
##    <int> <int> <lgl>     <chr>   <chr> 
##  1     1     1 TRUE      wake    wake  
##  2     2     1 FALSE     up      wake  
##  3     3     1 FALSE     in      wake  
##  4     4     1 FALSE     the     wake  
##  5     5     1 FALSE     morning wake  
##  6     6     1 FALSE     feelin  wake  
##  7     7     1 FALSE     like    wake  
##  8     8     1 FALSE     p       wake  
##  9     9     1 FALSE     diddy   wake  
## 10    10     1 FALSE     hey     wake  
## # … with 226,566 more rows
tt_self_sim %>% 
  ggplot(aes(x = x_id, y = y_id, fill = identical)) +
  geom_tile() +
  scale_fill_manual(values = c("white", "black")) +
  theme_minimal() +
  theme(legend.position = "none",
        axis.text = element_blank()) +
  scale_y_continuous(trans = "reverse") +
  labs(title = "Tik Tok", subtitle = "Self-similarity matrix", x = "", y = "", 
       caption = "The matrix displays that there are three choruses with a bridge between the last two. The bridge displays internal repetition.")
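
To make the shape of the tidy output concrete, here is a miniature version built by hand in base R. This is a sketch under my own assumptions and not the code inside calc_self_sim(): the toy lyric is made up, and the simple lowercase split on non-letters may differ from how the package tokenizes words.

```r
# A toy lyric split into lowercase words (made-up input, not from Genius).
lyric <- c("la la love", "love la")
words <- unlist(strsplit(tolower(lyric), "[^a-z']+"))
words <- words[nzchar(words)]

# Every (x, y) pair of word positions; x_id varies fastest, matching
# the row order of the tidy output shown earlier.
n <- length(words)
pairs <- expand.grid(x_id = seq_len(n), y_id = seq_len(n))

# A cell is TRUE when the two positions hold the same word.
pairs$identical <- words[pairs$x_id] == words[pairs$y_id]
pairs$word_x <- words[pairs$x_id]
pairs$word_y <- words[pairs$y_id]

head(pairs)
```

A song with n words yields n * n rows, which is why the Tik Tok result has 226,576 rows (476 squared). The diagonal is always TRUE because every word matches itself, and repeated sections show up as off-diagonal lines parallel to it.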

Josiah Parry
Social Data Scientist
