Introducing geniusR

geniusR enables quick and easy download of song lyrics. The intent behind the package is to make it possible to perform text-based analyses on songs in a tidy[text] format.

This package was inspired by the release of Kendrick Lamar’s most recent album, DAMN. As most programmers do, I spent far too long simplifying a single task: accessing song lyrics. Genius (formerly Rap Genius) is the most widely accessible platform for lyrics.

The functions in this package enable easy access of individual song lyrics, album tracklists, and lyrics to whole albums.

Install and load the package

devtools::install_github("josiahparry/geniusR")

Load the package:

library(geniusR)
library(tidyverse) # For manipulation

Getting Lyrics

Whole Albums

genius_album() allows you to download the lyrics for an entire album in a tidy format. It takes two main arguments: artist and album. Supply the quoted name of the artist and of the album (if this gives you trouble, check that the artist and album names match how they are written on Genius).

This returns a tidy data frame with three columns:

  • title: track name
  • track_n: track number
  • text: lyrics

emotions_math <- genius_album(artist = "Margaret Glaspy", album = "Emotions and Math")
emotions_math
## # A tibble: 371 x 3
##    title             track_n text                                  
##    <chr>               <int> <chr>                                 
##  1 Emotions and Math       1 Oh when I got you by my side          
##  2 Emotions and Math       1 Everything's alright                  
##  3 Emotions and Math       1 Its just when your gone               
##  4 Emotions and Math       1 I start to snooze the alarm           
##  5 Emotions and Math       1 Cause I stay up until 4 in the morning
##  6 Emotions and Math       1 Counting all the days 'til you're back
##  7 Emotions and Math       1 Shivering in an ice cold bath         
##  8 Emotions and Math       1 Of emotions and math                  
##  9 Emotions and Math       1 Oh it's a shame                       
## 10 Emotions and Math       1 And I'm to blame                      
## # ... with 361 more rows
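
Because each row is one line of lyrics, the result drops straight into a text analysis. The sketch below is only an illustration: it uses a small hand-made tibble in the same three-column shape (so it runs without a network connection) and splits lines into words with base R. In practice tidytext::unnest_tokens() would probably be the more usual choice for the tokenising step.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hand-made stand-in for the output of genius_album(); the titles are made up
album <- tibble(
  title   = c("Song A", "Song A", "Song B"),
  track_n = c(1L, 1L, 2L),
  text    = c("Oh it's a shame", "And I'm to blame", "Oh what a shame")
)

word_counts <- album %>%
  # Split each lowercased line on anything that is not a letter or apostrophe
  mutate(word = strsplit(tolower(text), "[^a-z']+")) %>%
  unnest(word) %>%               # one row per word
  filter(word != "") %>%         # drop empty strings left by leading punctuation
  count(word, sort = TRUE)       # most frequent words first

word_counts
```

The same pipeline should work unchanged on a real album tibble, since it only relies on the text column.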

Multiple Albums

If you wish to download multiple albums from multiple artists, try to keep the workflow tidy and avoid binding rows by hand if you can. We can achieve this by creating a tibble with two columns, artist and album, where each row is an artist and one of their albums. We can then iterate over those columns with purrr::map2().

In this example I will extract three albums each from Kendrick Lamar and Sara Bareilles (two of my favorite musicians). The first step is to create the tibble with artists and album titles.

albums <-  tibble(
  artist = c(
    rep("Kendrick Lamar", 3), 
    rep("Sara Bareilles", 3)
    ),
  album = c(
    "Section 80", "Good Kid, M.A.A.D City", "DAMN.",
    "The Blessed Unrest", "Kaleidoscope Heart", "Little Voice"
    )
)

albums
## # A tibble: 6 x 2
##   artist         album                 
##   <chr>          <chr>                 
## 1 Kendrick Lamar Section 80            
## 2 Kendrick Lamar Good Kid, M.A.A.D City
## 3 Kendrick Lamar DAMN.                 
## 4 Sara Bareilles The Blessed Unrest    
## 5 Sara Bareilles Kaleidoscope Heart    
## 6 Sara Bareilles Little Voice

Now we can iterate over each row using the map2() function. This feeds each pair of values from the artist and album columns to genius_album(). Calling map2() inside dplyr::mutate() creates a list column in which each value is the tibble returned by genius_album(). We will unnest this later.

## Each genius_album() result is stored as a tibble in the list column tracks
album_lyrics <- albums %>% 
  mutate(tracks = map2(artist, album, genius_album))

album_lyrics
## # A tibble: 6 x 3
##   artist         album                  tracks              
##   <chr>          <chr>                  <list>              
## 1 Kendrick Lamar Section 80             <tibble [1,184 × 3]>
## 2 Kendrick Lamar Good Kid, M.A.A.D City <tibble [2,192 × 3]>
## 3 Kendrick Lamar DAMN.                  <tibble [1,077 × 3]>
## 4 Sara Bareilles The Blessed Unrest     <tibble [666 × 3]>  
## 5 Sara Bareilles Kaleidoscope Heart     <tibble [582 × 3]>  
## 6 Sara Bareilles Little Voice           <tibble [577 × 3]>

When you view this, you will see that each value within the tracks column is a <tibble>. This means that the value is in fact another tibble. We expand these using tidyr::unnest().

# Unnest the lyrics to expand 
lyrics <- album_lyrics %>% 
  unnest(tracks) %>%    # Expanding the lyrics 
  arrange(desc(artist)) # Arranging by artist name 

head(lyrics)
## # A tibble: 6 x 5
##   artist         album              title track_n text                    
##   <chr>          <chr>              <chr>   <int> <chr>                   
## 1 Sara Bareilles The Blessed Unrest Brave       1 You can be amazing      
## 2 Sara Bareilles The Blessed Unrest Brave       1 You can turn a phrase i…
## 3 Sara Bareilles The Blessed Unrest Brave       1 You can be the outcast  
## 4 Sara Bareilles The Blessed Unrest Brave       1 Or be the backlash of s…
## 5 Sara Bareilles The Blessed Unrest Brave       1 Or you can start speaki…
## 6 Sara Bareilles The Blessed Unrest Brave       1 Nothing's gonna hurt yo…
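
To try the list-column pattern without hitting Genius, here is a minimal offline sketch. fake_album() is a hypothetical stand-in for genius_album() that returns a tiny tibble with the same column names; the artists and albums are made up.

```r
library(dplyr)
library(purrr)
library(tidyr)
library(tibble)

# Hypothetical stand-in for genius_album(): two "lyric lines" per album
fake_album <- function(artist, album) {
  tibble(
    title   = paste(album, "track", 1:2),
    track_n = 1:2,
    text    = paste("A line by", artist)
  )
}

albums <- tibble(
  artist = c("Artist A", "Artist B"),
  album  = c("First", "Second")
)

# map2() walks the two columns in parallel and stores one tibble per row
nested <- albums %>%
  mutate(tracks = map2(artist, album, fake_album))

# unnest() expands the list column; artist and album repeat for each line
lyrics <- nested %>%
  unnest(tracks)

lyrics
```

Swapping fake_album for genius_album should give the workflow shown above, with one network request per album.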

Song Lyrics

genius_lyrics()

Getting lyrics to a single song is pretty easy. Let’s get in our ELEMENT. and check out DNA. by Kendrick Lamar. But first, note that genius_lyrics() takes two main arguments, artist and song. Be sure to spell the name of the artist and the song correctly.

DNA <- genius_lyrics(artist = "Kendrick Lamar", song = "DNA.")

DNA
## # A tibble: 95 x 3
##    title text                                                         line
##    <chr> <chr>                                                       <int>
##  1 DNA.  I got, I got, I got, I got—                                     1
##  2 DNA.  Loyalty, got royalty inside my DNA                              2
##  3 DNA.  Cocaine quarter piece, got war and peace inside my DNA          3
##  4 DNA.  I got power, poison, pain and joy inside my DNA                 4
##  5 DNA.  I got hustle though, ambition flow inside my DNA                5
##  6 DNA.  I was born like this, since one like this, immaculate conc…     6
##  7 DNA.  I transform like this, perform like this, was Yeshua new w…     7
##  8 DNA.  I don't contemplate, I meditate, then off your fucking head     8
##  9 DNA.  This that put-the-kids-to-bed                                   9
## 10 DNA.  This that I got, I got, I got, I got—                          10
## # ... with 85 more rows

This returns a tibble with three columns: title, text, and line. You can control how much information is returned with the info argument.

  • info = "title" (default): Return the lyrics, line number, and song title.
  • info = "simple": Return just the lyrics and line number.
  • info = "artist": Return the lyrics, line number, and artist.
  • info = "all": Return lyrics, line number, song title, artist.

Tracklists

Given an artist and an album, genius_tracklist() returns a barebones tibble with the track title, track number, and the url to the lyrics.

genius_tracklist(artist = "Basement", album = "Colourmeinkindness") 
## # A tibble: 10 x 3
##    title     track_n track_url                                   
##    <chr>       <int> <chr>                                       
##  1 Whole           1 https://genius.com/Basement-whole-lyrics    
##  2 Covet           2 https://genius.com/Basement-covet-lyrics    
##  3 Spoiled         3 https://genius.com/Basement-spoiled-lyrics  
##  4 Pine            4 https://genius.com/Basement-pine-lyrics     
##  5 Bad Apple       5 https://genius.com/Basement-bad-apple-lyrics
##  6 Breathe         6 https://genius.com/Basement-breathe-lyrics  
##  7 Control         7 https://genius.com/Basement-control-lyrics  
##  8 Black           8 https://genius.com/Basement-black-lyrics    
##  9 Comfort         9 https://genius.com/Basement-comfort-lyrics  
## 10 Wish           10 https://genius.com/Basement-wish-lyrics

Nitty Gritty

genius_lyrics() generates a url to Genius which is fed to genius_url(), the function that does the heavy lifting of actually fetching lyrics.

I have not figured out all of the patterns that are used for generating the Genius.com urls, so errors are bound to happen. If genius_lyrics() returns an error, try using genius_tracklist() and genius_url() together to get the song lyrics.

For example, say “(No One Knows Me) Like the Piano” by Sampha wasn’t working in a standard genius_lyrics() call.

piano <- genius_lyrics("Sampha", "(No One Knows Me) Like the Piano")

We could grab the tracklist for the album Process, which the song is from. We could then isolate the url for (No One Knows Me) Like the Piano and feed that into genius_url().

# Get the tracklist for the album Process
process <- genius_tracklist("Sampha", "Process")

# Filter down to find the individual song
piano_info <- process %>% 
  filter(title == "(No One Knows Me) Like the Piano")

# Filter song using string detection
# process %>% 
#  filter(stringr::str_detect(title, coll("Like the piano", ignore_case = TRUE)))

piano_url <- piano_info$track_url

Now that we have the url, feed it into genius_url().

genius_url(piano_url, info = "simple")
## # A tibble: 13 x 1
##    text                                                                   
##    <chr>                                                                  
##  1 No one knows me like the piano in my mother's home                     
##  2 You would show me I had something some people call a soul              
##  3 And you dropped out the sky, oh you arrived when I was three years old 
##  4 No one knows me like the piano in my mother's home                     
##  5 You know I left, I flew the nest                                       
##  6 And you know I won't be long                                           
##  7 And in my chest you know me best                                       
##  8 And you know I'll be back home                                         
##  9 An angel by her side, all of the times I knew we couldn't cope         
## 10 They said that it's her time, no tears in sight, I kept the feelings c…
## 11 And you took hold of me and never, never, never let me go              
## 12 'Cause no one knows me like the piano in my mother's home              
## 13 In my mother's home
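
When fetching many songs in one go, a single bad url would otherwise stop the whole run. One possible way around that, sketched here under assumptions, is to wrap the fetching function in purrr::possibly(). fetch_lyrics() and the example.com urls are hypothetical stand-ins for genius_url() and real Genius links, so the sketch runs offline.

```r
library(purrr)

# Hypothetical fetcher standing in for genius_url(); it fails on one input
fetch_lyrics <- function(url) {
  if (grepl("broken", url)) stop("could not read page")
  data.frame(text = paste("lyrics from", url), stringsAsFactors = FALSE)
}

# possibly() returns the `otherwise` value instead of raising the error
safe_fetch <- possibly(fetch_lyrics, otherwise = NULL)

urls <- c(
  "https://example.com/a",
  "https://example.com/broken",
  "https://example.com/c"
)

results <- map(urls, safe_fetch)

# Record which urls failed so they can be retried by hand
failed <- urls[map_lgl(results, is.null)]

# compact() drops the NULLs; the remaining data frames are stacked
lyrics <- do.call(rbind, compact(results))
```

The failed vector then holds the urls to look up manually through the tracklist, as described above.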

On the Internals

Generative functions

This package works almost entirely on pattern detection. The urls from Genius are (mostly) easily reproducible (shout out to Angela Li for pointing this out).

The two functions that generate urls are gen_song_url() and gen_album_url(). To see how the functions work, try feeding an artist and song title to gen_song_url() and an artist and album title to gen_album_url().

gen_song_url("Laura Marling", "Soothing")
## [1] "https://genius.com/Laura-Marling-Soothing-lyrics"
gen_album_url("Daniel Caesar", "Freudian")
## [1] "https://genius.com/albums/Daniel-Caesar/Freudian"
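
To make the pattern concrete, here is a rough base R sketch of how such urls could be assembled. It is not the package's actual code: sketch_song_url(), sketch_album_url(), and the slug() helper are hypothetical names, and the cleaning rules (drop punctuation, turn spaces into hyphens) are an assumption that happens to reproduce the two outputs above.

```r
# Hypothetical helper: strip punctuation, then replace runs of spaces with "-"
slug <- function(x) {
  x <- gsub("[^[:alnum:][:space:]-]", "", x)
  gsub("[[:space:]]+", "-", trimws(x))
}

# Assumed song pattern: https://genius.com/<artist>-<song>-lyrics
sketch_song_url <- function(artist, song) {
  paste0("https://genius.com/", slug(paste(artist, song)), "-lyrics")
}

# Assumed album pattern: https://genius.com/albums/<artist>/<album>
sketch_album_url <- function(artist, album) {
  paste0("https://genius.com/albums/", slug(artist), "/", slug(album))
}

sketch_song_url("Laura Marling", "Soothing")
sketch_album_url("Daniel Caesar", "Freudian")
```

Titles with unusual punctuation or featured artists are exactly where a simple rule like this is likely to break, which is why the tracklist route is a useful fallback.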

genius_lyrics() calls gen_song_url() and feeds the output to genius_url(), which performs the scraping.

Getting lyrics for albums is slightly more involved. genius_album() first calls genius_tracklist(), which in turn calls gen_album_url() and then uses the handy package rvest to scrape the song titles, track numbers, and song lyric urls. Next, the song urls from that output are iterated over and fed to genius_url().

To make this clearer, take a look inside genius_album():

genius_album <- function(artist = NULL, album = NULL, info = "simple") {

  # Obtain tracklist from genius_tracklist
  album <- genius_tracklist(artist, album) %>%

    # Iterate over the url to the song title
    mutate(lyrics = map(track_url, genius_url, info)) %>%

    # Unnest the tibble with lyrics
    unnest(lyrics) %>%
    
    # Deselect the track url
    select(-track_url)


  return(album)
}
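
The scraping step inside genius_tracklist() is not shown in this post. For readers new to rvest, the general pattern might look something like the sketch below, which runs on a made-up HTML snippet rather than a live page. The markup, the a.track selector, and the urls are all assumptions for illustration; the real Genius page structure is different and changes over time.

```r
library(rvest)

# Made-up HTML; real Genius markup and class names are not shown in this post
page <- read_html('<div class="tracklist">
  <a class="track" href="https://genius.com/Example-artist-first-song-lyrics">First Song</a>
  <a class="track" href="https://genius.com/Example-artist-second-song-lyrics">Second Song</a>
</div>')

# Select every track link with a CSS selector
links <- html_elements(page, "a.track")

# Build a tracklist with the same column names genius_tracklist() returns
tracklist <- data.frame(
  title     = html_text(links, trim = TRUE),
  track_n   = seq_len(length(links)),
  track_url = html_attr(links, "href"),
  stringsAsFactors = FALSE
)

tracklist
```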

Notes:

As this is my first “package” there will be many issues. Please submit an issue and I will do my best to attend to it.

There are already issues of which I am aware (such as the lack of error handling). If you would like to take those on, please go ahead and make a pull request. You can also contact me on Twitter.

Josiah Parry
Social Data Scientist