{cpcinema} & associated journey


Josiah Parry


April 28, 2021

Yesterday I had the chance to discuss my R package, {cpcinema}, with the Boston useR group. If you’re not familiar with {cpcinema}, please read my previous post. I intended for my talk to focus on the package’s functionality, how and why I made it, and then briefly on why I didn’t share it widely, my feelings after seeing the response to my tweet, and why contributing to the open source community in any manner is always appreciated. But as I was preparing my slides on Monday night I encountered some challenges. That became the subject of much of my talk.

The talk can be found here and the passcode is D=L?nHQ0.

After providing a brief overview of the functionality of {cpcinema} I touched upon how this idea formed in my head. My head is like an unorganized floating abyss with numerous ideas, concepts, and facts just floating. Occasionally I’ll make a connection between two or more ideas. This will happen over a long period of time. In the case of this package I had a number of thoughts regarding data visualization—probably because I had seen Alberto Cairo present at Northeastern towards the end of 2019.

Additionally, I had come to the realization that most data scientists want to make informative charts, but also aren’t going to be exceptionally adept at the design portion of this—nor should they be! Moving beyond the default colors is an important first step in making evocative visualizations.

Sometime in between, I discovered or was shown the Instagram page @colorpalette.cinema, which provides beautiful color palettes from stills of films. How sweet would it be to use those colors for your plots? Rather sweet.

In January at rstudio::conf(2020L) I saw Jesse Sadler present on making custom S3 vector classes with the {vctrs} package (slides). After seeing this I was toying with the idea of making my own custom S3 vector class.

Then in February it all clicked. How cool would it be to get a colorpalette.cinema picture, extract the colors, and then store them in a custom S3 class which printed the color?! So I did that with hours of struggling and a lot of following along with Jesse’s resources.
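For readers who have not seen {vctrs} before, here is a minimal sketch of what a palette class along those lines could look like. It is not the {cpcinema} source: the class name `demo_palette` and the constructor `new_demo_palette()` are made up for illustration, and the colored swatch is only drawn when the {crayon} package happens to be installed.

```r
library(vctrs)

# Hypothetical constructor (not the {cpcinema} API): wraps a character
# vector of hex codes in an S3 class built on vctrs.
new_demo_palette <- function(x = character()) {
  vec_assert(x, character())
  new_vctr(x, class = "demo_palette")
}

# format() controls what the class shows when printed. If {crayon} is
# available, put a small colored swatch in front of each hex code;
# otherwise fall back to the bare hex codes.
format.demo_palette <- function(x, ...) {
  hex <- vec_data(x)
  if (!requireNamespace("crayon", quietly = TRUE)) {
    return(hex)
  }
  swatch <- vapply(
    hex,
    function(h) crayon::make_style(h, bg = TRUE)("  "),
    character(1),
    USE.NAMES = FALSE
  )
  paste(swatch, hex)
}

# Example hex codes chosen arbitrarily for the demo
pal <- new_demo_palette(c("#1B2A41", "#CCC9DC", "#C1121F"))
pal
```

Because the class sits on top of `new_vctr()`, subsetting and combining come along for free, and only the `format()` method needs to know anything about colors.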

To start any project of any sort, always **start small**. My workflow consisted of:

  1. Downloading a sample image to work with.
  2. Scouring Stack Overflow, Google, and R-bloggers for resources on extracting colors.
  3. Extracting the colors from an image manually.
  4. Extracting colors from an image path.
  5. Making it into a function to replicate.
  6. Trying to get an image from an Instagram URL.

Well, apparently you can’t! I tried and I tried to find different ways to extract these images. But it wasn’t possible. The Instagram API is meant for web developers and businesses, not data scientists trying to get small pieces of data. I left it alone and kept it to myself.
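Going back to steps 3 through 5 for a moment, here is a rough sketch of what "extract the colors, then wrap it in a function" can look like. This is not how {cpcinema} does it internally. It swaps in k-means clustering from base R and runs on a synthetic image array so that it needs no packages or files. The function name `extract_dominant_colors()` is hypothetical.

```r
# Hypothetical helper, not the {cpcinema} implementation.
# img: numeric array of dimension height x width x 3, values in [0, 1]
# k:   number of colors to pull out
extract_dominant_colors <- function(img, k = 3) {
  # One row per pixel, one column per channel
  px <- cbind(
    r = as.vector(img[, , 1]),
    g = as.vector(img[, , 2]),
    b = as.vector(img[, , 3])
  )
  km <- stats::kmeans(px, centers = k, nstart = 5)
  # Order cluster centers from most to fewest pixels
  centers <- km$centers[order(-km$size), , drop = FALSE]
  # Guard against tiny floating point overshoot before converting to hex
  centers <- pmin(pmax(centers, 0), 1)
  grDevices::rgb(centers[, 1], centers[, 2], centers[, 3])
}

set.seed(42)

# Synthetic "image": three horizontal bands of red, green, and blue.
# With a real file you would first read it into an array of this shape,
# for example with a JPEG or PNG reader package.
img <- array(0, dim = c(30, 10, 3))
img[1:15, , 1] <- 1
img[16:25, , 2] <- 1
img[26:30, , 3] <- 1

pal <- extract_dominant_colors(img, k = 3)
pal
```

Starting from a tiny array like this keeps the "start small" idea intact: the clustering can be checked by eye before any downloading or URL handling gets involved.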

After my tweet, I was inspired to find a solution to the Instagram image URL problem. And with a lot of coffee, wifi hotspotting, and the Chrome devtools, I found a workable solution.

Then Monday night I was making slides and…

Error in download.file(fp, tmp, mode = "wb", quiet = TRUE) : 
  cannot open URL 'NA'

The function pal_from_post() didn’t work. I did what I always do when a function doesn’t work. I printed out the function body to figure out what the function does.

#> function (post_url) 
#> {
#>     res <- httr::POST("https://igram.io/api/", body = list(url = post_url, 
#>         lang_code = "en", vers = "2"), encode = "form") %>% httr::content()
#>     imgs <- res %>% rvest::html_nodes(".py-2 img") %>% rvest::html_attr("src") %>% 
#>         unique()
#>     img_paths <- imgs[2:length(imgs)]
#>     purrr::map(img_paths, extract_cpc_pal)
#> }
#> <bytecode: 0x294dac290>
#> <environment: namespace:cpcinema>
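The 'NA' in the error message has a likely explanation in the `img_paths` line. If the scrape finds no images, `imgs` is empty, and `2:length(imgs)` counts down from 2 to 0 instead of producing an empty range. A small base R demonstration, with `imgs[-1]` as one possible safer spelling (a suggestion, not what the package does):

```r
# What the scrape yields when no <img> nodes match
imgs <- character(0)

# 2:length(imgs) is 2:0, i.e. c(2, 1, 0), so indexing invents missing values
bad <- imgs[2:length(imgs)]
print(bad)

# Dropping the first element by negative index stays empty on empty input
good <- imgs[-1]
print(good)

# And it still skips the first element when there is something to skip
print(c("a.jpg", "b.jpg", "c.jpg")[-1])
```

Mapping a download function over `bad` would hand it `NA` as a URL, which matches the error above.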

Always start at the beginning. I ran the original API query and received the following:

nonation <- "https://www.instagram.com/p/CNdEkV4HAXF"

res <- httr::POST("https://igram.io/api/",
                  body = list(url = nonation,
                              lang_code = "en",
                              vers = "2"),
                  encode  = "form") 
Response [https://igram.io/api/]
  Date: 2021-04-26 22:46
  Status: 403
  Content-Type: text/html; charset=UTF-8
  Size: 3.26 kB
<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
<title>Access denied | igram.io used Cloudflare to restrict access</title>
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />

My next step was to see if it was just me, so I ran the same query from RStudio Server Pro, which has a different IP address. The same query got a different response:

Response [https://igram.io/api/]
  Date: 2021-04-26 22:43
  Status: 503
  Content-Type: text/html; charset=UTF-8
  Size: 9.48 kB
<html lang="en-US">
  <meta charset="UTF-8" />
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />
  <meta name="robots" content="noindex, nofollow" />
  <meta name="viewport" content="width=device-width,initial-scale=1" />
  <title>Just a moment...</title>
  <style type="text/css">

A different error code. I saw this coming.

I was web scraping a website that was unquestionably also web scraping. This is a very grey zone of the internet and is questionable at best. Sources like these are also ephemeral.

“If it doesn’t work, try the same thing again until it does work” - Me

I went to the same website I was using. I opened up the developer tools and watched the requests come in! The request that the browser makes had a change in its URL: from https://igram.io/api/ to https://igram.io/i/. Frankly, a super easy fix!
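A sketch of what that fix might look like, assuming the new endpoint accepts the same form fields as the old one (I have not verified that). The function name `fetch_igram()` and its `endpoint` argument are hypothetical. The idea is to keep the URL in one place and to fail loudly on a 403 or 503 instead of passing an error page downstream.

```r
# Hypothetical helper, not part of {cpcinema}. Assumes the form fields
# (url, lang_code, vers) are unchanged at the new endpoint.
fetch_igram <- function(post_url, endpoint = "https://igram.io/i/") {
  res <- httr::POST(
    endpoint,
    body = list(url = post_url, lang_code = "en", vers = "2"),
    encode = "form"
  )
  # Turn 4xx/5xx responses (like the 403 and 503 above) into R errors
  httr::stop_for_status(res)
  httr::content(res)
}

# Usage (needs network access and a cooperative server):
# res <- fetch_igram("https://www.instagram.com/p/CNdEkV4HAXF")
```

With the endpoint as an argument, the next URL change becomes a one-line edit, or no edit at all if the caller passes the new address.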

It’s now Wednesday and I still haven’t made that change. So, what’s next? (HELP WANTED)