Read a CSV in a production API

{plumber} and multipart request #RinProd

plumber
r
prod
Author

Josiah Parry

Published

June 25, 2024

Deploying a RESTful API is a reliable way to put any language into production, and R is no different.

One challenge when making APIs is handling files.

Files are typically uploaded with a multipart request.

“[they] combine one or more sets of data into a single body…. You typically use these requests for file uploads and for transferring data of several types in a single request (for example, a file along with a JSON object).”
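To make that description concrete, here is a base R sketch of roughly what a multipart/form-data body looks like on the wire, following the format described in RFC 7578. `build_multipart()` and the boundary string are hypothetical and exist only for illustration; in practice httr2 and curl build this body for you.

```r
# Hypothetical helper: assemble a multipart/form-data body by hand to show
# the wire format. Each field becomes one part, separated by the boundary.
build_multipart <- function(fields, boundary = "----RinProdBoundary") {
  parts <- vapply(names(fields), function(name) {
    paste0(
      "--", boundary, "\r\n",
      "Content-Disposition: form-data; name=\"", name, "\"\r\n",
      "\r\n",                 # blank line separates part headers from the value
      fields[[name]], "\r\n"
    )
  }, character(1))
  list(
    # The boundary is announced in the Content-Type header...
    content_type = paste0("multipart/form-data; boundary=", boundary),
    # ...and the body ends with the boundary followed by two dashes
    body = paste0(paste(parts, collapse = ""), "--", boundary, "--\r\n")
  )
}

example <- build_multipart(list(id = "abc123", sample = "[1,2,3]"))
cat(example$body)
```

File parts follow the same pattern, except their Content-Disposition line also carries a `filename="..."` and the part usually has its own Content-Type header.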

Handling multipart requests in R

You can process them using the {mime} package.

Named after “MIME types”, not Mr. Mime.

{plumber} provides access to the request, including its body, through the req argument.

#* @post /upload
upload <- function(req, res) {
    # body
}

To parse the structure of the request, use mime::parse_multipart(req).

Modifying the function like so returns the parsed request from the API as JSON:

#* @post /upload
upload <- function(req, res) {
    mp <- mime::parse_multipart(req)
    mp
}

Save this as plumber.R

Run your API

In your terminal (from the same working directory as plumber.R) run R -e 'plumber::plumb("plumber.R")$run(port = 3000)'

This runs the API in its own R process, so you can call it from a separate R session.

Making a multipart request

Use httr2 to create the multipart request.

  • Start the request with request()
  • Pipe the request object into req_body_multipart() and supply the data as key-value pairs
    • Note that non-file values must be strings, so create the JSON yourself
  • Send the request using req_perform()
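If you'd rather use jsonlite than jsonify (an assumption; the post itself uses jsonify), the "values must be strings" rule can be handled with a small helper. `as_json_field()` is hypothetical; each value in `fields` can then be passed to req_body_multipart() just like the values in the request that follows.

```r
library(jsonlite)

# Hypothetical helper: encode any R object as one JSON string, because
# non-file multipart values have to be strings.
as_json_field <- function(x) {
  # toJSON() returns a "json"-classed string; as.character() drops the class
  as.character(jsonlite::toJSON(x, auto_unbox = TRUE))
}

fields <- list(
  id = "example-id",                        # placeholder; the post uses ulid::ulid()
  sample = as_json_field(sample(1:100, 10))
)
str(fields)
```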

Here we give the request a unique ID and add a sample of data:

library(httr2)

resp <- request("http://127.0.0.1:3000/upload") |>
  req_body_multipart(
    id = ulid::ulid(),
    sample = jsonify::to_json(sample(1:100, 10), unbox = TRUE)
  ) |>
  req_perform()
resp
<httr2_response>
POST http://127.0.0.1:3000/upload
Status: 200 OK
Content-Type: application/json
Body: In memory (81 bytes)

We extract the body using resp_body_string() and parse the JSON using RcppSimdJson::fparse():

resp_body_string(resp) |>
  RcppSimdJson::fparse()
$id
[1] "01J1QF6TA2VN2Z5WSFJ8DMJJ5W"

$sample
[1] "[42,85,18,65,14,10,9,21,27,93]"
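Notice that `sample` comes back as a JSON string rather than a vector of numbers, because it was sent as text. A minimal sketch of decoding it a second time, assuming jsonlite is installed (the string is copied from the output above):

```r
library(jsonlite)

# `sample` arrives as a JSON string (multipart values are text), so it
# needs a second decode. In the live flow this would be:
#   parsed <- RcppSimdJson::fparse(resp_body_string(resp))
#   jsonlite::fromJSON(parsed$sample)
sample_json <- "[42,85,18,65,14,10,9,21,27,93]"
sample_values <- jsonlite::fromJSON(sample_json)
sample_values
```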

Adding files

We’ll create a temporary file containing the first six rows of the iris data frame and send it to the API endpoint.

These two lines:

  1. Create a path for a temporary CSV file
  2. Write the data frame to that file
Tip

This is a handy trick you can adapt to many other situations. Temporary files are useful whenever a function needs a path on disk but your data only lives in memory.

tmp <- tempfile(fileext = ".csv")
readr::write_csv(head(iris), tmp)
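A small variation on the tip, sketched with base R's write.csv() in place of readr::write_csv() (swapped in so the example has no extra dependencies; `row.names = FALSE` matches readr, which doesn't write row names). `with_temp_csv()` is a hypothetical helper; on.exit() makes sure the file is removed even if the callback errors.

```r
# Hypothetical helper: write a data frame to a temporary CSV, hand the path
# to `f`, and delete the file once `f` returns (or fails).
with_temp_csv <- function(df, f) {
  path <- tempfile(fileext = ".csv")
  on.exit(unlink(path), add = TRUE)
  utils::write.csv(df, path, row.names = FALSE)
  f(path)
}

# For example, count the rows after reading the file back in
n_rows <- with_temp_csv(head(iris), function(path) nrow(utils::read.csv(path)))
n_rows
```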

Next, add the file to the request with curl::form_file(), which takes the path to the file; in this case, the temporary file.

resp <- request("http://127.0.0.1:3000/upload") |>
  req_body_multipart(
    file = curl::form_file(tmp)
  ) |>
  req_perform()

resp_body_string(resp) |>
  jsonify::pretty_json()
{
    "file": [
        {
            "name": "filef10c3faec0e9.csv",
            "size": 192,
            "type": "application/octet-stream",
            "datapath": "/var/folders/wd/xq999jjj3bx2w8cpg7lkfxlm0000gn/T//RtmphrlFYJ/filef0d54bf0c8ce"
        }
    ]
}

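In this case, file holds the upload's metadata: its original name, size, content type, and datapath. mime writes the uploaded file to a temporary location on the server, available via datapath, so let’s add an API endpoint that reads a CSV file from there.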

Read CSV in Plumber API

Here we read the CSV from datapath. A production endpoint needs better checks, such as confirming that the file field actually exists in mp; without them, any error still propagates as a 500 status.

Something is always better than nothing. Just like this blog post.

#* @post /read_csv
function(req, res) {
  mp <- mime::parse_multipart(req)
  readr::read_csv(mp$file$datapath)
}
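One way to add the checks mentioned above, sketched as a hypothetical `read_upload_csv()` helper plus an endpoint that calls it, meant as a drop-in replacement for the /read_csv endpoint. It sets `res$status` to 400 for a bad request instead of letting the error surface as a 500. It uses base R's utils::read.csv() rather than readr::read_csv() to keep the helper dependency-free, and the `.csv` filename check is an assumption about what you want to accept.

```r
# Hypothetical validation helper, kept separate from the endpoint so it can be
# tested without running a server. `mp` is the list from mime::parse_multipart().
read_upload_csv <- function(mp, res, field = "file") {
  upload <- mp[[field]]
  # Missing field, or a non-multipart request where mp is NULL
  if (is.null(upload) || is.null(upload$datapath)) {
    res$status <- 400
    return(list(error = paste0("missing file field '", field, "'")))
  }
  # Only accept files whose original name ends in .csv
  if (!isTRUE(grepl("\\.csv$", upload$name, ignore.case = TRUE))) {
    res$status <- 400
    return(list(error = "expected a .csv file"))
  }
  utils::read.csv(upload$datapath)
}

#* @post /read_csv
function(req, res) {
  mp <- mime::parse_multipart(req)
  read_upload_csv(mp, res)
}
```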

Send CSV to API

Here is how to send the CSV to the API:

resp <- request("http://127.0.0.1:3000/read_csv") |>
  req_body_multipart(
    file = curl::form_file(tmp)
  ) |>
  req_perform()

resp_body_string(resp) |>
  jsonify::pretty_json()
[
    {
        "Sepal.Length": 5.1,
        "Sepal.Width": 3.5,
        "Petal.Length": 1.4,
        "Petal.Width": 0.2,
        "Species": "setosa"
    },
    {
        "Sepal.Length": 4.9,
        "Sepal.Width": 3,
        "Petal.Length": 1.4,
        "Petal.Width": 0.2,
        "Species": "setosa"
    },
    {
        "Sepal.Length": 4.7,
        "Sepal.Width": 3.2,
        "Petal.Length": 1.3,
        "Petal.Width": 0.2,
        "Species": "setosa"
    },
    {
        "Sepal.Length": 4.6,
        "Sepal.Width": 3.1,
        "Petal.Length": 1.5,
        "Petal.Width": 0.2,
        "Species": "setosa"
    },
    {
        "Sepal.Length": 5,
        "Sepal.Width": 3.6,
        "Petal.Length": 1.4,
        "Petal.Width": 0.2,
        "Species": "setosa"
    },
    {
        "Sepal.Length": 5.4,
        "Sepal.Width": 3.9,
        "Petal.Length": 1.7,
        "Petal.Width": 0.4,
        "Species": "setosa"
    }
]

Note that the response is plain JSON: each row of the data frame becomes one JSON object.
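plumber's default JSON serializer is built on jsonlite, and the one-object-per-row shape in the response matches jsonlite's `dataframe = "rows"` layout. A quick way to see that shape locally, assuming jsonlite is installed:

```r
library(jsonlite)

# One JSON object per row, column names as keys; factor columns such as
# Species are written as strings.
json <- jsonlite::toJSON(head(iris, 2), dataframe = "rows")
json
```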

We can parse it back into a data frame, completing the full round trip:

resp_body_string(resp) |>
  RcppSimdJson::fparse()
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

The whole API:

plumber.R
library(plumber)
# R -e 'plumber::plumb("plumber.R")$run(port = 3000)'
#* @post /upload
upload <- function(req, res) {
  mp <- mime::parse_multipart(req)
  mp
}

#* @post /read_csv
function(req, res) {
  mp <- mime::parse_multipart(req)
  readr::read_csv(mp$file$datapath)
}

Scale your APIs

Use Valve to scale and deploy your applications to production.

It kicks ass tbh.