SLICED! a brief reflection


Josiah Parry


June 18, 2021

A few weeks ago I was a contestant on the machine learning game show #SLICED! The format of the challenge is as follows:

My stream is uploaded to YouTube so you can catch it in all of its glory.

How I got roped in

About a month ago I saw the below tweet from Jesse Mostipak. Naturally, it piqued my interest.

Everyone’s favorite Tidy-Tuesday-Tom essentially voluntold me. I decided to put my name in the hat and see if I could compete. The challenge, though, was that I had only ever dabbled in machine learning. In May it was something that I had only done a handful of times, and with a much older toolset, e.g. caret. If there is one thing I know about myself, it’s that there is nothing like a deadline and a concrete objective to get me to learn something.

I am a strong believer in Parkinson’s Law—you can thank my father for that—which is characterized by the saying “If you wait until the last minute, it only takes a minute to do.” Or, more formally, “work expands so as to fill the time available for its completion.”

In essence, the best way for me to get better at machine learning would be to put myself in a situation, as uncomfortable as it may be, where I would have to do machine learning. Alternatively, I could just fail miserably, but I don’t like that.

Getting a grip on {tidymodels}

I have been loosely following the tidymodels ecosystem since the beginning. Previously, my understanding of tidymodels only included recipes, rsample, and parsnip. These three packages can get you exceptionally far, but there are additional packages that are instrumental to improving the ML workflow for useRs, namely tune, workflows, and workflowsets.

The most challenging part of getting started with tidymodels was understanding where each package fits in during the process, and in particular figuring out which packages were low-level APIs and which were abstractions.

Understanding tidymodels libraries

The most basic components of a tidymodels ML process are a recipe ({recipes}) and a model specification ({parsnip}). The recipe determines the features used and any preprocessing steps. The model specification determines which model will be trained. Additionally, we often want to include resampling, e.g. bootstrap or cross validation (called a resamples object in tidymodels, created with {rsample}). With these three components we can then utilize the {tune} package to train our model on our resamples. On top of these pieces we can build a layer of abstraction called a workflow ({workflows}), which bundles the recipe and the model specification into a single object that can be fit or tuned.
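To make the pieces above concrete, here is a minimal sketch of how they fit together. The `mtcars` data, the linear regression model, and the 5-fold cross validation are all illustrative assumptions on my part here, not what was used on SLICED.

```r
library(tidymodels)  # attaches recipes, parsnip, rsample, tune, workflows, ...

set.seed(123)

# Recipe: which features to use and how to preprocess them
# (mtcars and the normalization step are illustrative choices)
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric_predictors())

# Model specification: which model to train
spec <- linear_reg() %>%
  set_engine("lm")

# Resamples: 5-fold cross validation
folds <- vfold_cv(mtcars, v = 5)

# Workflow: bundles the recipe and the model specification
wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(spec)

# tune::fit_resamples() fits the workflow on every resample
res <- fit_resamples(wf, resamples = folds)

# One row per performance metric, averaged over the folds
collect_metrics(res)
```

If the model specification had hyperparameters marked with `tune()`, `tune_grid()` would take the place of `fit_resamples()` in the last step.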

In the ML process we want to train many models. Rather than repeating the steps manually for each model, the {workflowsets} package will create many workflows for you and help you train all of those models quickly. Workflowsets were essential in my approach to SLICED.
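As a rough sketch of that idea, the example below crosses two recipes with two model specifications to get four workflows, then fits each one on the same resamples. The data, the recipes, and the two models are hypothetical stand-ins chosen for illustration, not my actual SLICED setup.

```r
library(tidymodels)
library(workflowsets)  # also attached by recent versions of tidymodels

set.seed(123)
folds <- vfold_cv(mtcars, v = 5)

# Two candidate recipes (illustrative feature sets)
rec_all   <- recipe(mpg ~ ., data = mtcars)
rec_small <- recipe(mpg ~ wt + hp, data = mtcars)

# Two candidate model specifications
lm_spec <- linear_reg() %>%
  set_engine("lm")

tree_spec <- decision_tree() %>%
  set_engine("rpart") %>%
  set_mode("regression")

# Cross every recipe with every model: 2 x 2 = 4 workflows
wf_set <- workflow_set(
  preproc = list(all = rec_all, small = rec_small),
  models  = list(lm = lm_spec, tree = tree_spec),
  cross   = TRUE
)

# Apply fit_resamples() to each workflow in the set
wf_res <- workflow_map(
  wf_set,
  fn = "fit_resamples",
  resamples = folds,
  seed = 123
)

# Compare the workflows by RMSE
rank_results(wf_res, rank_metric = "rmse")
```

Each workflow gets an id built from the list names, such as `all_lm` or `small_tree`, which makes it easy to pull out the best one afterwards.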

Tidymodels resources

Getting up to speed with tidymodels was a bit of a challenge. The packages are still under active development, so keeping the documentation in step must be hard work for the team! With that said, there are many resources that you can use to get started. Below are some of the ones that I utilized and found helpful.