Enums in R: towards type safe R

r
package-dev
rust
Author

Josiah Parry

Published

November 10, 2023

Hadley Wickham has recently dropped a new draft section of his book Tidy Design Principles on enumerations and their use in R.

In short, enumerations enumerate (list out) the possible values that something might take on. In R we see this most often in function signatures where an argument takes a scalar value but all possible values are listed out.

I will refer to enumerations as enums from here on.

Enums in R

A good example is the cor() function from the base package stats.

args(cor)
function (x, y = NULL, use = "everything", method = c("pearson", 
    "kendall", "spearman")) 
NULL

The possible values for method are "pearson", "kendall", or "spearman" but all values are listed inside of the function definition.

Inside of the function, though, match.arg(method) is used to ensure that the provided value to the method argument is one of the provided values.

Hadley makes the argument that we should prefer an enumeration to a boolean flag such as TRUE or FALSE. I agree!

A real world example

A post on mastodon makes a point that the function sf::st_make_grid() has an argument square = TRUE where when set to FALSE hexagons are returned.

In this case, it’s very clear that an enum would be better! For example we can improve the signature like so:

st_make_grid <- function(x, grid_shape = c("square", "hexagon"), ...) {
  # ensure only one of the provided grid shapes are used
  match.arg(grid_shape)
  # ... rest of function 
}

Enums in Rust

When I first started using rust enums made no sense to me. In Rust, enums are a first class citizen that are treated as their own thing.

I’m not really sure what to call things in Rust. Are they all objects?

We make them by defining the name of the enum and the variants they may take on.

enum GridShape {
  Square,
  Hexagon
}

Now you can use this enum GridShape to specify one of two types: Square or Hexagon. Syntactically, this is written GridShape::Square and GridShape::Hexagon.

Enums are very nice because we can match on the variants and do different things based on them. For example we can have a function like so:

fn which_shape(x: GridShape) {
    match x {
        GridShape::Square => println!("We have a square!"),
        GridShape::Hexagon => println!("Hexagons are the bestagons")
    }
}

It takes an argument x which is a GridShape enum. We match on the possible variants and then do something.

Inside of the match statement each of the possible variants of the enum have to be written out. These are called match arms. The left side lists the variant where as the right portion (after =>) indicates what will be executed if the left side is matched (essentially if the condition is true).

With this function we can pass in specific variants and get different behavior.

which_shape(GridShape::Hexagon)
#> Hexagons are the bestagons
which_shape(GridShape::Square)
#> We have a square!

Making an S7 enum object in R

I think R would benefit from having a “real” enum type object. Having a character vector of valid variants and checking against them using match.arg() or rlang::arg_match() is great but I think we can go further.

Since learning Rust, I think having more strictness can make our code much better and more robust. I think adding enums would be a good step towards that

I’ve prototyped an Enum type in R using the new S7 object system that might point us towards what an enum object in the future might look like for R users.

Design of an Enum

For an enum we need to know what the valid variants are and what the current value of the enum is. These would be the two properties.

An enum S7 object must also make sure that a value of an Enum is one of the valid variants. Using the GridShape enum the valid variants would be "Square" and "Hexagon". A GridShape enum could not take, for example, "Circle" since it is not a listed variant.

Using an abstract class

To start, we will create an abstract S7 class called Enum.

“_an abstract class is a generic class (or type of object) used as a basis for creating specific objects that conform to its protocol, or the set of operations it supports” — Source

The Enum class will be used to create other Enum objects.

library(S7)

# create a new Enum abstract class
Enum <- new_class(
  "Enum",
  properties = list(
    Value = class_character,
    Variants = class_character
  ),
  validator = function(self) { 
    if (length(self@Value) != 1L) {
      "enum value's are length 1"
    } else if (!(self@Value %in% self@Variants)) {
      "enum value must be one of possible variants"
    }
  }, 
  abstract = TRUE
)

In this code chunk we specify that there are 2 properties: Value and Variant each must be a character type. Value will be the value of the enum. It would be the right hand side of GridShape::Square in Rust’s enum, for example. Variants is a character vector of all of the possible values it may be able to take on. The validator ensures that Value must only have 1 value. It also ensures that Value is one of the enumerated Variants. This Enum class will be used to generate other enums and cannot be instantiated by itself.

We can create a new enum factory function with the arguments:

  • enum_class the class of the enum we are creating
  • variants a character vector of the valid variant values
# create a new enum constructor 
new_enum_class <- function(enum_class, variants) {
  new_class(
    enum_class,
    parent = Enum,
    properties = list(
      Value = class_character,
      Variants = new_property(class_character, default = variants)
    ),
    constructor = function(Value) {
      new_object(S7_object(), Value = Value, Variants = variants)
    }
  )
}

Note that the constructor here only takes a Value argument. We do this so that users cannot circumvent the pre-defined variants.

With this we can now create a GridShape enum in R!

GridShape <- new_enum_class(
  "GridShape",
  c("Square", "Hexagon")
)

GridShape
<GridShape> class
@ parent     : <Enum>
@ constructor: function(Value) {...}
@ validator  : <NULL>
@ properties :
 $ Value   : <character>
 $ Variants: <character>

This new object will construct new GridShape enums for us.

GridShape("Square")
<GridShape>
 @ Value   : chr "Square"
 @ Variants: chr [1:2] "Square" "Hexagon"

When we try to create a GridShape that is not one of the valid variants we will get an error.

GridShape("Triangle")
Error: <GridShape> object is invalid:
- enum value must be one of possible variants

Making a print method

For fun, I would like Enum objects to print like how I would use them in Rust. To do this we can create a custom print method

# print method for enums
# since this is an abstract class we get the first class (super)
# to print
print.Enum <- function(x, ...) {
  cat(class(x)[1], "::", x@Value, sep = "")
  invisible(x)
}

Since Enums will only ever be a sub-class we can confidently grab the first element of the class(enum_obj) which is the super-class of the enum. We paste that together with the value of the enum.

square  <- GridShape("Square")
square
GridShape::Square

Drawing even more from Rust

Rust enums are even more powerful than what I briefly introduced. Each variant of an enum can actually be typed!!! Take a look at the example from The Book™.

enum Message {
    Quit,
    Move { x: i32, y: i32 },
    Write(String),
    ChangeColor(i32, i32, i32),
}

In this enum there are 4 variants. The first Quit doesn’t have any associated data with it. But the other three do! The second one Move has two fields x and y which contain integer values. Write is a tuple with a string in it and ChangeColor has 3 integer values in its tuple. These can be extracted.

A silly example function that illustrates how each value can be used can be

fn which_msg(x: Message) {
    match x {
        Message::Quit => println!("I'm a quitter"),
        Message::Move { x, y } =>  println!("Move over {x} and up {y}"),
        Message::Write(msg) => println!("your message is: {msg}"),
        Message::ChangeColor(r, g, b) =>  println!("Your RGB ({r}, {g}, {b})"),
    }
}

When a variant with data is passed in the values can be used. For example

which_msg(Message::ChangeColor(0, 155, 200));
#> Your RGB (0, 155, 200)

Extending it to R

What would this look like if we extended it to an R based enum object? I suspect the Variants would be a list of prototypes such as those from {vctrs}. The Value would have to be validated against all of the provided prototypes to ensure that it is one of the provided types.

I’m not sure how I would code this up, but I think that would be a great thing to have.