Hadley Wickham has recently dropped a new draft section of his book Tidy Design Principles on enumerations and their use in R.
In short, enumerations enumerate (list out) the possible values that something might take on. In R we see this most often in function signatures where an argument takes a scalar value but all possible values are listed out.
I will refer to enumerations as enums from here on.
A good example is the cor() function from the base package stats.
args(cor)
function (x, y = NULL, use = "everything", method = c("pearson",
"kendall", "spearman"))
NULL
The possible values for method are "pearson", "kendall", or "spearman", and all of them are listed inside of the function definition.
Inside of the function, though, match.arg(method) is used to ensure that the value provided to the method argument is one of the listed values.
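To see this check in action, here is a small sketch. The vectors x and y are made up purely for illustration. Leaving method out falls back to the first listed value, and a value that is not in the list is rejected.

```r
# Made-up data, purely for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

# With no `method` supplied, the first listed value ("pearson") is used
identical(cor(x, y), cor(x, y, method = "pearson"))
#> [1] TRUE

# A value that is not one of the listed choices is an error
res <- tryCatch(
  cor(x, y, method = "euclidean"),
  error = function(e) "rejected"
)
res
#> [1] "rejected"
```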
Hadley makes the argument that we should prefer an enumeration to a boolean flag such as TRUE or FALSE. I agree!
A post on Mastodon makes the point that the function sf::st_make_grid() has an argument square = TRUE which, when set to FALSE, returns hexagons instead.
In this case, it’s very clear that an enum would be better! For example we can improve the signature like so:
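A minimal sketch of what such a signature could look like is below. Note that make_grid() and grid_shape are hypothetical names used only for illustration and are not part of sf, and the function simply returns the chosen shape rather than building a grid.

```r
# Hypothetical function: `make_grid()` and `grid_shape` are illustrative
# names, not part of sf. It only reports which shape was chosen.
make_grid <- function(x, grid_shape = c("square", "hexagon")) {
  # match.arg() uses the first listed value by default and errors on
  # anything that is not one of the listed choices
  grid_shape <- match.arg(grid_shape)
  grid_shape
}

make_grid(NULL)
#> [1] "square"
make_grid(NULL, "hexagon")
#> [1] "hexagon"
```

Compared with square = TRUE, the call site now says which shape is wanted, and a third shape could be added later without changing the argument's meaning.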
When I first started using Rust, enums made no sense to me. In Rust, enums are first-class citizens that are treated as their own thing.
I’m not really sure what to call things in Rust. Are they all objects?
We make them by defining the name of the enum and the variants they may take on.
enum GridShape {
    Square,
    Hexagon,
}
Now you can use this enum GridShape to specify one of two variants: Square or Hexagon. Syntactically, these are written GridShape::Square and GridShape::Hexagon.
Enums are very nice because we can match on the variants and do different things based on them. For example we can have a function like so:
fn which_shape(x: GridShape) {
    match x {
        GridShape::Square => println!("We have a square!"),
        GridShape::Hexagon => println!("Hexagons are the bestagons"),
    }
}
It takes an argument x which is a GridShape enum. We match on the possible variants and then do something.
Inside of the match statement, each of the possible variants of the enum has to be handled. Each branch is called a match arm. The left side lists the variant, whereas the right side (after =>) indicates what will be executed if the left side is matched (essentially, if the condition is true).
With this function we can pass in specific variants and get different behavior.
which_shape(GridShape::Hexagon);
#> Hexagons are the bestagons
which_shape(GridShape::Square);
#> We have a square!
I think R would benefit from having a “real” enum type object. Having a character vector of valid variants and checking against them using match.arg() or rlang::arg_match() is great but I think we can go further.
Since learning Rust, I think having more strictness can make our code much better and more robust. I think adding enums would be a good step towards that.
I’ve prototyped an Enum type in R using the new S7 object system that might point us towards what an enum object in the future might look like for R users.
For an enum we need to know what the valid variants are and what the current value of the enum is. These would be the two properties.
An enum S7 object must also make sure that a value of an Enum is one of the valid variants. Using the GridShape enum, the valid variants would be "Square" and "Hexagon". A GridShape enum could not take, for example, "Circle" since it is not a listed variant.
To start, we will create an abstract S7 class called Enum.
“an abstract class is a generic class (or type of object) used as a basis for creating specific objects that conform to its protocol, or the set of operations it supports” (Source)
The Enum class will be used to create other Enum objects.
library(S7)
# create a new Enum abstract class
Enum <- new_class(
  "Enum",
  properties = list(
    Value = class_character,
    Variants = class_character
  ),
  validator = function(self) {
    if (length(self@Value) != 1L) {
      "enum value must be length 1"
    } else if (!(self@Value %in% self@Variants)) {
      "enum value must be one of possible variants"
    }
  },
  abstract = TRUE
)
In this code chunk we specify that there are 2 properties, Value and Variants, each of which must be a character type. Value will be the value of the enum. It would be the right hand side of GridShape::Square in Rust’s enum, for example. Variants is a character vector of all of the possible values it may take on. The validator ensures that Value has exactly 1 element. It also ensures that Value is one of the enumerated Variants. This Enum class will be used to generate other enums and cannot be instantiated by itself.
We can create a new enum factory function with the arguments:
enum_class: the name of the enum class we are creating
variants: a character vector of the valid variant values
# create a new enum constructor
new_enum_class <- function(enum_class, variants) {
  new_class(
    enum_class,
    parent = Enum,
    properties = list(
      Value = class_character,
      Variants = new_property(class_character, default = variants)
    ),
    constructor = function(Value) {
      new_object(S7_object(), Value = Value, Variants = variants)
    }
  )
}
Note that the constructor here only takes a Value argument. We do this so that users cannot circumvent the pre-defined variants.
With this we can now create a GridShape enum in R!
GridShape <- new_enum_class(
"GridShape",
c("Square", "Hexagon")
)
GridShape
<GridShape> class
@ parent : <Enum>
@ constructor: function(Value) {...}
@ validator : <NULL>
@ properties :
$ Value : <character>
$ Variants: <character>
This new object will construct new GridShape enums for us.
GridShape("Square")
<GridShape>
@ Value : chr "Square"
@ Variants: chr [1:2] "Square" "Hexagon"
When we try to create a GridShape that is not one of the valid variants we will get an error.
GridShape("Triangle")
Error: <GridShape> object is invalid:
- enum value must be one of possible variants
For fun, I would like Enum objects to print the way I would write them in Rust. To do this we can create a custom print method.
Since Enum is abstract, an enum object will always be an instance of a sub-class, so we can confidently grab the first element of class(enum_obj), which is the name of that sub-class. We paste that together with the value of the enum.
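A print method along these lines could be sketched as follows, assuming the S7 package is installed. The Enum class and the new_enum_class() factory are repeated in compact form only so that the block runs on its own. The method body is my reading of the description above, not necessarily the exact code behind the output shown next.

```r
library(S7)

# Compact copies of the Enum class and factory from earlier in the post,
# repeated so this block runs on its own
Enum <- new_class(
  "Enum",
  properties = list(Value = class_character, Variants = class_character),
  validator = function(self) {
    if (length(self@Value) != 1L) {
      "enum value must be length 1"
    } else if (!(self@Value %in% self@Variants)) {
      "enum value must be one of possible variants"
    }
  },
  abstract = TRUE
)

new_enum_class <- function(enum_class, variants) {
  new_class(
    enum_class,
    parent = Enum,
    properties = list(
      Value = class_character,
      Variants = new_property(class_character, default = variants)
    ),
    constructor = function(Value) {
      new_object(S7_object(), Value = Value, Variants = variants)
    }
  )
}

GridShape <- new_enum_class("GridShape", c("Square", "Hexagon"))

# Register a print method for every class that inherits from Enum
method(print, Enum) <- function(x, ...) {
  # class(x)[1] is the most specific class name, e.g. "GridShape"
  cat(class(x)[1], "::", x@Value, "\n", sep = "")
  invisible(x)
}
```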
square <- GridShape("Square")
square
GridShape::Square
Rust enums are even more powerful than what I briefly introduced. Each variant of an enum can actually be typed!!! Take a look at the example from The Book™.
enum Message {
    Quit,
    Move { x: i32, y: i32 },
    Write(String),
    ChangeColor(i32, i32, i32),
}
In this enum there are 4 variants. The first, Quit, doesn’t have any data associated with it. But the other three do! The second one, Move, has two fields, x and y, which contain integer values. Write is a tuple with a string in it and ChangeColor has 3 integer values in its tuple. These can be extracted.
A silly example function that illustrates how each value can be used:
fn which_msg(x: Message) {
    match x {
        Message::Quit => println!("I'm a quitter"),
        Message::Move { x, y } => println!("Move over {x} and up {y}"),
        Message::Write(msg) => println!("your message is: {msg}"),
        Message::ChangeColor(r, g, b) => println!("Your RGB ({r}, {g}, {b})"),
    }
}
When a variant with data is passed in, the values can be used. For example:
which_msg(Message::ChangeColor(0, 155, 200));
#> Your RGB (0, 155, 200)
What would this look like if we extended it to an R based enum object? I suspect the Variants would be a list of prototypes such as those from {vctrs}. The Value would have to be validated against all of the provided prototypes to ensure that it is one of the provided types.
I’m not sure how I would code this up, but I think that would be a great thing to have.
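As one rough starting point, and not a worked-out design, here is a base R sketch of the idea. Everything in it is an assumption made for illustration: message_variants and new_message() are hypothetical names, zero-length base vectors stand in for {vctrs} prototypes, S7 is left out to keep it short, and only the class of the value is checked (not its length or field names).

```r
# Hypothetical sketch: zero-length vectors act as stand-ins for prototypes.
# NULL means the variant carries no data.
message_variants <- list(
  Quit = NULL,
  Write = character(),
  ChangeColor = integer()
)

# Hypothetical constructor: checks the variant name, then checks that the
# value has the same class as that variant's prototype
new_message <- function(variant, value = NULL) {
  variant <- match.arg(variant, names(message_variants))
  proto <- message_variants[[variant]]
  if (!identical(class(value), class(proto))) {
    stop("`value` must have class <", class(proto)[1], "> for variant ", variant)
  }
  structure(list(variant = variant, value = value), class = "Message")
}

msg <- new_message("ChangeColor", c(0L, 155L, 200L))
msg$variant
#> [1] "ChangeColor"

# A value of the wrong type is rejected
res <- tryCatch(new_message("Write", 1L), error = function(e) "rejected")
res
#> [1] "rejected"
```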