Broadcasting: Scalars or vectors
There’s a common pattern that we encounter when writing functions for R. A single argument can often either be
- a scalar
- or a vector of the same length as another argument
When it’s a scalar, it makes sense to “broadcast” it to the same length as the other argument. Since R is vectorized, we often want our functions to handle both scenarios.
What is broadcasting?
Broadcasting definitely isn’t a new idea. I was first exposed to it through Kyle Barron’s work in geoarrow-rs. It gave words to a pattern I have handled many times.
Broadcasting ensures that the “shape” of two arrays is the same. We are essentially stretching a scalar to the length of a longer array.
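As a minimal base R sketch of that stretching (the values and variable names here are illustrative assumptions, not taken from any package), rep_len() can repeat a scalar until it matches the longer vector:

```r
# illustrative values only
ids <- c(10L, 20L, 30L)  # the longer vector
layer <- 1L              # the scalar we want to stretch

# repeat the scalar so both vectors have the same length
layer_b <- rep_len(layer, length(ids))

layer_b
length(layer_b) == length(ids)
```

After this step both vectors can be indexed with the same i without falling off the end of the shorter one.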
You can find broadcasting in many places:
- Julia has array broadcasting
- NumPy has broadcasting rules that solve this elegantly for array operations.
- The rray package
My use case
In my work on the R-ArcGIS Bridge we create many httr2 requests and send them in parallel.
For ergonomic reasons, arguments should accept either a scalar OR a vector of the same length. This is similar to R’s recycling.
Here’s a function I’m working with:
make_requests <- function(con, xid, yid) {
  # what if xid is a scalar?
  # TODO we need to broadcast
  n <- length(xid)

  # initialize empty list
  all_reqs <- vector("list", n)

  for (i in seq_len(n)) {
    # create an httr2 request and store it in the list
    req <- httr2::request(url) |>
      httr2::req_body_form(
        xid = xid[i],
        yid = yid[i]
      )
    all_reqs[[i]] <- req
  }

  # send all of the requests
  all_resps <- httr2::req_perform_parallel(
    all_reqs,
    max_active = 3
  )

  # process the requests
  all_resps
}
The problem occurs when xid is a scalar. The loop length will then be 1 when it should instead be the length of yid. Additionally, if xid is a scalar and I subset into it with xid[i] where i > 1, the value will be NA. We don’t want that!
If xid were broadcast to the length of yid first, we could be sure that the lengths are the same.
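The failure mode is easy to reproduce in base R. The values below are made up for illustration, and rep() stands in for whatever broadcasting step is used:

```r
xid <- "a"               # scalar
yid <- c("x", "y", "z")  # vector of length 3

# indexing past the end of a length-1 vector silently returns NA
xid[2]
#> [1] NA

# broadcasting first makes every index valid
xid_b <- rep(xid, length(yid))
xid_b[2]
#> [1] "a"
```

Note that R raises no error or warning for xid[2], which is why the bug is easy to miss.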
Right now, implementing this flexibility means writing manual validation and broadcasting logic in every single function. That’s tedious and error-prone.
A solution
I think a formalized broadcast() function could make this pattern more stable and reproducible without much overhead or boilerplate for devs.
#' Broadcast x to the same length as y
#'
#' Broadcasts the argument `x` to the same length as `y`.
#'
#' @param x a scalar atomic or an atomic of the same length as `y`
#' @param y an atomic vector
broadcast <- function(x, y) {
  if (!rlang::is_bare_atomic(x) || !rlang::is_bare_atomic(y)) {
    rlang::abort("`x` and `y` must be atomic vectors")
  }

  if (typeof(x) != typeof(y)) {
    rlang::abort("`x` and `y` must be the same type")
  }

  len_y <- length(y)
  len_x <- length(x)

  if (len_x == 1L) {
    return(rep(x, len_y))
  }

  if (len_x != len_y) {
    rlang::abort("`x` must be a scalar or the same length as `y`")
  }

  x
}
How it works
It takes the first argument and, if it is a scalar, repeats it to the length of y. If x is already the same length as y, it is returned unchanged.
Additionally, it ensures that the two vectors have the same type, as reported by typeof().
# Scalar broadcasting
broadcast("xyz", letters)
#> [1] "xyz" "xyz" "xyz" "xyz" "xyz" "xyz" "xyz" "xyz" "xyz" "xyz" "xyz" "xyz"
#> [13] "xyz" "xyz" "xyz" "xyz" "xyz" "xyz" "xyz" "xyz" "xyz" "xyz" "xyz" "xyz"
#> [25] "xyz" "xyz"
# Same-length vectors pass through
broadcast(rep("abc", 26), letters)
#> [1] "abc" "abc" "abc" "abc" "abc" "abc" "abc" "abc" "abc" "abc" "abc" "abc"
#> [13] "abc" "abc" "abc" "abc" "abc" "abc" "abc" "abc" "abc" "abc" "abc" "abc"
#> [25] "abc" "abc"
# Incompatible lengths error
broadcast(rep("xyz", 2), letters)
#> Error in `broadcast()`:
#> ! `x` must be a scalar or the same length as `y`
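The type check is the one branch the examples above do not exercise. The sketch below uses broadcast_base(), a hypothetical base R stand-in for broadcast(). It swaps rlang::abort() for stop() and approximates rlang::is_bare_atomic() with is.atomic() plus is.object(), so the block runs without dependencies:

```r
# hypothetical base R approximation of broadcast(), for illustration only
broadcast_base <- function(x, y) {
  if (!is.atomic(x) || is.object(x) || !is.atomic(y) || is.object(y)) {
    stop("`x` and `y` must be atomic vectors")
  }
  if (typeof(x) != typeof(y)) {
    stop("`x` and `y` must be the same type")
  }
  if (length(x) == 1L) {
    return(rep(x, length(y)))
  }
  if (length(x) != length(y)) {
    stop("`x` must be a scalar or the same length as `y`")
  }
  x
}

# 1L is an integer and c(1.5, 2.5) is a double, so typeof() differs
msg <- tryCatch(
  broadcast_base(1L, c(1.5, 2.5)),
  error = function(e) conditionMessage(e)
)
msg

# with matching types the scalar is repeated
res <- broadcast_base(1, c(1.5, 2.5))
res
```

Because the check is strict, an integer scalar is rejected against a double vector. Callers would need to convert first, for example with as.double().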
Now the function becomes way cleaner:
make_requests <- function(con, xid, yid) {
  # broadcast arguments to same length
  xid <- broadcast(xid, yid)
  n <- length(yid) # now we can safely use yid length

  # initialize empty list
  all_reqs <- vector("list", n)

  for (i in seq_len(n)) {
    # create an httr2 request and store it in the list
    req <- httr2::request(url) |>
      httr2::req_body_form(
        xid = xid[i],
        yid = yid[i]
      )
    all_reqs[[i]] <- req
  }

  # send all of the requests
  all_resps <- httr2::req_perform_parallel(
    all_reqs,
    max_active = 3
  )

  # process the requests
  all_resps
}
For large vectors, rep() creates a new vector in memory.
A better approach would be to create an ALTREP vector here that just holds a reference to the initial scalar value.
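To make the cost concrete, here is a rough base R comparison. The one-million length is an arbitrary choice for illustration, and object.size() only gives an estimate:

```r
x <- "xyz"
big <- rep(x, 1e6)  # materializes one million elements

# estimated memory footprint of the scalar vs. the broadcast vector
size_scalar <- object.size(x)
size_big <- object.size(big)

print(size_scalar, units = "auto")
print(size_big, units = "auto")
```

The broadcast vector holds no more information than the scalar plus a length, which is the gap an ALTREP-backed representation could close.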
What’s next
A production version might need to handle more cases like factor level compatibility, date/datetime broadcasting, and NA handling. But the core pattern works.
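A short base R sketch shows why factors and dates need extra handling. Here is.object() is used as an approximation of the "bare" part of rlang::is_bare_atomic(), and the values are illustrative:

```r
f <- factor("a", levels = c("a", "b"))
d <- as.Date("2024-01-01")

# both are atomic underneath, but they carry a class attribute,
# so a bare-atomic check would reject them
typeof(f)     # "integer"
typeof(d)     # "double"
is.object(f)  # TRUE
is.object(d)  # TRUE

# rep() itself keeps the class, so broadcasting them is feasible
f_b <- rep(f, 3)
d_b <- rep(d, 3)
class(f_b)
class(d_b)
```

So the repetition itself is not the hard part. What remains is deciding what "compatible" means for classed vectors, for example whether two factors must share the same levels.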
I’ve proposed this for rlang in issue #1819. If you run into this pattern too, give it a thumbs up!
R excels at making complex operations simple and expressive. Broadcasting feels like a natural next step.