Design Paradigms in R

Lately I have developed a deep curiosity about the origins of the R language. I have since read more from the Wayback Machine than a Master’s student probably should. There are four documents that I believe to be foundational and that most clearly outline the original philosophies underpinning both R and its predecessor S. These are Evolution of the S Language (Chambers, 1996), A Brief History of S (Becker), Stages in the Evolution of S (Chambers, 200), and R: Past and Future History (Ihaka, 1998). The readings have prompted many lines of thought and potential inquiry. What interests me most at present is the question “how have the design principles of S and R manifested themselves today?”

There are three design principles that I believe are most prominent in the modern iteration of R. Two of them originate in the development of S, and one was most clearly emphasized in the development of R. In no particular order, the software should serve as an interface to other tools and languages, be efficient (and in particular fast), and let users work interactively without consciously thinking of themselves as programming.

The first design philosophy is, I think, quite apparent in R’s continual development as an interface language. I believe part of R’s success comes from the immense community effort devoted to interfaces to other tools and languages. The most prominent, I would argue, are the interfaces to SQL, Stan, JavaScript, and Python. Each lets R users interact with existing infrastructure, which vastly increases the breadth of technologies and tools available to useRs.

Identifying R’s evolution as an interface language is a rather objective task. Assessing the latter two principles is more subjective, but I will attempt it nonetheless.

The second principle is stated clearly by Ihaka (1998).

“My own conclusion has been that it is important to pursue efficiency issues, and in particular, speed.”

In today’s R landscape, as I see it, few packages have paid as much attention to this principle as data.table. data.table’s performance is beyond reproach. The speed at which one can subset, manipulate, and order data using data.table is remarkable. Recent speed tests by package author Matt Dowle illustrate this. In performance testing, data.table regularly outperforms other existing tools such as Spark, dplyr, and pandas, with Spark being an especially notable case.
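To give a feel for the kind of subsetting, aggregating, and ordering I mean, here is a minimal sketch. It assumes the data.table package is installed. The `sales` table and its columns are invented for illustration and have nothing to do with the benchmarks mentioned above. It shows syntax only and says nothing about timing.

```r
library(data.table)

# Toy data, invented purely for illustration.
sales <- data.table(
  region = c("east", "west", "east", "west", "north"),
  units  = c(10L, 4L, 7L, 12L, 3L)
)

# General form is dt[i, j, by]:
#   i  = which rows to keep
#   j  = what to compute
#   by = grouping columns
totals <- sales[units > 3, .(total_units = sum(units)), by = region]

# Order the grouped result by total, largest first.
totals <- totals[order(-total_units)]

print(totals)
```

Filtering, aggregation, and grouping all happen inside one bracket call, which is part of what lets data.table optimize the whole operation internally.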

This brings us to the third point, which actually originates in the development of S. John Chambers, in Stages in the Evolution of S, wrote:

“The ambiguity is real and goes to a key objective: we wanted users to be able to begin in an interactive environment, where they did not consciously think of themselves as programming.”

This is a helpful reminder that S was developed with the intention of creating a tool for statistical analysis. In the design and implementation of S, the emphasis was on doing analysis, not programming. I think this paradigm is where the tidyverse came from, perhaps unintentionally, but as a consequence nonetheless.

The development of the tidyverse has, I believe, adhered to this principle. There seems to be a constant and conscious consideration of the useR experience. dplyr, for example, offers a way to express an analysis that is clear in intent and that, to an extent, reads like a sentence.
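A short sketch of what I mean by reading like a sentence, assuming dplyr is installed. It uses the built-in mtcars data set, and the particular question asked of it is my own invention for illustration.

```r
library(dplyr)

# Read top to bottom: take mtcars, keep cars with more than 100 horsepower,
# group them by number of cylinders, summarise average fuel economy and
# group size, then arrange from most to least efficient.
mpg_by_cyl <- mtcars %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n_cars = n()) %>%
  arrange(desc(mean_mpg))

print(mpg_by_cyl)
```

Each verb names one step of the analysis, so the code can be read aloud without much translation.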

From this view, it is clear that the R community has done a great job adhering to these principles. The various odbc packages, numerous JavaScript packages (particularly in the Shiny space), reticulate, rstan, JuliaCall, and Rcpp, among many others, are an immense testament to the first. Moreover, data.table’s “relentless focus on performance across the entire package” is the reason for its success (Wickham, 2019). Similarly, I believe the relentless focus on user experience in the tidyverse is the reason for its success. These two toolkits should be viewed as two sides of the same coin, each approaching the same end goal from a different perspective.