What’s so special about arrays?
I’m working on a new video about S3 objects in R and class inheritance / hierarchy. One of my favorite functions for exploring objects and their structure is unclass().
The documentation states:
unclass returns (a copy of) its argument with its class attribute removed. (It is not allowed for objects which cannot be copied, namely environments and external pointers.)
See ?unclass and review Details.
So, unclass() should remove the class of any and all objects. For the sake of clarity in this post, I’m going to make a helper function.
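A minimal sketch of such a helper, assuming it does nothing more than strip the class with unclass() and then ask class() what is left. Only the name class_unclass appears in this post; the body is an assumption.

```r
# Assumed definition: remove the class attribute, then report the class again.
class_unclass <- function(x) {
  class(unclass(x))
}

class_unclass(factor())      # "integer"
class_unclass(data.frame())  # "list"
```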
This works for factors, which are just integer vectors with a levels attribute:
class_unclass(factor())
[1] "integer"
and data.frames are lists (technically also vectors, just recursive rather than atomic) with the attributes row.names and names.
class_unclass(data.frame())
[1] "list"
But when we get down to matrix, another poser type (just like data.frame and factor, pretending to be something it’s not), we get something different.
class_unclass(matrix())
[1] "matrix" "array"
Well, why the heck is that? What about it makes it so special? Let’s explore this a bit more.
The two things that make a matrix are the classes c("matrix", "array") and the dim attribute, which specifies the dimensions. Matrices are two-dimensional arrays, by the way!
attributes(matrix())
$dim
[1] 1 1
What is weird is that you can make a matrix just by adding the dim attribute to a vector.
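A small sketch of that step, assuming m starts life as a length-one integer vector (the starting value is an assumption; the post only shows that m ends up as an integer once dim is removed). The class shown is what R 4.0.0 and later report.

```r
# Start with a plain integer vector of length one.
m <- 1L
class(m)                     # "integer"

# Attach only a dim attribute; no class is set anywhere.
attr(m, "dim") <- c(1L, 1L)
class(m)                     # "matrix" "array"
attributes(m)                # only $dim is stored
```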
We didn’t even specify the class. Why does this happen?
And when we remove the dim attribute….
attr(m, "dim") <- NULL
class_unclass(m)
[1] "integer"
we get an integer vector. This differs from the behavior of other similar types. Recall that factors are integer vectors with a levels attribute that is a character vector.
So let’s try something here. Let’s create a factor from scratch.
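One way this could look, as a sketch: an integer vector, a levels attribute, and the class set by hand with structure(). The values and the name f are made up for illustration.

```r
# Integer codes + character levels + class "factor", assembled manually.
f <- structure(1:3, levels = c("a", "b", "c"), class = "factor")

is.factor(f)   # TRUE
levels(f)      # "a" "b" "c"
```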
Now, if the behavior is similar to matrix or array, we would expect that when we omit the class attribute, R reconstructs it.
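A sketch of that experiment, using the same made-up values as a hand-built factor but leaving the class attribute out. The name g is hypothetical.

```r
# Same ingredients as a factor, minus the class attribute.
g <- structure(1:3, levels = c("a", "b", "c"))

class(g)       # "integer"
is.factor(g)   # FALSE
```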
Nope! Would you look at that!
@yjunechoe pointed me to some excerpts from Deep R Programming, a book I wish I had read yesterday. It refers to these attributes as “special attributes”. The author, Marek Gagolewski (author of stringi, by the way), makes note of this special behavior of matrix but leaves it at that.
To me, this is a fundamental inconsistency in the language. Either all poser types (data.frame, matrix, factor, etc.) should be automatically created when their special attributes are set on the appropriate type, or none of them should. What justification is there for only matrix having special behavior in unclass()?
To me, this warrants a bug report for unclass(). Based on the documentation, unclass() should always remove the class attribute from an object, but it fails to do so for arrays and matrices.
Looking deeper!
With some further exploratory help from June, we can see the internal representation of these objects.
Let’s create an object with a dim attribute and a custom class. Surprisingly, the custom class is respected and matrix and array aren’t slapped on it.
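A sketch of such an object, with the value, dimensions, and class name taken from the inspect() output in this post (an integer 0, dim 1,1, class "meep"). Building it with structure() is an assumption.

```r
# An integer with a dim attribute and a custom S3 class.
x <- structure(0L, dim = c(1L, 1L), class = "meep")

class(x)                # "meep"
inherits(x, "matrix")   # FALSE
```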
If we look at the internal representation using .Internal(inspect(x)), we get to see some of the C goodies.
.Internal(inspect(x))
@10ea6ccd0 13 INTSXP g0c1 [OBJ,REF(2),ATT] (len=1, tl=0) 0
ATTRIB:
@10f0d5570 02 LISTSXP g0c0 [REF(1)]
TAG: @13d00e880 01 SYMSXP g1c0 [MARK,REF(2905),LCK,gp=0x4000] "dim" (has value)
@10ea6cd08 13 INTSXP g0c1 [REF(65535)] (len=2, tl=0) 1,1
TAG: @13d00e9d0 01 SYMSXP g1c0 [MARK,REF(32428),LCK,gp=0x6000] "class" (has value)
@12d670170 16 STRSXP g0c1 [REF(65535)] (len=1, tl=0)
@12d6701a8 09 CHARSXP g0c1 [REF(2),gp=0x60] [ASCII] [cached] "meep"
Just look at the right-hand side of this gibberish. See that dim has a value of 1,1 and class has a value of meep. There is no matrix or array or anything.
Now we remove the meep class and check again.
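A sketch of the removal step, assuming it is done by reassigning the result of unclass(); the object is rebuilt first so the snippet runs on its own.

```r
# Rebuild the object, then strip the custom class.
x <- structure(0L, dim = c(1L, 1L), class = "meep")
x <- unclass(x)

attributes(x)   # only $dim remains
class(x)        # "matrix" "array"
```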
class_unclass(x)
[1] "matrix" "array"
Boom, matrix and array. But if we look at the internals…
@10fa94200 13 INTSXP g0c1 [ATT] (len=1, tl=0) 0
ATTRIB:
@10f31f5d8 02 LISTSXP g0c0 [REF(1)]
TAG: @13d00e880 01 SYMSXP g1c0 [MARK,REF(2933),LCK,gp=0x4000] "dim" (has value)
@10ea6cd08 13 INTSXP g0c1 [REF(65535)] (len=2, tl=0) 1,1
THEY AREN’T THERE!!!!!!
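The same thing can be seen from plain R without .Internal(), as a rough sketch: oldClass() and attr() read only the stored class attribute, while class() also reports what ?class calls the implicit class. The object below is a stand-in for the unclassed x.

```r
# An integer with only a dim attribute, like the unclassed object above.
x <- structure(0L, dim = c(1L, 1L))

attr(x, "class")   # NULL: nothing is stored on the object
oldClass(x)        # NULL: same answer, via the accessor
class(x)           # "matrix" "array": computed from dim, not stored
```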