I have been using apply() frequently for years now, quite happily. But yesterday, I was surprised by how apply() works, and in this note I'll try to explain why.
Let’s consider the following data frame.
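As a minimal sketch, assume df0 looks something like the data frame below. The column names var1 and var2 and their values are illustrative assumptions; all that matters is that the columns have different classes.

```r
# Hypothetical stand-in for df0: one character column and one numeric column.
df0 <- data.frame(
  var1 = c("a", "b", "c"),
  var2 = c(1, 2, 3),
  stringsAsFactors = FALSE  # only matters for R < 4.0.0
)
str(df0)
```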
And let's assume I need to convert all columns, irrespective of their class, to columns of character strings (this example is somewhat contrived, but it illustrates the issue well). Passing df0 to as.character() is clearly not the solution:
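To see why, here is a small sketch using a hypothetical two-column df0 (the names and values are assumptions for illustration):

```r
# Hypothetical df0 with one character and one numeric column.
df0 <- data.frame(var1 = c("a", "b", "c"), var2 = c(1, 2, 3),
                  stringsAsFactors = FALSE)

# as.character() treats the data frame as a list and deparses each column
# into a single string, so we get one element per column, not one per cell.
chr <- as.character(df0)
length(chr)
```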
I would opt for apply():
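A sketch of that call, again on a hypothetical df0 (column names and values are assumptions):

```r
# Hypothetical df0 with one character and one numeric column.
df0 <- data.frame(var1 = c("a", "b", "c"), var2 = c(1, 2, 3),
                  stringsAsFactors = FALSE)

# MARGIN = 2 applies as.character() to each column. Note that apply()
# first coerces the data frame to a matrix with as.matrix().
res0 <- apply(df0, 2, as.character)
dim(res0)  # 3 rows, 2 columns
```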
So far, so good. I can obtain the desired data frame by calling as.data.frame():
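Something along these lines, where df0 is a hypothetical example and out0 is a name introduced here for illustration:

```r
# Hypothetical df0 with one character and one numeric column.
df0 <- data.frame(var1 = c("a", "b", "c"), var2 = c(1, 2, 3),
                  stringsAsFactors = FALSE)

res0 <- apply(df0, 2, as.character)

# The character matrix becomes a data frame with the same shape as df0,
# and every column is now a character vector.
out0 <- as.data.frame(res0, stringsAsFactors = FALSE)
str(out0)
```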
All good. Note that stringsAsFactors = FALSE is still needed if you are using R < 4.0.0 (otherwise you get factors, not strings).

Now let's consider the case of a data frame with only one row:
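For instance, taking the first row of a hypothetical df0 (names and values are assumptions) and running the same apply() call:

```r
# Hypothetical df0 with one character and one numeric column.
df0 <- data.frame(var1 = c("a", "b", "c"), var2 = c(1, 2, 3),
                  stringsAsFactors = FALSE)

# Keep only the first row; df1 is still a data frame with two columns.
df1 <- df0[1, ]
res1 <- apply(df1, 2, as.character)
res1
```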
No biggie? Hmm 😐, let's see:
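One way to compare the two results, sketched on a hypothetical df0 (the names and values are assumptions):

```r
# Hypothetical df0 with one character and one numeric column.
df0 <- data.frame(var1 = c("a", "b", "c"), var2 = c(1, 2, 3),
                  stringsAsFactors = FALSE)

res0 <- apply(df0, 2, as.character)       # three rows in
res1 <- apply(df0[1, ], 2, as.character)  # one row in

is.matrix(res0)  # TRUE: a character matrix
is.matrix(res1)  # FALSE: a named character vector
dim(res1)        # NULL
names(res1)      # the column names of df0
```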
So, two different objects even though the inputs are very similar… In fairness, this is documented (see ?apply):
If each call to ‘FUN’ returns a vector of length ‘n’, then ‘apply’ returns an array of dimension ‘c(n, dim(X)[MARGIN])’ if ‘n > 1’. If ‘n’ equals ‘1’, ‘apply’ returns a vector if ‘MARGIN’ has length 1 and an array of dimension ‘dim(X)[MARGIN]’ otherwise. If ‘n’ is ‘0’, the result has length 0 but not necessarily the ‘correct’ dimension.
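The rule quoted above can be checked on a small matrix. This is only an illustrative sketch; the matrix m and the result names are made up for the example.

```r
m <- matrix(1:6, nrow = 3)  # 3 rows, 2 columns

# range() returns n = 2 values per column, so apply() returns an array
# of dimension c(n, dim(X)[MARGIN]), here 2 x 2.
r_range <- apply(m, 2, range)
dim(r_range)

# sum() returns n = 1 value per column and MARGIN has length 1,
# so apply() returns a plain vector with no dim attribute.
r_sum <- apply(m, 2, sum)
dim(r_sum)  # NULL
```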
OK, but remember that functions returning outputs of different classes for inputs of the same class should be handled with extra care. Below is an example where this is problematic:
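A sketch of the problem, using a hypothetical df0 (names and values are assumptions) and passing both results to as.data.frame():

```r
# Hypothetical df0 with one character and one numeric column.
df0 <- data.frame(var1 = c("a", "b", "c"), var2 = c(1, 2, 3),
                  stringsAsFactors = FALSE)

res0 <- apply(df0, 2, as.character)       # character matrix
res1 <- apply(df0[1, ], 2, as.character)  # named character vector

out0 <- as.data.frame(res0, stringsAsFactors = FALSE)
out1 <- as.data.frame(res1, stringsAsFactors = FALSE)

dim(out0)  # 3 rows, 2 columns, as expected
dim(out1)  # 2 rows, 1 column: the columns of df0 have become rows
```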
So basically, res1 does not have the same structure as res0 (I would expect 2 columns and 1 row) even though the initial inputs were similar… Actually, as I was not aware of this, I introduced a bug in rcites (kindly reported by Jeewantha Bandara), so this specific behavior can be pretty nasty.
There are several ways to avoid this. One is to check the number of rows of the input data frame and deal with the one-row case separately. The way I dealt with the bug mentioned above was to use lapply() instead of apply(). That way, columns are treated as list elements, and as.data.frame.list() works just fine on the list returned by the lapply() call.
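A minimal sketch of the lapply() approach, on a hypothetical df0 (column names, values, and the out0/out1 names are assumptions for illustration):

```r
# Hypothetical df0 with one character and one numeric column.
df0 <- data.frame(var1 = c("a", "b", "c"), var2 = c(1, 2, 3),
                  stringsAsFactors = FALSE)
df1 <- df0[1, ]

# lapply() iterates over columns and always returns a list, so the
# shape of the result no longer depends on the number of rows.
out0 <- as.data.frame(lapply(df0, as.character), stringsAsFactors = FALSE)
out1 <- as.data.frame(lapply(df1, as.character), stringsAsFactors = FALSE)

dim(out0)  # 3 rows, 2 columns
dim(out1)  # 1 row, 2 columns
```

If the goal is to modify the data frame in place, df1[] <- lapply(df1, as.character) should also work, since assigning into df1[] keeps the data frame structure.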
… It's OK, I will keep working with you, and I will do my best to read the docs carefully… but damn… sometimes you're killing me.