Recently, I was looking at an R function and found an if statement that looks something like this:
I guess this is a common way of doing it, as it matches a natural way of looking
at the problem: to test for the presence of a column “entry” in a data frame
(`df`), I can check whether at least one column name matches “entry”.
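A minimal sketch of that reasoning, assuming a small made-up data frame `df` (it is not taken from the original function): comparing `names(df)` with “entry” gives one logical value per column, and counting the `TRUE` values tells whether the column exists.

```r
# Hypothetical toy data frame with a column named "entry"
df <- data.frame(id = 1:3, entry = c("a", "b", "c"))

# Element-wise comparison: one logical value per column name
matches <- names(df) == "entry"
print(matches)  # FALSE for "id", TRUE for "entry"

# Count the matches: the column is present if there is at least one
n_matches <- sum(matches)
has_entry <- n_matches > 0

if (has_entry) {
  message("df has a column named 'entry'")
}
```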
Another, shorter way is to use the `%in%` operator: `"entry" %in% names(df)`.
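For illustration, here is how the `%in%` check behaves on the same kind of hypothetical toy data frame. The `match()` line relies on the definition given in R's `?match` help page, where `%in%` is described as a thin wrapper around `match()`; it is shown only to make that relationship visible.

```r
# Hypothetical toy data frame with a column named "entry"
df <- data.frame(id = 1:3, entry = c("a", "b", "c"))

# %in% returns one logical value per element of its left operand,
# so testing a single name gives a single TRUE or FALSE
has_entry <- "entry" %in% names(df)
has_other <- "other" %in% names(df)

# ?match documents %in% as a wrapper around match(),
# so this spelled-out version gives the same answer
has_entry_match <- match("entry", names(df), nomatch = 0L) > 0L

if (has_entry) {
  message("df has a column named 'entry'")
}
```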
I personally would have written `if (sum(names(df) == "entry"))`.
It is less intuitive: `sum(names(df) == "entry")` returns the number of columns
named “entry”, and since any non-zero number is treated as `TRUE` in a condition,
there is no need for `> 0`.
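A small sketch of why dropping `> 0` works, again on a hypothetical toy data frame: `sum()` over a logical vector returns a count, and `if()` coerces a length-one number to a logical, where 0 is `FALSE` and any other value is `TRUE`.

```r
# Hypothetical toy data frame with a column named "entry"
df <- data.frame(id = 1:3, entry = c("a", "b", "c"))

# sum() over a logical vector counts the TRUE values (as an integer)
n <- sum(names(df) == "entry")

# if() coerces a length-one number to logical: 0 is FALSE, anything else is TRUE
found <- if (n) "present" else "absent"

# With a name that does not exist the count is 0, so the condition is FALSE
not_found <- if (sum(names(df) == "other")) "present" else "absent"
```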
I would have done so simply because I knew that `sum()` is quite efficient, but
I had never made the comparison… until today 😸! To compare the three
options, I wrote a small R script:
Note that `system.time()` is quite convenient to benchmark small pieces of code.
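As a rough sketch of how such a comparison can be set up with `system.time()`, assuming a made-up data frame and an arbitrary number of repetitions: the `sum(...) > 0` variant is a stand-in for a count-based check and is not necessarily the original statement, and this is not the script behind the results discussed below. Timings will differ from machine to machine.

```r
# Hypothetical toy data frame: a few columns, one of them named "entry"
df <- data.frame(a = 1, b = 2, entry = 3, d = 4)
n_rep <- 1e5  # number of repetitions, chosen arbitrarily

# Count matches and compare with 0
t_gt <- system.time(for (i in seq_len(n_rep)) if (sum(names(df) == "entry") > 0) NULL)
# Use the %in% operator
t_in <- system.time(for (i in seq_len(n_rep)) if ("entry" %in% names(df)) NULL)
# Count matches and rely on a non-zero count being TRUE
t_sum <- system.time(for (i in seq_len(n_rep)) if (sum(names(df) == "entry")) NULL)

# Elapsed (wall-clock) seconds for each variant
elapsed <- c(
  sum_gt_0 = t_gt[["elapsed"]],
  in_op    = t_in[["elapsed"]],
  sum_only = t_sum[["elapsed"]]
)
print(elapsed)
```

For more careful measurements, repeating each timing several times (or using a dedicated benchmarking package) reduces the noise of a single run.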
Now, the results:
And the winner is… option 3 🏆! Interestingly enough, dropping function calls consistently improves efficiency, but a smaller number of calls does not necessarily mean a more efficient if statement… Not surprisingly, the efficiency of your conditional statement depends on the efficiency of the functions you call in it 👿!
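One way to see why fewer visible calls need not mean less work: in R, operators such as `>` and `%in%` are themselves function calls. The snippet below is a small illustration, not part of the original benchmark; it checks that `>` is a primitive while `%in%` is an ordinary R function whose body calls `match()`.

```r
# Operators are functions in R: `>` is a primitive, `%in%` is a regular R function
gt_is_primitive <- is.primitive(`>`)
in_is_primitive <- is.primitive(`%in%`)

# Calling an operator in prefix form is equivalent to the infix form
same_result <- identical(`>`(2, 0), 2 > 0)

# Printing the body of %in% shows that it calls match() internally
print(body(`%in%`))
```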