On 17/02/2015 11:19 AM, John Posner wrote:
In the course of slicing-and-dicing some data, I had occasion to create a list 
like this:

list(
     subset(my_dataframe, GR1=="XX1"),
     subset(my_dataframe, GR1=="XX2"),
     subset(my_dataframe, GR1=="YY"),
     subset(my_dataframe, GR1 %in% c("XX1", "XX2")),
     subset(my_dataframe, GR2=="Remission"),
     subset(my_dataframe, GR2=="Relapse"))

I used %in% only once, because there was only one "compound value" (XX1 or XX2) 
for subsetting. But then it occurred to me to use %in% everywhere, taking advantage of 
the fact that a scalar value is the same as a length-1 vector:

list(
     subset(my_dataframe, GR1 %in% "XX1"),
     subset(my_dataframe, GR1 %in% "XX2"),
     subset(my_dataframe, GR1 %in% "YY"),
     subset(my_dataframe, GR1 %in% c("XX1", "XX2")),
     subset(my_dataframe, GR2 %in% "Remission"),
     subset(my_dataframe, GR2 %in% "Relapse"))

It works just fine.  Are there any problems with this style, from the 
standpoints of correctness, aesthetics, etc.?

If GR1 or GR2 has a missing value, the equality tests return NA for that observation, but the %in% tests return FALSE. That won't affect subset() (where NA and FALSE both result in the omission of the observation), but it might affect other code that uses the same conditions. For example, if you had selected rows using a logical index instead of using subset(), each NA entry in the index would produce a row filled with NA in the result.
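
A small sketch may make the difference concrete. The data frame below is invented for illustration (only the column name GR1 comes from the original post, and the id column is hypothetical), so treat it as a toy example rather than the poster's real data:

```r
# Hypothetical toy data with one missing value in GR1.
my_dataframe <- data.frame(id  = 1:4,
                           GR1 = c("XX1", NA, "XX2", "YY"),
                           stringsAsFactors = FALSE)

# Equality propagates the missing value; %in% never returns NA.
eq_index <- my_dataframe$GR1 == "XX1"     # TRUE    NA FALSE FALSE
in_index <- my_dataframe$GR1 %in% "XX1"   # TRUE FALSE FALSE FALSE

# subset() treats NA like FALSE, so both forms select the same rows.
same_subset <- identical(subset(my_dataframe, GR1 == "XX1"),
                         subset(my_dataframe, GR1 %in% "XX1"))

# Plain logical indexing does not: an NA in the index yields an all-NA row.
by_eq <- my_dataframe[eq_index, ]   # two rows, the second entirely NA
by_in <- my_dataframe[in_index, ]   # one row
```

If logical indexing with == is wanted anyway, wrapping the condition in which(), as in my_dataframe[which(eq_index), ], drops the NA entries and gives the same rows as the %in% version here.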

Duncan Murdoch

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.