On 27/02/2015 10:27 AM, Dimitri Liakhovitski wrote:
> Thank you very much, Duncan.
> All this being said:
>
> What would you say is the most elegant and most safe way to solve such
> a seemingly simple task?
If you have NA values, test for them explicitly, e.g. your original

x[(x$c<6) | is.na(x$c),]

I would write it as

x[is.na(x$c) | x$c < 6,]

but that's purely a style difference, I don't think it would affect
execution time (or results). I like to put the weird case first because
it will remind me that things are more complicated than you might guess.

Duncan Murdoch

>
> Thank you!
>
> On Fri, Feb 27, 2015 at 10:02 AM, Duncan Murdoch
> <murdoch.dun...@gmail.com> wrote:
>> On 27/02/2015 9:49 AM, Dimitri Liakhovitski wrote:
>>> So, Duncan, do I understand you correctly:
>>>
>>> When I use x$x<6, R doesn't know if it's TRUE or FALSE, so it returns
>>> a logical value of NA.
>>
>> Yes, when x$x is NA. (Though I think you meant x$c.)
>>
>>> When this logical value is applied to a row, the R says: hell, I don't
>>> know if I should keep it or not, so, just in case, I am going to keep
>>> it, but I'll replace all the values in this row with NAs?
>>
>> Yes. Indexing with a logical NA is probably a mistake, and this is one
>> way to signal it without actually triggering a warning or error.
>>
>> BTW, I should have mentioned that the example where you indexed using
>> -which(x$c>=6) is a bad idea: if none of the entries were 6 or more,
>> this would be indexing with an empty vector, and you'd get nothing, not
>> everything.
>>
>> Duncan Murdoch
>>
>>
>>>
>>> On Fri, Feb 27, 2015 at 9:13 AM, Duncan Murdoch
>>> <murdoch.dun...@gmail.com> wrote:
>>>> On 27/02/2015 9:04 AM, Dimitri Liakhovitski wrote:
>>>>> I know how to get the output I need, but I would benefit from an
>>>>> explanation why R behaves the way it does.
>>>>>
>>>>> # I have a data frame x:
>>>>> x = data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10))
>>>>> x
>>>>> # I want to toss rows in x that contain values >=6. But I don't want
>>>>> # to toss my NAs there.
>>>>>
>>>>> subset(x,c<6) # Works correctly, but removes NAs in c, understand why
>>>>> x[which(x$c<6),] # Works correctly, but removes NAs in c, understand why
>>>>> x[-which(x$c>=6),] # output I need
>>>>>
>>>>> # Here is my question: why does the following line replace the values
>>>>> # of all rows that contain an NA in x$c with NAs?
>>>>>
>>>>> x[x$c<6,] # Leaves rows with c=NA, but makes the whole row an NA. Why???
>>>>> x[(x$c<6) | is.na(x$c),] # output I need - I have to be super-explicit
>>>>>
>>>>> Thank you very much!
>>>>
>>>> Most of your examples (except the ones using which()) are doing logical
>>>> indexing. In logical indexing, TRUE keeps a line, FALSE drops the line,
>>>> and NA returns NA. Since "x$c < 6" is NA if x$c is NA, you get the
>>>> third kind of indexing.
>>>>
>>>> Your last example works because in the cases where x$c is NA, it
>>>> evaluates NA | TRUE, and that evaluates to TRUE. In the cases where x$c
>>>> is not NA, you get x$c < 6 | FALSE, and that's the same as x$c < 6,
>>>> which will be either TRUE or FALSE.
>>>>
>>>> Duncan Murdoch
>>>>

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
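
The three behaviours discussed in this thread (NA in a logical index, the explicit is.na() test, and the -which() pitfall) can be checked with a short sketch. It reuses the data frame x from the original post. The second data frame y and the result names (na_rows, keep, none, all_kept) are hypothetical and exist only for illustration.

```r
# Data frame from the original post
x <- data.frame(a = 1:10, b = 2:11, c = c(1, NA, 3, NA, 5, NA, 7, NA, NA, 10))

# Logical indexing: each NA in the index produces a row filled with NAs
na_rows <- x[x$c < 6, ]

# Explicit NA test: rows where c is NA are kept intact, rows with c >= 6 are dropped
keep <- x[is.na(x$c) | x$c < 6, ]

# Hypothetical data frame in which no entry of c is 6 or more
y <- data.frame(a = 1:3, c = c(1, NA, 3))

# Pitfall: which() returns integer(0), and negating it is still integer(0),
# so this selects no rows instead of all rows
none <- y[-which(y$c >= 6), ]

# The explicit test behaves as intended in the same situation
all_kept <- y[is.na(y$c) | y$c < 6, ]

print(na_rows)
print(keep)
print(nrow(none))
print(nrow(all_kept))
```

The explicit form needs no special handling when nothing matches, which is one reason to prefer it over -which() here.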