R silently converts the number to a character string so it can be compared with the character column in the subset condition. If we do that conversion explicitly, we can see that it does not give "100000" with the default R settings:
> as.character(100000)
[1] "1e+05"
> as.character(99999)
[1] "99999"

--
W. Michael Conklin
EVP Marketing & Data Sciences
GfK
T +1 763 417 4545 | M +1 612 567 8287

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Karl Schilling
Sent: Tuesday, November 17, 2015 1:14 PM
To: r-help@r-project.org
Subject: [R] Strange result when subsetting a data frame based on a character variable

Dear all,

I have one observation that I do not quite understand. Maybe someone can clarify this issue for me.

I have a data frame that I want to subset based on a grouping variable, say "group". Actually, "group" is a numeric value, but it is stored as a character string. I give some code to generate an example data frame below.

Now, if I use MySubset <- subset(Data, Data$group == ".."), everything works fine, as expected. ".." stands here for the value of group given as a character string. Surprisingly, I also get a correct subset if I simply give the plain numeric value of group (like MySubset <- subset(Data, Data$group == ..)), AS LONG AS this numeric value is less than 100000. If the numeric value is 100000 or larger, I get an empty subset.

OK, I know how to avoid this situation, but I wonder what the explanation for this behavior, which seems rather strange to me, might be.

Thank you so much for your suggestions.
Karl Schilling

##### Example code for reproducing the problem described above:

options(stringsAsFactors = FALSE)

# set up some data frame
value <- 1:6
group <- rep(c("20000", "99999", "100000"), each = 2)
Data <- data.frame(value = value, group = group)
str(Data)

# subset data frame based on the value of the variable "group",
# treating this value once as a character, and once as a number:
Data20 <- subset(Data, Data$group == "20000")
str(Data20)
Data20N <- subset(Data, Data$group == 20000)
str(Data20N)
Data99 <- subset(Data, Data$group == "99999")
str(Data99)
Data99N <- subset(Data, Data$group == 99999)
str(Data99N)
Data100 <- subset(Data, Data$group == "100000")
str(Data100)
Data100N <- subset(Data, Data$group == 100000)   # empty subset (0 rows)
str(Data100N)

--
Karl Schilling

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
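Since the empty subset comes from the number being turned into the string "1e+05" before the comparison, one way around it is to avoid the implicit number-to-string conversion altogether. The sketch below is not from the thread; it shows two possible workarounds on the same example data: convert the character column to numbers and compare numerically, or build the comparison string explicitly with format(..., scientific = FALSE). It assumes every group label is a whole number stored as text; as.numeric() would return NA (with a warning) for labels that are not numbers.

```r
# Sketch of two workarounds (not from the original thread).
# Assumes every "group" label is a number stored as a character string.
Data <- data.frame(value = 1:6,
                   group = rep(c("20000", "99999", "100000"), each = 2),
                   stringsAsFactors = FALSE)

# Option 1: compare numerically by converting the character column to numbers.
Data100N <- subset(Data, as.numeric(group) == 100000)

# Option 2: keep the character comparison, but build the string
# explicitly without scientific notation.
key <- format(100000, scientific = FALSE)
Data100C <- subset(Data, group == key)

str(Data100N)
str(Data100C)
```

Option 1 is usually the more robust choice when the labels really are numbers, because it does not depend on how R chooses to print a given value.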