Dear Duncan,

I'd rather convert the numeric to character, e.g. with sprintf() or format() in case it is a numeric vector.
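A minimal sketch of both conversions on a whole vector, assuming default options (scipen = 0). The vector name `codes` and the helper names `by_sprintf` and `by_format` are made up for illustration and are not part of the original example.

```r
# Sketch only: `codes`, `by_sprintf` and `by_format` are hypothetical names.
codes <- c(20000, 99999, 100000)

# Implicit conversion, as used when a number is compared to a character value.
# The thread reports "1e+05" for 100000 with default options.
print(as.character(codes))

# Explicit conversions that keep fixed notation.
by_sprintf <- sprintf("%.0f", codes)
by_format  <- format(codes, scientific = FALSE, trim = TRUE)

# Compare against a character grouping variable like the one in the thread.
group <- c("20000", "99999", "100000")
print(by_sprintf == group)
print(by_format == group)
```

Note that format() right-justifies numeric values to a common width unless trim = TRUE is given, so without it the shorter values would carry a leading blank and would not match the character column.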
subset(Data, group == "100000")
subset(Data, group == sprintf("%.f", 100000))
sprintf("%.f", 100000) # "100000"

It requires the user to think about the format, which can reduce errors.

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

2015-11-17 21:27 GMT+01:00 Duncan Murdoch <murdoch.dun...@gmail.com>:

> On 17/11/2015 2:25 PM, Duncan Murdoch wrote:
>
>> On 17/11/2015 2:14 PM, Karl Schilling wrote:
>> > Dear all,
>> >
>> > I have one observation that I do not quite understand. Maybe someone
>> > can clarify this issue for me.
>> >
>> > I have a data frame which I want to subset based on a grouping variable,
>> > say "group". Actually, "group" is a numeric value, but it is saved as a
>> > character. I give some code to generate an exemplary data frame below.
>> >
>> > Now, if I use
>> >
>> > MySubset <- subset(Data, Data$group == "..")
>> >
>> > everything works fine, as expected. ".." stands here for the value of
>> > group given as a character string.
>> >
>> > Surprisingly, I also get a correct subsetting if I simply give the plain
>> > numeric value of group (like MySubset <- subset(Data, Data$group == ..)),
>> > AS LONG AS this numeric value is less than 100000.
>> >
>> > If the numeric value is 100000 or larger, I get an empty subset.
>> >
>> > OK, I know how to avoid this situation, but I wonder what the
>> > explanation for this for me rather strange behavior might be.
>> >
>> > Thank you so much for your suggestions.
>>
>> If you are comparing a character value to a numeric value, the numeric
>> value is converted to character using as.character() for the
>> comparison. as.character(100000) or a larger number is likely not
>> "100000"; try it. (With the options I have on my
>> computer, I get "1e+05".)
>>
>> If you want a numeric comparison, be explicit:
>>
>> subset(Data, as.numeric(Data$group) == ..)
>>
>
> This might be bad advice. If Data$group is a factor (as it tends to be
> when character data is put in a dataframe), this will use the underlying
> factor code, not the visible one. You need to use
>
> as.numeric(as.character(Data$group))
>
> to do the conversion you probably want.
>
> Duncan Murdoch
>
>
>>
>> Duncan Murdoch
>>
>> >
>> >
>> > Karl Schilling
>> >
>> >
>> > #####
>> > Exemplary code for reproducing the above described problem:
>> >
>> > options(stringsAsFactors = F)
>> >
>> > # set up some data frame
>> > value <- c(1:6)
>> > group <- rep(c("20000", "99999", "100000"), each = 2)
>> > Data <- data.frame(value = value, group = group)
>> > str(Data)
>> >
>> > # subset data frame based on the value of the variable "group",
>> > # treating this value once as a character, and once as a number:
>> >
>> > Data20 <- subset(Data, Data$group =="20000")
>> > str(Data20)
>> > Data20N <- subset(Data, Data$group ==20000)
>> > str(Data20N)
>> >
>> >
>> > Data99 <- subset(Data, Data$group =="99999")
>> > str(Data99)
>> > Data99N <- subset(Data, Data$group ==99999)
>> > str(Data99N)
>> > Data100 <- subset(Data, Data$group =="100000")
>> > str(Data100)
>> > Data100N <- subset(Data, Data$group ==100000)
>> > str(Data100N)
>> >
>>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.