R silently converts the number to a character string for the comparison in the 
subset operation.  If we do the conversion explicitly, we see why the match 
fails with the default R settings: 100000 is rendered in scientific notation.

> as.character(100000)
[1] "1e+05"
> as.character(99999)
[1] "99999"
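
A minimal sketch of the effect and of three possible ways around it, using a 
data frame that mirrors the one in the original post. The variable names 
n_implicit, n_string, n_numeric and n_format are only illustrative, and the 
workarounds are suggestions, not the only options.

```r
# Illustrative data mirroring the example in the original post
Data <- data.frame(value = 1:6,
                   group = rep(c("20000", "99999", "100000"), each = 2),
                   stringsAsFactors = FALSE)

# Character vs. number: the number is coerced to character first, so with
# default settings this compares "100000" against "1e+05" and matches nothing
n_implicit <- nrow(subset(Data, group == 100000))

# Option 1: compare against a character string
n_string <- nrow(subset(Data, group == "100000"))

# Option 2: convert the column to numeric before comparing
n_numeric <- nrow(subset(Data, as.numeric(group) == 100000))

# Option 3: format the number explicitly without scientific notation
n_format <- nrow(subset(Data, group == format(100000, scientific = FALSE)))

print(c(implicit = n_implicit, string = n_string,
        numeric = n_numeric, format = n_format))
```

Converting the column once with as.numeric() is probably the most robust 
choice if "group" is really numeric, since it does not depend on how R happens 
to print a particular value.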


--
W. Michael Conklin
EVP Marketing & Data Sciences
GfK 
T +1 763 417 4545 | M +1 612 567 8287 


-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Karl Schilling
Sent: Tuesday, November 17, 2015 1:14 PM
To: r-help@r-project.org
Subject: [R] Strange result when subsetting a data frame based on a character 
variable

Dear all,

I have one observation that I do not quite understand. Maybe someone can 
clarify this issue for me.

I have a data frame which I want to subset based on a grouping variable, say 
"group". Actually, "group" is a numeric value, but it is saved as a character. 
I give some code to generate an exemplary data frame below.

Now, if I use

MySubset <- subset(Data, Data$group == "..")

everything works fine, as expected. ".." stands here for the value of group 
given as a character string.

Surprisingly, I also get a correct subset if I simply give the plain 
numeric value of group (like MySubset <- subset(Data, Data$group == ..)), AS 
LONG AS this numeric value is less than 100000.

If the numeric value is 100000 or larger, I get an empty subset.

OK, I know how to avoid this situation, but I wonder what the explanation for 
this for me rather strange behavior might be.

Thank you so much for your suggestions.


Karl Schilling


#####
Exemplary code for reproducing the above described problem:

options(stringsAsFactors = F)

# set up some data frame
value <- c(1:6)
group <- rep(c("20000", "99999", "100000"), each = 2)
Data <- data.frame(value = value, group = group)
str(Data)

# subset data frame based on the value of the variable "group",
# treating this value once as a character, and once as a number:

Data20 <- subset(Data, Data$group =="20000")
str(Data20)
Data20N <- subset(Data, Data$group ==20000)
str(Data20N)


Data99 <- subset(Data, Data$group =="99999")
str(Data99)
Data99N <- subset(Data, Data$group ==99999)
str(Data99N)
Data100 <- subset(Data, Data$group =="100000")
str(Data100)
Data100N <- subset(Data, Data$group ==100000)
str(Data100N)

--
Karl Schilling

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
