This will work as well:

d<-data.frame(d1 = letters[1:3],
              d2 = c(1,2,3),
              d3 = c(NA_character_,NA_character_,6))
apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
   d1    d2    d3
FALSE  TRUE FALSE

i.e. when NA is changed to NA_character_

On Fri, 8 Oct 2021 at 20:44, Derickson, Ryan, VHA NCOD via R-help <r-help@r-project.org> wrote:
>
> This is interesting and does seem suboptimal. Especially because if I start
> with a matrix from the beginning, it behaves as expected.
>
> > d<-data.frame(d1 = letters[1:3],
> + d2 = c("1","2","3"),
> + d3 = c(NA,NA,"6"))
> >
> > str(d)
> 'data.frame': 3 obs. of 3 variables:
>  $ d1: chr "a" "b" "c"
>  $ d2: chr "1" "2" "3"
>  $ d3: chr NA NA "6"
> >
> > apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
>    d1    d2    d3
> FALSE  TRUE FALSE
> >
>
>
> -----Original Message-----
> From: Jiefei Wang <szwj...@gmail.com>
> Sent: Friday, October 8, 2021 2:22 PM
> To: Derickson, Ryan, VHA NCOD <ryan.derick...@va.gov>
> Cc: r-help@r-project.org
> Subject: [EXTERNAL] Re: [R] unexpected behavior in apply
>
> Ok, it turns out that this is documented, even though it looks surprising.
>
> First of all, the apply function will try to convert any object with the dim
> attribute to a matrix (my intuition agrees with you that there should be no
> conversion), so the first step of the apply function is
>
> > as.matrix.data.frame(d)
>      d1  d2  d3
> [1,] "a" "1" NA
> [2,] "b" "2" NA
> [3,] "c" "3" " 6"
>
> Since the data frame `d` is a mixture of character and non-character values,
> the non-character value will be converted to the character using the function
> `format`. However, the problem is that the NA value will also be formatted to
> the character
>
> > format(c(NA, 6))
> [1] "NA" " 6"
>
> That's where the space comes from. It is purely for making the result
> pretty... The character NA will be removed later, but the space is not
> stripped. I would say this is not a good design, and it might be worth not
> including the NA value in the format function. At the current stage, I will
> suggest using the function `lapply` to do what you want.
>
> > lapply(d, FUN=function(x)all(x[!is.na(x)] <= 3))
> $d1
> [1] FALSE
> $d2
> [1] TRUE
> $d3
> [1] FALSE
>
> Everything should work as you expect.
>
> Best,
> Jiefei
>
> On Sat, Oct 9, 2021 at 2:03 AM Jiefei Wang <szwj...@gmail.com> wrote:
> >
> > Hi,
> >
> > I guess this can tell you what happens behind the scene
> >
> >
> > > d<-data.frame(d1 = letters[1:3],
> > + d2 = c(1,2,3),
> > + d3 = c(NA,NA,6))
> > > apply(d, 2, FUN=function(x)x)
> >      d1  d2  d3
> > [1,] "a" "1" NA
> > [2,] "b" "2" NA
> > [3,] "c" "3" " 6"
> > > "a"<=3
> > [1] FALSE
> > > "2"<=3
> > [1] TRUE
> > > "6"<=3
> > [1] FALSE
> >
> > Note that there is an additional space in the character value " 6",
> > that's why your comparison fails. I do not understand why but this
> > might be a bug in R
> >
> > Best,
> > Jiefei
> >
> > On Sat, Oct 9, 2021 at 1:49 AM Derickson, Ryan, VHA NCOD via R-help
> > <r-help@r-project.org> wrote:
> > >
> > > Hello,
> > >
> > > I'm seeing unexpected behavior when using apply() compared to a for loop
> > > when a character vector is part of the data subjected to the apply
> > > statement. Below, I check whether all non-missing values are <= 3. If I
> > > include a character column, apply incorrectly returns TRUE for d3. If I
> > > only pass the numeric columns to apply, it is correct for d3. If I use a
> > > for loop, it is correct.
> > >
> > > > d<-data.frame(d1 = letters[1:3],
> > > + d2 = c(1,2,3),
> > > + d3 = c(NA,NA,6))
> > > >
> > > > d
> > >   d1 d2 d3
> > > 1  a  1 NA
> > > 2  b  2 NA
> > > 3  c  3  6
> > > >
> > > > # results are incorrect
> > > > apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> > >    d1    d2    d3
> > > FALSE  TRUE  TRUE
> > > >
> > > > # results are correct
> > > > apply(d[,2:3], 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> > >    d2    d3
> > >  TRUE FALSE
> > > >
> > > > # results are correct
> > > > for(i in names(d)){
> > > + print(all(d[!is.na(d[,i]),i] <= 3)) }
> > > [1] FALSE
> > > [1] TRUE
> > > [1] FALSE
> > > >
> > >
> > > Finally, if I remove the NA values from d3 and include the character
> > > column in apply, it is correct.
> > >
> > > > d<-data.frame(d1 = letters[1:3],
> > > + d2 = c(1,2,3),
> > > + d3 = c(4,5,6))
> > > >
> > > > d
> > >   d1 d2 d3
> > > 1  a  1  4
> > > 2  b  2  5
> > > 3  c  3  6
> > > >
> > > > # results are correct
> > > > apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> > >    d1    d2    d3
> > > FALSE  TRUE FALSE
> > > >
> > >
> > > Can someone help me understand what's happening?
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
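
The explanation given in the thread can be checked step by step. What follows is only a minimal sketch in R, not something posted by the participants. The helper name is_small and its is.numeric() guard are assumptions added for illustration, and vapply() is swapped in for the suggested lapply() because it returns a plain logical vector. The result of the string comparison depends on the collation order of the locale, so no outcome is stated for it.

```r
# Data frame from the original post: one character column, two numeric.
d <- data.frame(d1 = letters[1:3],
                d2 = c(1, 2, 3),
                d3 = c(NA, NA, 6))

# apply() coerces a data frame with as.matrix(). Because d1 is not
# numeric, the result is a character matrix, and numeric columns are
# passed through format(), which pads entries to a common width.
m <- as.matrix(d)
padded <- m[3, "d3"]              # the 6 now carries a leading space
formatted <- format(c(NA, 6))     # the padding comes from "NA" being 2 wide

# Comparing a string with a number coerces the number to a string,
# so this is a string comparison; its result is locale dependent.
cmp_padded <- padded <= 3

# Column-wise alternative that never builds a matrix.
# is_small is a hypothetical helper; the is.numeric() guard is an
# assumption that non-numeric columns should simply yield FALSE.
is_small <- function(x) is.numeric(x) && all(x[!is.na(x)] <= 3)
res <- vapply(d, is_small, logical(1))
print(res)
```

With the data frame from the original post, res is FALSE for d1, TRUE for d2 and FALSE for d3, which matches the for-loop results reported there.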