Hi Bert, many thanks for all your help and your comments. I learn a lot this way.

At first sight my question was about is.na(), but the actual task looks like this: I have two variables in my customer data that signal whether the customer account was closed by master data management or by sales. Say these variables are closed_mdm and closed_sls. They contain NA if the customer account is still open, or a closing code from "01" to "08" giving the reason if the account was closed.

For my analysis I need a variable that combines closed_mdm and closed_sls, so that I can easily filter on the accounts that are closed, no matter what the reason was or who closed the account. As I always run into problems when dealing with ifelse statements and NA, I decided to merge these two variables into one variable containing 0 = not closed and 1 = closed. In my context this seems, at least to me, a reasonable approach. Replacing the missing values and then merging the variables is the easiest way for me.

-- cut --

cust_id <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
closed_mdm <- c("01", NA, NA, NA, "08", "07", NA, NA, "05", NA, NA, NA, "04", NA, NA, NA, NA, NA, NA, NA)
closed_sls <- c(NA, "08", NA, NA, "08", "07", NA, NA, NA, NA, "03", NA, NA, NA, "05", NA, NA, NA, NA, NA)

# 1st try
ds_temp1 <- data.frame(cust_id, closed_mdm, closed_sls)
ds_temp1
ds_temp1$closed <- closed_mdm | closed_sls  # WRONG

# 2nd try
closed_mdm_fac1 <- as.factor(closed_mdm)
closed_sls_fac1 <- as.factor(closed_sls)
ds_temp2 <- data.frame(cust_id, closed_mdm_fac1, closed_sls_fac1)
ds_temp2
ds_temp2$closed <- ds_temp2$closed_mdm_fac1 | ds_temp2$closed_sls_fac1  # WRONG

# 3rd try
closed_mdm_num1 <- as.numeric(closed_mdm)  # OK
closed_sls_num1 <- as.numeric(closed_sls)  # OK
ds_temp3 <- data.frame(cust_id, closed_mdm_num1, closed_sls_num1)
ds_temp3
ds_temp3$closed <- ds_temp3$closed_mdm_num1 | ds_temp3$closed_sls_num1  # WRONG

# 4th try
ds_temp4 <- ds_temp3
ds_temp4
# Does not run due to not allowed NA in subscripts
ds_temp4[is.na(ds_temp4$closed_mdm_num1), ds_temp4$closed_mdm_num1] <- 0
ds_temp4[is.na(ds_temp4$closed_sls_num1), ds_temp4$closed_sls_num1] <- 0

# 5th try
# NA (account still open) becomes 0, any closing code becomes 1
ds_temp4$closed_mdm_num1 <- ifelse(is.na(ds_temp4$closed_mdm_num1), 0, 1)
ds_temp4$closed_sls_num1 <- ifelse(is.na(ds_temp4$closed_sls_num1), 0, 1)
ds_temp4
ds_temp4$closed <- ifelse(ds_temp4$closed_mdm_num1 == 1 | ds_temp4$closed_sls_num1 == 1, 1, 0)
ds_temp4

-- cut --

Is there a better way to do it?

Kind regards

Georg

> Sent: Thursday, 23 June 2016 at 23:55
> From: "Bert Gunter" <bgunter.4...@gmail.com>
> To: "David L Carlson" <dcarl...@tamu.edu>
> Cc: "R Help" <r-help@r-project.org>
> Subject: Re: [R] Subscripting problem with is.na()
>
> ... actually, FWIW, I would say that this little discussion mostly
> demonstrates why the OP's request is probably not a good idea in the
> first place. Usually, NA's should be left as NA's to be dealt with
> properly by R and packages. In biological measurements, for example,
> NA's often mean "below the ability to reliably measure." Biologists
> with whom I've worked over many years often want to convert these to 0
> or omit the cases, both of which lead to biased estimates and/or
> underestimates of variability and excess claims of "statistical
> significance" (for those who belong to this religious persuasion). One
> should never say never, but I suspect that there are relatively few
> circumstances where the conversion the OP requested is actually wise.
>
> Feel free to ignore/reject such extraneous comments of course.
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Thu, Jun 23, 2016 at 12:14 PM, David L Carlson <dcarl...@tamu.edu> wrote:
> > Good point. I did not think about factors. Also your example raises another
> > issue since column c is logical, but gets silently converted to numeric.
> > This would seem to get the job done assuming the conversion is intended for
> > numeric columns only:
> >
> >> test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3))
> >> sapply(test, class)
> >         a         b         c
> > "numeric"  "factor" "logical"
> >> num <- sapply(test, is.numeric)
> >> test[, num][is.na(test[, num])] <- 0
> >> test
> >   a    b  c
> > 1 1    A NA
> > 2 0    b NA
> > 3 2 <NA> NA
> >
> > David C
> >
> > -----Original Message-----
> > From: Bert Gunter [mailto:bgunter.4...@gmail.com]
> > Sent: Thursday, June 23, 2016 1:48 PM
> > To: David L Carlson
> > Cc: Ivan Calandra; R Help
> > Subject: Re: [R] Subscripting problem with is.na()
> >
> > Not in general, David:
> >
> > e.g.
> >
> >> test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3))
> >
> >> is.na(test)
> >          a     b    c
> > [1,] FALSE FALSE TRUE
> > [2,]  TRUE FALSE TRUE
> > [3,] FALSE  TRUE TRUE
> >
> >> test[is.na(test)]
> > [1] NA NA NA NA NA
> >
> >> test[is.na(test)] <- 0
> > Warning message:
> > In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
> >   invalid factor level, NA generated
> >
> >> test
> >   a    b c
> > 1 1    A 0
> > 2 0    b 0
> > 3 2 <NA> 0
> >
> >
> > The problem is the default conversion to factors and the replacement
> > operation for factors. So:
> >
> >> test <- data.frame(a=c(1,NA,2), b = I(c("A","b",NA_character_)), c= rep(NA,3))
> >> class(test$b)
> > [1] "AsIs" ## so NOT a factor
> >
> >> test[is.na(test)] <- 0 # now works as you describe
> >> test
> >   a b c
> > 1 1 A 0
> > 2 0 b 0
> > 3 2 0 0
> >
> > Of course the OP (and you) probably had a data frame of all numerics
> > in mind, so the problem doesn't arise. But I think one needs to make
> > the distinction and issue clear.
> >
> > Cheers,
> > Bert
> >
> >
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming along
> > and sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >
> >
> > On Thu, Jun 23, 2016 at 8:46 AM, David L Carlson <dcarl...@tamu.edu> wrote:
> >> The function is.na() returns a matrix when applied to a data.frame so you
> >> can easily convert all the NAs to 0's:
> >>
> >>> ds_test
> >>    var1 var2
> >> 1     1    1
> >> 2     2    2
> >> 3     3    3
> >> 4    NA   NA
> >> 5     5    5
> >> 6     6    6
> >> 7     7    7
> >> 8    NA   NA
> >> 9     9    9
> >> 10   10   10
> >>> is.na(ds_test)
> >>        var1  var2
> >>  [1,] FALSE FALSE
> >>  [2,] FALSE FALSE
> >>  [3,] FALSE FALSE
> >>  [4,]  TRUE  TRUE
> >>  [5,] FALSE FALSE
> >>  [6,] FALSE FALSE
> >>  [7,] FALSE FALSE
> >>  [8,]  TRUE  TRUE
> >>  [9,] FALSE FALSE
> >> [10,] FALSE FALSE
> >>> ds_test[is.na(ds_test)] <- 0
> >>> ds_test
> >>    var1 var2
> >> 1     1    1
> >> 2     2    2
> >> 3     3    3
> >> 4     0    0
> >> 5     5    5
> >> 6     6    6
> >> 7     7    7
> >> 8     0    0
> >> 9     9    9
> >> 10   10   10
> >>
> >> -------------------------------------
> >> David L Carlson
> >> Department of Anthropology
> >> Texas A&M University
> >> College Station, TX 77840-4352
> >>
> >> -----Original Message-----
> >> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ivan Calandra
> >> Sent: Thursday, June 23, 2016 10:14 AM
> >> To: R Help
> >> Subject: Re: [R] Subscripting problem with is.na()
> >>
> >> Thank you Bert for this clarification. It is indeed an important point.
> >>
> >> Ivan
> >>
> >> --
> >> Ivan Calandra, PhD
> >> Scientific Mediator
> >> University of Reims Champagne-Ardenne
> >> GEGENAA - EA 3795
> >> CREA - 2 esplanade Roland Garros
> >> 51100 Reims, France
> >> +33(0)3 26 77 36 89
> >> ivan.calan...@univ-reims.fr
> >> --
> >> https://www.researchgate.net/profile/Ivan_Calandra
> >> https://publons.com/author/705639/
> >>
> >> On 23/06/2016 at 17:06, Bert Gunter wrote:
> >>> Sorry, Ivan, your statement is incorrect:
> >>>
> >>> "When you use a single bracket on a list with only one argument in
> >>> between, then R extracts "elements", i.e. columns in the case of a
> >>> data.frame. This explains your errors. "
> >>>
> >>> e.g.
> >>>
> >>>> ex <- data.frame(a = 1:3, b = letters[1:3])
> >>>> a <- 1:3
> >>>> identical(ex[1], a)
> >>> [1] FALSE
> >>>
> >>>> class(ex[1])
> >>> [1] "data.frame"
> >>>> class(a)
> >>> [1] "integer"
> >>>
> >>> Compare:
> >>>
> >>>> identical(ex[[1]], a)
> >>> [1] TRUE
> >>>
> >>> Why? Single bracket extraction on a list results in a list; double
> >>> bracket extraction results in the element of the list ( a "column" in
> >>> the case of a data frame, which is a specific kind of list). The
> >>> relevant sections of ?Extract are:
> >>>
> >>> "Indexing by [ is similar to atomic vectors and selects a **list** of
> >>> the specified element(s).
> >>>
> >>> Both [[ and $ select a **single element of the list**. "
> >>>
> >>>
> >>> Hope this clarifies this often-confused issue.
> >>>
> >>>
> >>> Cheers,
> >>> Bert
> >>> Bert Gunter
> >>>
> >>> "The trouble with having an open mind is that people keep coming along
> >>> and sticking things into it."
> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>>
> >>>
> >>> On Thu, Jun 23, 2016 at 7:34 AM, Ivan Calandra
> >>> <ivan.calan...@univ-reims.fr> wrote:
> >>>> My statement "Using a single bracket '[' on a data.frame does the same as
> >>>> for matrices: you need to specify rows and columns" was not correct.
> >>>>
> >>>>
> >>>> When you use a single bracket on a list with only one argument in between,
> >>>> then R extracts "elements", i.e. columns in the case of a data.frame. This
> >>>> explains your errors.
> >>>>
> >>>> But it is possible to use a single bracket on a data.frame with 2 arguments
> >>>> (rows, columns) separated by a comma, as with matrices. This is the solution
> >>>> you received.
> >>>>
> >>>> Ivan
> >>>>
> >>>> On 23/06/2016 at 16:27, Ivan Calandra wrote:
> >>>>> Dear Georg,
> >>>>>
> >>>>> You need to learn a bit more about the subsetting methods, depending on
> >>>>> the object structure you're trying to subset.
> >>>>>
> >>>>> More specifically, when you run this: ds_test[is.na(ds_test$var1)]
> >>>>> you get this error: "Error in `[.data.frame`(ds_test, is.na(ds_test$var1))
> >>>>> : undefined columns selected"
> >>>>>
> >>>>> This means that R does not understand which column you're trying to
> >>>>> select. But you're actually trying to select rows.
> >>>>>
> >>>>> Using a single bracket '[' on a data.frame does the same as for matrices:
> >>>>> you need to specify rows and columns, like this:
> >>>>> ds_test[is.na(ds_test$var1), ] ## notice the last comma
> >>>>> ds_test[is.na(ds_test$var1), ] <- 0 ## works on all columns because you
> >>>>> didn't specify any after the comma
> >>>>>
> >>>>> If you want it only for "var1", then you need to specify the column:
> >>>>> ds_test[is.na(ds_test$var1), "var1"] <- 0
> >>>>>
> >>>>> It's the same problem with your 2nd and 4th tries (4th one has other
> >>>>> problems). Your 3rd try does not change ds_test at all.
> >>>>>
> >>>>> HTH,
> >>>>> Ivan
> >>>>>
> >>>>> On 23/06/2016 at 15:57, g.maub...@weinwolf.de wrote:
> >>>>>> Hi All,
> >>>>>>
> >>>>>> I would like to recode my NAs to 0. Using a single vector everything is
> >>>>>> fine.
> >>>>>>
> >>>>>> But if I use a data.frame things go wrong:
> >>>>>>
> >>>>>> -- cut --
> >>>>>>
> >>>>>> var1 <- c(1:3, NA, 5:7, NA, 9:10)
> >>>>>> var2 <- c(1:3, NA, 5:7, NA, 9:10)
> >>>>>> ds_test <- data.frame(var1, var2)
> >>>>>>
> >>>>>> test <- var1
> >>>>>> test[is.na(test)] <- 0
> >>>>>> test # NA recoded OK
> >>>>>>
> >>>>>> # First try
> >>>>>> ds_test[is.na(ds_test$var1)] <- 0 # duplicate subscripts WRONG
> >>>>>>
> >>>>>> # Second try
> >>>>>> ds_test[is.na("var1")] <- 0
> >>>>>> ds_test$var1 # not recoded WRONG
> >>>>>>
> >>>>>> # Third try: to me the most intuitive approach
> >>>>>> is.na(ds_test["var1"]) <- 0 # attempt to select less than one element in integerOneIndex WRONG
> >>>>>>
> >>>>>> # Fourth try
> >>>>>> ds_test[is.na(var1)] <- 0 # duplicate subscripts for columns WRONG
> >>>>>>
> >>>>>> -- cut --
> >>>>>> How can I do it correctly?
> >>>>>>
> >>>>>> Where could I have found something about it?
> >>>>>>
> >>>>>> Kind regards
> >>>>>>
> >>>>>> Georg
> >>>>>>
> >>>>>> ______________________________________________
> >>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>> PLEASE do read the posting guide
> >>>>>> http://www.R-project.org/posting-guide.html
> >>>>>> and provide commented, minimal, self-contained, reproducible code.
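
One possible shortcut for the combined flag described at the top of this message, offered as a sketch rather than a definitive answer: since is.na() always returns TRUE or FALSE and never NA, the two columns can be combined without recoding the NAs first. The names ds and closed_accounts are chosen here only for illustration, and stringsAsFactors = FALSE is set explicitly so that the closing codes stay character on any R version.

```r
# Data copied from the example at the top of this message
closed_mdm <- c("01", NA, NA, NA, "08", "07", NA, NA, "05", NA,
                NA, NA, "04", NA, NA, NA, NA, NA, NA, NA)
closed_sls <- c(NA, "08", NA, NA, "08", "07", NA, NA, NA, NA,
                "03", NA, NA, NA, "05", NA, NA, NA, NA, NA)
ds <- data.frame(cust_id = 1:20, closed_mdm, closed_sls,
                 stringsAsFactors = FALSE)

# !is.na() is TRUE wherever a closing code is present; OR-ing the two
# logical vectors can therefore never yield NA.
ds$closed <- as.integer(!is.na(ds$closed_mdm) | !is.na(ds$closed_sls))

# The original closing codes stay untouched; filter on the new flag.
closed_accounts <- ds[ds$closed == 1, ]
closed_accounts
```

This leaves the NAs in closed_mdm and closed_sls as they are, which fits Bert's remark that NAs are usually better left in place, and only the derived variable closed is coded 0/1.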