Is part of the issue that in common parlance "NA" or "N/A" may mean either "not available" or "not applicable" (e.g., isPregnant for a male) but in R NA means only "not available"?
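A minimal sketch of that distinction, assuming made-up `sex` and `is_pregnant` vectors (they are illustrative only, not from Georg's data): R treats NA as "value unknown", so logical operators propagate it unless the result is already determined, and "not applicable" has to be coded explicitly if it should be told apart from "not recorded".

```r
# NA in R means "value unknown", so comparisons with it are unknown too.
NA == 1          # NA
NA | TRUE        # TRUE: the result is TRUE whatever the unknown value is
NA & FALSE       # FALSE, for the same reason
NA | FALSE       # NA: the result depends on the unknown value

# Hypothetical data: "not applicable" must be coded explicitly,
# because NA alone cannot distinguish it from "not recorded".
sex         <- c("f", "f", "m", "f")
is_pregnant <- c("yes", NA, NA, "no")

status <- ifelse(sex == "m", "not applicable", is_pregnant)
status <- factor(status, levels = c("no", "yes", "not applicable"))

# The remaining NA (second case) is a genuinely missing value.
table(status, useNA = "ifany")
```

With an explicit level, tabulation and modelling functions can keep the "not applicable" cases while still dropping the truly missing ones.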
Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Jun 24, 2016 at 8:37 AM, Bert Gunter <bgunter.4...@gmail.com> wrote:

> As Petr and Don have shown you, changing NA to 0 is unnecessary to get
> what you want. However, recoding to 0 may be OK, as NA has a specific
> meaning in this context, and you are just adding an extra code to a
> factor for a different level.
>
> But it still might cause you trouble later. One of R's strengths is
> its ability to simply deal with NA's -- most of the time, anyway. For
> example, note that you would have to make sure these columns are
> factors (*not numerics*) if you wanted to, say, investigate how
> category of closing related to other covariates via e.g. multinomial
> logistic regression, or even just to tabulate the "closed" categories.
> Keeping NA as NA allows R's built-in facilities to simply handle (e.g.
> omit) the data for the "still open" cases, but you will have to do it
> explicitly yourself if you code to 0. That seems to be asking for
> trouble to me.
>
> As always, contrary views welcome. This discussion still seems on
> (r-help) topic to me, but if not, please say so.
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Fri, Jun 24, 2016 at 12:14 AM, <g.maub...@gmx.de> wrote:
> > Hi Bert,
> >
> > many thanks for all your help and your comments. I learn a lot this way.
> >
> > My question was about is.na() at first sight, but the actual task looks like this:
> >
> > I have two variables in my customer data that signal whether the customer account was closed by master data management or by sales. Say these variables are closed_mdm and closed_sls. They contain NA if the customer account is still open, or a closing code from "01" to "08" that records why the account was closed.
> >
> > For my analysis I need a variable that combines closed_mdm and closed_sls, so that I can easily filter on the accounts that are closed, no matter what the reason was or who closed the account.
> >
> > As I always encounter problems when dealing with ifelse statements and NA, I decided to merge these two variables into one variable containing 0 = not closed and 1 = closed. In my context this seems to be, at least to me, a reasonable approach.
> >
> > Replacing the missing values and merging the variables is the easiest way for me.
> >
> > -- cut --
> >
> > cust_id <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
> > closed_mdm <- c("01", NA, NA, NA, "08", "07", NA, NA, "05", NA, NA, NA, "04", NA, NA, NA, NA, NA, NA, NA)
> > closed_sls <- c(NA, "08", NA, NA, "08", "07", NA, NA, NA, NA, "03", NA, NA, NA, "05", NA, NA, NA, NA, NA)
> >
> > # 1st try
> > ds_temp1 <- data.frame(cust_id, closed_mdm, closed_sls)
> > ds_temp1
> >
> > ds_temp1$closed <- closed_mdm | closed_sls # WRONG
> >
> > # 2nd try
> > closed_mdm_fac1 <- as.factor(closed_mdm)
> > closed_sls_fac1 <- as.factor(closed_sls)
> >
> > ds_temp2 <- data.frame(cust_id, closed_mdm_fac1, closed_sls_fac1)
> > ds_temp2
> >
> > ds_temp2$closed <- ds_temp2$closed_mdm_fac1 | ds_temp2$closed_sls_fac1 # WRONG
> >
> > # 3rd try
> > closed_mdm_num1 <- as.numeric(closed_mdm) # OK
> > closed_sls_num1 <- as.numeric(closed_sls) # OK
> >
> > ds_temp3 <- data.frame(cust_id, closed_mdm_num1, closed_sls_num1)
> > ds_temp3
> >
> > ds_temp3$closed <- ds_temp3$closed_mdm_num1 | ds_temp3$closed_sls_num1 # WRONG
> >
> > # 4th try
> > ds_temp4 <- ds_temp3
> > ds_temp4
> >
> > # Does not run due to not allowed NA in subscripts
> > ds_temp4[is.na(ds_temp4$closed_mdm_num1), ds_temp4$closed_mdm_num1] <- 0
> > ds_temp4[is.na(ds_temp4$closed_sls_num1), ds_temp4$closed_sls_num1] <- 0
> >
> > # 5th try
> > ds_temp4$closed_mdm_num1 <- ifelse(is.na(ds_temp4$closed_mdm_num1), 1, 0)
> > ds_temp4$closed_sls_num1 <- ifelse(is.na(ds_temp4$closed_sls_num1), 1, 0)
> > ds_temp4
> >
> > ds_temp4$closed <- ifelse(ds_temp4$closed_mdm_num1 == 1 | ds_temp4$closed_sls_num1 == 1, 1, 0)
> > ds_temp4
> >
> > -- cut --
> >
> > Is there a better way to do it?
> >
> > Kind regards
> >
> > Georg
> >
> >
> >> Sent: Thursday, 23 June 2016 at 23:55
> >> From: "Bert Gunter" <bgunter.4...@gmail.com>
> >> To: "David L Carlson" <dcarl...@tamu.edu>
> >> Cc: "R Help" <r-help@r-project.org>
> >> Subject: Re: [R] Subscripting problem with is.na()
> >>
> >> ... actually, FWIW, I would say that this little discussion mostly
> >> demonstrates why the OP's request is probably not a good idea in the
> >> first place. Usually, NA's should be left as NA's to be dealt with
> >> properly by R and packages. In biological measurements, for example,
> >> NA's often mean "below the ability to reliably measure." Biologists
> >> with whom I've worked over many years often want to convert these to 0
> >> or omit the cases, both of which lead to biased estimates and/or
> >> underestimates of variability and excess claims of "statistical
> >> significance" (for those who belong to this religious persuasion). One
> >> should never say never, but I suspect that there are relatively few
> >> circumstances where the conversion the OP requested is actually wise.
> >>
> >> Feel free to ignore/reject such extraneous comments of course.
> >>
> >> Cheers,
> >> Bert
> >>
> >>
> >> Bert Gunter
> >>
> >> "The trouble with having an open mind is that people keep coming along
> >> and sticking things into it."
> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>
> >>
> >> On Thu, Jun 23, 2016 at 12:14 PM, David L Carlson <dcarl...@tamu.edu> wrote:
> >> > Good point. I did not think about factors. Also your example raises another issue since column c is logical, but gets silently converted to numeric.
> >> > This would seem to get the job done, assuming the conversion is intended for numeric columns only:
> >> >
> >> > > test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3))
> >> > > sapply(test, class)
> >> >         a         b         c
> >> > "numeric"  "factor" "logical"
> >> > > num <- sapply(test, is.numeric)
> >> > > test[, num][is.na(test[, num])] <- 0
> >> > > test
> >> >   a    b  c
> >> > 1 1    A NA
> >> > 2 0    b NA
> >> > 3 2 <NA> NA
> >> >
> >> > David C
> >> >
> >> > -----Original Message-----
> >> > From: Bert Gunter [mailto:bgunter.4...@gmail.com]
> >> > Sent: Thursday, June 23, 2016 1:48 PM
> >> > To: David L Carlson
> >> > Cc: Ivan Calandra; R Help
> >> > Subject: Re: [R] Subscripting problem with is.na()
> >> >
> >> > Not in general, David:
> >> >
> >> > e.g.
> >> >
> >> > > test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3))
> >> >
> >> > > is.na(test)
> >> >          a     b    c
> >> > [1,] FALSE FALSE TRUE
> >> > [2,]  TRUE FALSE TRUE
> >> > [3,] FALSE  TRUE TRUE
> >> >
> >> > > test[is.na(test)]
> >> > [1] NA NA NA NA NA
> >> >
> >> > > test[is.na(test)] <- 0
> >> > Warning message:
> >> > In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
> >> >   invalid factor level, NA generated
> >> >
> >> > > test
> >> >   a    b c
> >> > 1 1    A 0
> >> > 2 0    b 0
> >> > 3 2 <NA> 0
> >> >
> >> >
> >> > The problem is the default conversion to factors and the replacement
> >> > operation for factors. So:
> >> >
> >> > > test <- data.frame(a=c(1,NA,2), b = I(c("A","b",NA_character_)), c= rep(NA,3))
> >> > > class(test$b)
> >> > [1] "AsIs" ## so NOT a factor
> >> >
> >> > > test[is.na(test)] <- 0 # now works as you describe
> >> > > test
> >> >   a b c
> >> > 1 1 A 0
> >> > 2 0 b 0
> >> > 3 2 0 0
> >> >
> >> > Of course the OP (and you) probably had a data frame of all numerics
> >> > in mind, so the problem doesn't arise. But I think one needs to make
> >> > the distinction and issue clear.
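A sketch of a type-aware variant of the replacement discussed above, offered as one possible approach rather than the one either poster used. Since R 4.0.0, `data.frame()` no longer converts strings to factors by default, so `stringsAsFactors = TRUE` is set explicitly here to reproduce the 2016 behaviour. Working column by column avoids both the factor warning and the silent conversion of the all-NA logical column.

```r
# Same example data as in the thread; factor conversion requested explicitly.
test <- data.frame(a = c(1, NA, 2),
                   b = c("A", "b", NA),
                   c = rep(NA, 3),
                   stringsAsFactors = TRUE)

# Identify numeric columns; vapply() guarantees one logical per column.
num <- vapply(test, is.numeric, logical(1))

# Replace NA column by column, touching only the numeric columns.
test[num] <- lapply(test[num], function(x) replace(x, is.na(x), 0))

sapply(test, class)  # classes of b (factor) and c (logical) are unchanged
test
```

Indexing with `test[num]` rather than `test[, num]` keeps the selection a data frame even when only one column is numeric.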
> >> >
> >> > Cheers,
> >> > Bert
> >> >
> >> >
> >> > Bert Gunter
> >> >
> >> > "The trouble with having an open mind is that people keep coming along
> >> > and sticking things into it."
> >> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >> >
> >> >
> >> > On Thu, Jun 23, 2016 at 8:46 AM, David L Carlson <dcarl...@tamu.edu> wrote:
> >> >> The function is.na() returns a matrix when applied to a data.frame, so you can easily convert all the NAs to 0's:
> >> >>
> >> >> > ds_test
> >> >>    var1 var2
> >> >> 1     1    1
> >> >> 2     2    2
> >> >> 3     3    3
> >> >> 4    NA   NA
> >> >> 5     5    5
> >> >> 6     6    6
> >> >> 7     7    7
> >> >> 8    NA   NA
> >> >> 9     9    9
> >> >> 10   10   10
> >> >> > is.na(ds_test)
> >> >>        var1  var2
> >> >>  [1,] FALSE FALSE
> >> >>  [2,] FALSE FALSE
> >> >>  [3,] FALSE FALSE
> >> >>  [4,]  TRUE  TRUE
> >> >>  [5,] FALSE FALSE
> >> >>  [6,] FALSE FALSE
> >> >>  [7,] FALSE FALSE
> >> >>  [8,]  TRUE  TRUE
> >> >>  [9,] FALSE FALSE
> >> >> [10,] FALSE FALSE
> >> >> > ds_test[is.na(ds_test)] <- 0
> >> >> > ds_test
> >> >>    var1 var2
> >> >> 1     1    1
> >> >> 2     2    2
> >> >> 3     3    3
> >> >> 4     0    0
> >> >> 5     5    5
> >> >> 6     6    6
> >> >> 7     7    7
> >> >> 8     0    0
> >> >> 9     9    9
> >> >> 10   10   10
> >> >>
> >> >> -------------------------------------
> >> >> David L Carlson
> >> >> Department of Anthropology
> >> >> Texas A&M University
> >> >> College Station, TX 77840-4352
> >> >>
> >> >> -----Original Message-----
> >> >> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ivan Calandra
> >> >> Sent: Thursday, June 23, 2016 10:14 AM
> >> >> To: R Help
> >> >> Subject: Re: [R] Subscripting problem with is.na()
> >> >>
> >> >> Thank you Bert for this clarification. It is indeed an important point.
> >> >>
> >> >> Ivan
> >> >>
> >> >> --
> >> >> Ivan Calandra, PhD
> >> >> Scientific Mediator
> >> >> University of Reims Champagne-Ardenne
> >> >> GEGENAA - EA 3795
> >> >> CREA - 2 esplanade Roland Garros
> >> >> 51100 Reims, France
> >> >> +33(0)3 26 77 36 89
> >> >> ivan.calan...@univ-reims.fr
> >> >> --
> >> >> https://www.researchgate.net/profile/Ivan_Calandra
> >> >> https://publons.com/author/705639/
> >> >>
> >> >> On 23/06/2016 at 17:06, Bert Gunter wrote:
> >> >>> Sorry, Ivan, your statement is incorrect:
> >> >>>
> >> >>> "When you use a single bracket on a list with only one argument in
> >> >>> between, then R extracts "elements", i.e. columns in the case of a
> >> >>> data.frame. This explains your errors. "
> >> >>>
> >> >>> e.g.
> >> >>>
> >> >>> > ex <- data.frame(a = 1:3, b = letters[1:3])
> >> >>> > a <- 1:3
> >> >>> > identical(ex[1], a)
> >> >>> [1] FALSE
> >> >>>
> >> >>> > class(ex[1])
> >> >>> [1] "data.frame"
> >> >>> > class(a)
> >> >>> [1] "integer"
> >> >>>
> >> >>> Compare:
> >> >>>
> >> >>> > identical(ex[[1]], a)
> >> >>> [1] TRUE
> >> >>>
> >> >>> Why? Single bracket extraction on a list results in a list; double
> >> >>> bracket extraction results in the element of the list (a "column" in
> >> >>> the case of a data frame, which is a specific kind of list). The
> >> >>> relevant sections of ?Extract are:
> >> >>>
> >> >>> "Indexing by [ is similar to atomic vectors and selects a **list** of
> >> >>> the specified element(s).
> >> >>>
> >> >>> Both [[ and $ select a **single element of the list**. "
> >> >>>
> >> >>>
> >> >>> Hope this clarifies this often-confused issue.
> >> >>>
> >> >>>
> >> >>> Cheers,
> >> >>> Bert
> >> >>> Bert Gunter
> >> >>>
> >> >>> "The trouble with having an open mind is that people keep coming along
> >> >>> and sticking things into it."
> >> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >> >>>
> >> >>>
> >> >>> On Thu, Jun 23, 2016 at 7:34 AM, Ivan Calandra
> >> >>> <ivan.calan...@univ-reims.fr> wrote:
> >> >>>> My statement "Using a single bracket '[' on a data.frame does the same as
> >> >>>> for matrices: you need to specify rows and columns" was not correct.
> >> >>>>
> >> >>>>
> >> >>>> When you use a single bracket on a list with only one argument in between,
> >> >>>> then R extracts "elements", i.e. columns in the case of a data.frame. This
> >> >>>> explains your errors.
> >> >>>>
> >> >>>> But it is possible to use a single bracket on a data.frame with 2 arguments
> >> >>>> (rows, columns) separated by a comma, as with matrices. This is the solution
> >> >>>> you received.
> >> >>>>
> >> >>>> Ivan
> >> >>>>
> >> >>>>
> >> >>>> --
> >> >>>> Ivan Calandra, PhD
> >> >>>> Scientific Mediator
> >> >>>> University of Reims Champagne-Ardenne
> >> >>>> GEGENAA - EA 3795
> >> >>>> CREA - 2 esplanade Roland Garros
> >> >>>> 51100 Reims, France
> >> >>>> +33(0)3 26 77 36 89
> >> >>>> ivan.calan...@univ-reims.fr
> >> >>>> --
> >> >>>> https://www.researchgate.net/profile/Ivan_Calandra
> >> >>>> https://publons.com/author/705639/
> >> >>>>
> >> >>>> On 23/06/2016 at 16:27, Ivan Calandra wrote:
> >> >>>>> Dear Georg,
> >> >>>>>
> >> >>>>> You need to learn a bit more about the subsetting methods, depending on
> >> >>>>> the object structure you're trying to subset.
> >> >>>>>
> >> >>>>> More specifically, when you run this: ds_test[is.na(ds_test$var1)]
> >> >>>>> you get this error: "Error in `[.data.frame`(ds_test, is.na(ds_test$var1))
> >> >>>>> : undefined columns selected"
> >> >>>>>
> >> >>>>> This means that R does not understand which column you're trying to
> >> >>>>> select. But you're actually trying to select rows.
> >> >>>>>
> >> >>>>> Using a single bracket '[' on a data.frame does the same as for matrices:
> >> >>>>> you need to specify rows and columns, like this:
> >> >>>>> ds_test[is.na(ds_test$var1), ] ## notice the last comma
> >> >>>>> ds_test[is.na(ds_test$var1), ] <- 0 ## works on all columns because you didn't specify any after the comma
> >> >>>>>
> >> >>>>> If you want it only for "var1", then you need to specify the column:
> >> >>>>> ds_test[is.na(ds_test$var1), "var1"] <- 0
> >> >>>>>
> >> >>>>> It's the same problem with your 2nd and 4th tries (4th one has other
> >> >>>>> problems). Your 3rd try does not change ds_test at all.
> >> >>>>>
> >> >>>>> HTH,
> >> >>>>> Ivan
> >> >>>>>
> >> >>>>> --
> >> >>>>> Ivan Calandra, PhD
> >> >>>>> Scientific Mediator
> >> >>>>> University of Reims Champagne-Ardenne
> >> >>>>> GEGENAA - EA 3795
> >> >>>>> CREA - 2 esplanade Roland Garros
> >> >>>>> 51100 Reims, France
> >> >>>>> +33(0)3 26 77 36 89
> >> >>>>> ivan.calan...@univ-reims.fr
> >> >>>>> --
> >> >>>>> https://www.researchgate.net/profile/Ivan_Calandra
> >> >>>>> https://publons.com/author/705639/
> >> >>>>>
> >> >>>>> On 23/06/2016 at 15:57, g.maub...@weinwolf.de wrote:
> >> >>>>>> Hi All,
> >> >>>>>>
> >> >>>>>> I would like to recode my NAs to 0. Using a single vector everything is
> >> >>>>>> fine.
> >> >>>>>>
> >> >>>>>> But if I use a data.frame things go wrong:
> >> >>>>>>
> >> >>>>>> -- cut --
> >> >>>>>>
> >> >>>>>> var1 <- c(1:3, NA, 5:7, NA, 9:10)
> >> >>>>>> var2 <- c(1:3, NA, 5:7, NA, 9:10)
> >> >>>>>> ds_test <- data.frame(var1, var2)
> >> >>>>>>
> >> >>>>>> test <- var1
> >> >>>>>> test[is.na(test)] <- 0
> >> >>>>>> test # NA recoded OK
> >> >>>>>>
> >> >>>>>> # First try
> >> >>>>>> ds_test[is.na(ds_test$var1)] <- 0 # duplicate subscripts WRONG
> >> >>>>>>
> >> >>>>>> # Second try
> >> >>>>>> ds_test[is.na("var1")] <- 0
> >> >>>>>> ds_test$var1 # not recoded WRONG
> >> >>>>>>
> >> >>>>>> # Third try: to me the most intuitive approach
> >> >>>>>> is.na(ds_test["var1"]) <- 0 # attempt to select less than one element in integerOneIndex WRONG
> >> >>>>>>
> >> >>>>>> # Fourth try
> >> >>>>>> ds_test[is.na(var1)] <- 0 # duplicate subscripts for columns WRONG
> >> >>>>>>
> >> >>>>>> -- cut --
> >> >>>>>> How can I do it correctly?
> >> >>>>>>
> >> >>>>>> Where could I have found something about it?
> >> >>>>>>
> >> >>>>>> Kind regards
> >> >>>>>>
> >> >>>>>> Georg
> >> >>>>>>
> >> >>>>>> ______________________________________________
> >> >>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> >>>>>> and provide commented, minimal, self-contained, reproducible code.
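For reference, the forms suggested in the replies can be put side by side against the original question. This is only a sketch on the same `var1`, `var2` and `ds_test`; the copies `ds_all` and `ds_one` are names introduced here so that `ds_test` itself stays unchanged.

```r
var1 <- c(1:3, NA, 5:7, NA, 9:10)
var2 <- c(1:3, NA, 5:7, NA, 9:10)
ds_test <- data.frame(var1, var2)

# Whole data frame: is.na() returns a logical matrix usable as an index.
ds_all <- ds_test
ds_all[is.na(ds_all)] <- 0

# One column only: index the column vector itself ...
ds_one <- ds_test
ds_one$var1[is.na(ds_one$var1)] <- 0

# ... or pass both rows and column to the data frame's `[`.
ds_one[is.na(ds_one$var2), "var2"] <- 0

ds_all
ds_one
```

The single-index form `ds_test[i]` selects columns, which is why the first and fourth tries fail: a logical vector of length 10 is being matched against two columns.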
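For the closed_mdm / closed_sls question quoted above, a combined flag can be derived without recoding NA at all, which leaves the closing codes intact. The following is a minimal sketch on Georg's example data, assuming NA always means "still open"; the names `ds` and `closed_accounts` are introduced here for illustration. Note that `ifelse(is.na(x), 1, 0)`, as used in the fifth try, assigns 1 to the open accounts, which is the reverse of the stated coding.

```r
cust_id    <- 1:20
closed_mdm <- c("01", NA, NA, NA, "08", "07", NA, NA, "05", NA,
                NA, NA, "04", NA, NA, NA, NA, NA, NA, NA)
closed_sls <- c(NA, "08", NA, NA, "08", "07", NA, NA, NA, NA,
                "03", NA, NA, NA, "05", NA, NA, NA, NA, NA)

ds <- data.frame(cust_id, closed_mdm, closed_sls)

# is.na() never returns NA, so `|` on its results is always TRUE or FALSE.
ds$closed <- !is.na(ds$closed_mdm) | !is.na(ds$closed_sls)

# Filter on the flag; the original closing codes stay untouched.
closed_accounts <- ds[ds$closed, ]

table(ds$closed)
closed_accounts
```

A logical flag can be used directly as a row index; `as.integer(ds$closed)` would give the 0/1 coding if that is needed elsewhere.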