I would tend to agree. But NA is still preferable for both, no?

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Fri, Jun 24, 2016 at 8:42 AM, William Dunlap <wdun...@tibco.com> wrote:
> Is part of the issue that in common parlance "NA" or "N/A" may
> mean either "not available" or "not applicable" (e.g., isPregnant
> for a male) but in R NA means only "not available"?
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Fri, Jun 24, 2016 at 8:37 AM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>>
>> As Petr and Don have shown you, changing NA to 0 is unnecessary to get
>> what you want. However, recoding to 0 may be OK, as NA has a specific
>> meaning in this context, and you are just adding an extra code to a
>> factor for a different level.
>>
>> But it still might cause you trouble later. One of R's strengths is
>> its ability to simply deal with NAs -- most of the time, anyway. For
>> example, note that you would have to make sure these columns are
>> factors (*not numerics*) if you wanted to, say, investigate how
>> category of closing related to other covariates via e.g. multinomial
>> logistic regression or even just to tabulate the "closed" categories.
>> Keeping NA as NA allows R's built-in facilities to simply handle (e.g.
>> omit) the data for the "still open" cases, but you will have to do it
>> explicitly yourself if you code to 0. That seems to be asking for
>> trouble to me.
>>
>> As always, contrary views welcome. This discussion still seems on
>> (r-help) topic to me, but if not, please say so.
>>
>> Cheers,
>> Bert
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>>
>>
>> On Fri, Jun 24, 2016 at 12:14 AM, <g.maub...@gmx.de> wrote:
>> > Hi Bert,
>> >
>> > many thanks for all your help and your comments. I learn a lot this
>> > way.
>> >
>> > My question was about is.na() at first sight, but the actual task
>> > looks like this:
>> >
>> > I have two variables in my customer data that signal whether the customer
>> > account was closed by master data management or by sales. Say these
>> > variables are closed_mdm and closed_sls. They contain NA if the customer
>> > account is still open, or a closing code from "01" to "08" that says why
>> > the account was closed.
>> >
>> > For my analysis I need a variable that combines the two variables
>> > closed_mdm and closed_sls, so that I can easily filter on the accounts
>> > that are closed, no matter what the reason was or who closed the account.
>> >
>> > As I always encounter problems when dealing with ifelse statements and
>> > NA, I decided to merge these two variables into one variable containing
>> > 0 = not closed and 1 = closed. In my context this seems to be - at least
>> > to me - a reasonable approach.
>> >
>> > Replacement of missing values and merging the variables is the easiest
>> > way for me.
>> >
>> > -- cut --
>> >
>> > cust_id <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
>> >              18, 19, 20)
>> > closed_mdm <- c("01", NA, NA, NA, "08", "07", NA, NA, "05", NA, NA, NA,
>> >                 "04", NA, NA, NA, NA, NA, NA, NA)
>> > closed_sls <- c(NA, "08", NA, NA, "08", "07", NA, NA, NA, NA, "03", NA,
>> >                 NA, NA, "05", NA, NA, NA, NA, NA)
>> >
>> > # 1st try
>> > ds_temp1 <- data.frame(cust_id, closed_mdm, closed_sls)
>> > ds_temp1
>> >
>> > ds_temp1$closed <- closed_mdm | closed_sls  # WRONG
>> >
>> > # 2nd try
>> > closed_mdm_fac1 <- as.factor(closed_mdm)
>> > closed_sls_fac1 <- as.factor(closed_sls)
>> >
>> > ds_temp2 <- data.frame(cust_id, closed_mdm_fac1, closed_sls_fac1)
>> > ds_temp2
>> >
>> > ds_temp2$closed <- ds_temp$closed_mdm_fac1 | ds_temp$closed_sls_fac1  # WRONG
>> >
>> > # 3rd try
>> > closed_mdm_num1 <- as.numeric(closed_mdm)  # OK
>> > closed_sls_num1 <- as.numeric(closed_sls)  # OK
>> >
>> > ds_temp3 <- data.frame(cust_id, closed_mdm_num1, closed_sls_num1)
>> > ds_temp3
>> >
>> > ds_temp3$closed <- ds_temp$closed_mdm_num1 | ds_temp$closed_sls_num1  # WRONG
>> >
>> > # 4th try
>> > ds_temp4 <- ds_temp3
>> > ds_temp4
>> >
>> > # Does not run due to not allowed NA in subscripts
>> > ds_temp4[is.na(ds_temp4$closed_mdm_num1), ds_temp4$closed_mdm_num1] <- 0
>> > ds_temp4[is.na(ds_temp4$closed_sls_num1), ds_temp4$closed_sls_num1] <- 0
>> >
>> > # 5th try
>> > ds_temp4$closed_mdm_num1 <- ifelse(is.na(ds_temp4$closed_mdm_num1), 1, 0)
>> > ds_temp4$closed_sls_num1 <- ifelse(is.na(ds_temp4$closed_sls_num1), 1, 0)
>> > ds_temp4
>> >
>> > ds_temp4$closed <- ifelse(ds_temp4$closed_mdm_num1 == 1 |
>> >                           ds_temp4$closed_sls_num1 == 1, 1, 0)
>> > ds_temp4
>> >
>> > -- cut --
>> >
>> > Is there a better way to do it?
>> >
>> > Kind regards
>> >
>> > Georg
>> >
>> >
>> >> Sent: Thursday, 23 June 2016 at 23:55
>> >> From: "Bert Gunter" <bgunter.4...@gmail.com>
>> >> To: "David L Carlson" <dcarl...@tamu.edu>
>> >> Cc: "R Help" <r-help@r-project.org>
>> >> Subject: Re: [R] Subscripting problem with is.na()
>> >>
>> >> ... actually, FWIW, I would say that this little discussion mostly
>> >> demonstrates why the OP's request is probably not a good idea in the
>> >> first place. Usually, NAs should be left as NAs to be dealt with
>> >> properly by R and packages. In biological measurements, for example,
>> >> NAs often mean "below the ability to reliably measure." Biologists
>> >> with whom I've worked over many years often want to convert these to 0
>> >> or omit the cases, both of which lead to biased estimates and/or
>> >> underestimates of variability and excess claims of "statistical
>> >> significance" (for those who belong to this religious persuasion). One
>> >> should never say never, but I suspect that there are relatively few
>> >> circumstances where the conversion the OP requested is actually wise.
>> >>
>> >> Feel free to ignore/reject such extraneous comments of course.
>> >>
>> >> Cheers,
>> >> Bert
>> >>
>> >>
>> >> Bert Gunter
>> >>
>> >> "The trouble with having an open mind is that people keep coming along
>> >> and sticking things into it."
>> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>> >>
>> >>
>> >> On Thu, Jun 23, 2016 at 12:14 PM, David L Carlson <dcarl...@tamu.edu>
>> >> wrote:
>> >> > Good point. I did not think about factors. Also your example raises
>> >> > another issue since column c is logical, but gets silently converted to
>> >> > numeric.
>> >> > This would seem to get the job done, assuming the conversion is
>> >> > intended for numeric columns only:
>> >> >
>> >> >> test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3))
>> >> >> sapply(test, class)
>> >> >         a         b         c
>> >> > "numeric"  "factor" "logical"
>> >> >> num <- sapply(test, is.numeric)
>> >> >> test[, num][is.na(test[, num])] <- 0
>> >> >> test
>> >> >   a    b  c
>> >> > 1 1    A NA
>> >> > 2 0    b NA
>> >> > 3 2 <NA> NA
>> >> >
>> >> > David C
>> >> >
>> >> > -----Original Message-----
>> >> > From: Bert Gunter [mailto:bgunter.4...@gmail.com]
>> >> > Sent: Thursday, June 23, 2016 1:48 PM
>> >> > To: David L Carlson
>> >> > Cc: Ivan Calandra; R Help
>> >> > Subject: Re: [R] Subscripting problem with is.na()
>> >> >
>> >> > Not in general, David:
>> >> >
>> >> > e.g.
>> >> >
>> >> >> test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3))
>> >> >
>> >> >> is.na(test)
>> >> >          a     b    c
>> >> > [1,] FALSE FALSE TRUE
>> >> > [2,]  TRUE FALSE TRUE
>> >> > [3,] FALSE  TRUE TRUE
>> >> >
>> >> >> test[is.na(test)]
>> >> > [1] NA NA NA NA NA
>> >> >
>> >> >> test[is.na(test)] <- 0
>> >> > Warning message:
>> >> > In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
>> >> >   invalid factor level, NA generated
>> >> >
>> >> >> test
>> >> >   a    b c
>> >> > 1 1    A 0
>> >> > 2 0    b 0
>> >> > 3 2 <NA> 0
>> >> >
>> >> >
>> >> > The problem is the default conversion to factors and the replacement
>> >> > operation for factors. So:
>> >> >
>> >> >> test <- data.frame(a=c(1,NA,2), b = I(c("A","b",NA_character_)), c= rep(NA,3))
>> >> >> class(test$b)
>> >> > [1] "AsIs" ## so NOT a factor
>> >> >
>> >> >> test[is.na(test)] <- 0 # now works as you describe
>> >> >> test
>> >> >   a b c
>> >> > 1 1 A 0
>> >> > 2 0 b 0
>> >> > 3 2 0 0
>> >> >
>> >> > Of course the OP (and you) probably had a data frame of all numerics
>> >> > in mind, so the problem doesn't arise. But I think one needs to make
>> >> > the distinction and issue clear.
>> >> >
>> >> > Cheers,
>> >> > Bert
>> >> >
>> >> >
>> >> > Bert Gunter
>> >> >
>> >> > "The trouble with having an open mind is that people keep coming
>> >> > along and sticking things into it."
>> >> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>> >> >
>> >> >
>> >> > On Thu, Jun 23, 2016 at 8:46 AM, David L Carlson <dcarl...@tamu.edu>
>> >> > wrote:
>> >> >> The function is.na() returns a matrix when applied to a data.frame
>> >> >> so you can easily convert all the NAs to 0's:
>> >> >>
>> >> >>> ds_test
>> >> >>    var1 var2
>> >> >> 1     1    1
>> >> >> 2     2    2
>> >> >> 3     3    3
>> >> >> 4    NA   NA
>> >> >> 5     5    5
>> >> >> 6     6    6
>> >> >> 7     7    7
>> >> >> 8    NA   NA
>> >> >> 9     9    9
>> >> >> 10   10   10
>> >> >>> is.na(ds_test)
>> >> >>        var1  var2
>> >> >>  [1,] FALSE FALSE
>> >> >>  [2,] FALSE FALSE
>> >> >>  [3,] FALSE FALSE
>> >> >>  [4,]  TRUE  TRUE
>> >> >>  [5,] FALSE FALSE
>> >> >>  [6,] FALSE FALSE
>> >> >>  [7,] FALSE FALSE
>> >> >>  [8,]  TRUE  TRUE
>> >> >>  [9,] FALSE FALSE
>> >> >> [10,] FALSE FALSE
>> >> >>> ds_test[is.na(ds_test)] <- 0
>> >> >>> ds_test
>> >> >>    var1 var2
>> >> >> 1     1    1
>> >> >> 2     2    2
>> >> >> 3     3    3
>> >> >> 4     0    0
>> >> >> 5     5    5
>> >> >> 6     6    6
>> >> >> 7     7    7
>> >> >> 8     0    0
>> >> >> 9     9    9
>> >> >> 10   10   10
>> >> >>
>> >> >> -------------------------------------
>> >> >> David L Carlson
>> >> >> Department of Anthropology
>> >> >> Texas A&M University
>> >> >> College Station, TX 77840-4352
>> >> >>
>> >> >> -----Original Message-----
>> >> >> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ivan
>> >> >> Calandra
>> >> >> Sent: Thursday, June 23, 2016 10:14 AM
>> >> >> To: R Help
>> >> >> Subject: Re: [R] Subscripting problem with is.na()
>> >> >>
>> >> >> Thank you Bert for this clarification. It is indeed an important
>> >> >> point.
>> >> >>
>> >> >> Ivan
>> >> >>
>> >> >> --
>> >> >> Ivan Calandra, PhD
>> >> >> Scientific Mediator
>> >> >> University of Reims Champagne-Ardenne
>> >> >> GEGENAA - EA 3795
>> >> >> CREA - 2 esplanade Roland Garros
>> >> >> 51100 Reims, France
>> >> >> +33(0)3 26 77 36 89
>> >> >> ivan.calan...@univ-reims.fr
>> >> >> --
>> >> >> https://www.researchgate.net/profile/Ivan_Calandra
>> >> >> https://publons.com/author/705639/
>> >> >>
>> >> >> On 23/06/2016 at 17:06, Bert Gunter wrote:
>> >> >>> Sorry, Ivan, your statement is incorrect:
>> >> >>>
>> >> >>> "When you use a single bracket on a list with only one argument in
>> >> >>> between, then R extracts "elements", i.e. columns in the case of a
>> >> >>> data.frame. This explains your errors."
>> >> >>>
>> >> >>> e.g.
>> >> >>>
>> >> >>>> ex <- data.frame(a = 1:3, b = letters[1:3])
>> >> >>>> a <- 1:3
>> >> >>>> identical(ex[1], a)
>> >> >>> [1] FALSE
>> >> >>>
>> >> >>>> class(ex[1])
>> >> >>> [1] "data.frame"
>> >> >>>> class(a)
>> >> >>> [1] "integer"
>> >> >>>
>> >> >>> Compare:
>> >> >>>
>> >> >>>> identical(ex[[1]], a)
>> >> >>> [1] TRUE
>> >> >>>
>> >> >>> Why? Single bracket extraction on a list results in a list; double
>> >> >>> bracket extraction results in the element of the list (a "column"
>> >> >>> in the case of a data frame, which is a specific kind of list). The
>> >> >>> relevant sections of ?Extract are:
>> >> >>>
>> >> >>> "Indexing by [ is similar to atomic vectors and selects a **list**
>> >> >>> of the specified element(s).
>> >> >>>
>> >> >>> Both [[ and $ select a **single element of the list**."
>> >> >>>
>> >> >>>
>> >> >>> Hope this clarifies this often-confused issue.
>> >> >>>
>> >> >>>
>> >> >>> Cheers,
>> >> >>> Bert
>> >> >>> Bert Gunter
>> >> >>>
>> >> >>> "The trouble with having an open mind is that people keep coming
>> >> >>> along and sticking things into it."
>> >> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>> >> >>>
>> >> >>>
>> >> >>> On Thu, Jun 23, 2016 at 7:34 AM, Ivan Calandra
>> >> >>> <ivan.calan...@univ-reims.fr> wrote:
>> >> >>>> My statement "Using a single bracket '[' on a data.frame does the
>> >> >>>> same as for matrices: you need to specify rows and columns" was not
>> >> >>>> correct.
>> >> >>>>
>> >> >>>>
>> >> >>>> When you use a single bracket on a list with only one argument in
>> >> >>>> between, then R extracts "elements", i.e. columns in the case of a
>> >> >>>> data.frame. This explains your errors.
>> >> >>>>
>> >> >>>> But it is possible to use a single bracket on a data.frame with 2
>> >> >>>> arguments (rows, columns) separated by a comma, as with matrices.
>> >> >>>> This is the solution you received.
>> >> >>>>
>> >> >>>> Ivan
>> >> >>>>
>> >> >>>>
>> >> >>>> --
>> >> >>>> Ivan Calandra, PhD
>> >> >>>> Scientific Mediator
>> >> >>>> University of Reims Champagne-Ardenne
>> >> >>>> GEGENAA - EA 3795
>> >> >>>> CREA - 2 esplanade Roland Garros
>> >> >>>> 51100 Reims, France
>> >> >>>> +33(0)3 26 77 36 89
>> >> >>>> ivan.calan...@univ-reims.fr
>> >> >>>> --
>> >> >>>> https://www.researchgate.net/profile/Ivan_Calandra
>> >> >>>> https://publons.com/author/705639/
>> >> >>>>
>> >> >>>> On 23/06/2016 at 16:27, Ivan Calandra wrote:
>> >> >>>>> Dear Georg,
>> >> >>>>>
>> >> >>>>> You need to learn a bit more about the subsetting methods,
>> >> >>>>> depending on the object structure you're trying to subset.
>> >> >>>>>
>> >> >>>>> More specifically, when you run this:
>> >> >>>>> ds_test[is.na(ds_test$var1)]
>> >> >>>>> you get this error: "Error in `[.data.frame`(ds_test,
>> >> >>>>> is.na(ds_test$var1)) : undefined columns selected"
>> >> >>>>>
>> >> >>>>> This means that R does not understand which column you're trying
>> >> >>>>> to select.
>> >> >>>>> But you're actually trying to select rows.
>> >> >>>>>
>> >> >>>>> Using a single bracket '[' on a data.frame does the same as for
>> >> >>>>> matrices: you need to specify rows and columns, like this:
>> >> >>>>> ds_test[is.na(ds_test$var1), ] ## notice the last comma
>> >> >>>>> ds_test[is.na(ds_test$var1), ] <- 0 ## works on all columns
>> >> >>>>> ## because you didn't specify any after the comma
>> >> >>>>>
>> >> >>>>> If you want it only for "var1", then you need to specify the
>> >> >>>>> column:
>> >> >>>>> ds_test[is.na(ds_test$var1), "var1"] <- 0
>> >> >>>>>
>> >> >>>>> It's the same problem with your 2nd and 4th tries (4th one has
>> >> >>>>> other problems). Your 3rd try does not change ds_test at all.
>> >> >>>>>
>> >> >>>>> HTH,
>> >> >>>>> Ivan
>> >> >>>>>
>> >> >>>>> --
>> >> >>>>> Ivan Calandra, PhD
>> >> >>>>> Scientific Mediator
>> >> >>>>> University of Reims Champagne-Ardenne
>> >> >>>>> GEGENAA - EA 3795
>> >> >>>>> CREA - 2 esplanade Roland Garros
>> >> >>>>> 51100 Reims, France
>> >> >>>>> +33(0)3 26 77 36 89
>> >> >>>>> ivan.calan...@univ-reims.fr
>> >> >>>>> --
>> >> >>>>> https://www.researchgate.net/profile/Ivan_Calandra
>> >> >>>>> https://publons.com/author/705639/
>> >> >>>>>
>> >> >>>>> On 23/06/2016 at 15:57, g.maub...@weinwolf.de wrote:
>> >> >>>>>> Hi All,
>> >> >>>>>>
>> >> >>>>>> I would like to recode my NAs to 0. Using a single vector
>> >> >>>>>> everything is fine.
>> >> >>>>>>
>> >> >>>>>> But if I use a data.frame things go wrong:
>> >> >>>>>>
>> >> >>>>>> -- cut --
>> >> >>>>>>
>> >> >>>>>> var1 <- c(1:3, NA, 5:7, NA, 9:10)
>> >> >>>>>> var2 <- c(1:3, NA, 5:7, NA, 9:10)
>> >> >>>>>> ds_test <- data.frame(var1, var2)
>> >> >>>>>>
>> >> >>>>>> test <- var1
>> >> >>>>>> test[is.na(test)] <- 0
>> >> >>>>>> test # NA recoded OK
>> >> >>>>>>
>> >> >>>>>> # First try
>> >> >>>>>> ds_test[is.na(ds_test$var1)] <- 0 # duplicate subscripts WRONG
>> >> >>>>>>
>> >> >>>>>> # Second try
>> >> >>>>>> ds_test[is.na("var1")] <- 0
>> >> >>>>>> ds_test$var1 # not recoded WRONG
>> >> >>>>>>
>> >> >>>>>> # Third try: to me the most intuitive approach
>> >> >>>>>> is.na(ds_test["var1"]) <- 0 # attempt to select less than one
>> >> >>>>>> # element in integerOneIndex WRONG
>> >> >>>>>>
>> >> >>>>>> # Fourth try
>> >> >>>>>> ds_test[is.na(var1)] <- 0 # duplicate subscripts for columns WRONG
>> >> >>>>>>
>> >> >>>>>> -- cut --
>> >> >>>>>> How can I do it correctly?
>> >> >>>>>>
>> >> >>>>>> Where could I have found something about it?
>> >> >>>>>>
>> >> >>>>>> Kind regards
>> >> >>>>>>
>> >> >>>>>> Georg
>> >> >>>>>>
>> >> >>>>>> ______________________________________________
>> >> >>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more,
>> >> >>>>>> see
>> >> >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> >>>>>> PLEASE do read the posting guide
>> >> >>>>>> http://www.R-project.org/posting-guide.html
>> >> >>>>>> and provide commented, minimal, self-contained, reproducible
>> >> >>>>>> code.
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
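
Following up on Georg's question of whether there is a better way: a combined "closed" flag can probably be built without recoding NA at all, which is in line with Bert's advice to leave NA as NA. What follows is only a minimal sketch, not code that was posted in the thread. The names ds and ds_closed are made up for illustration, and stringsAsFactors = FALSE is an assumption so that the closing codes stay character (it is the default from R 4.0.0 on).

```r
# Data from Georg's example: a closing code, or NA if the account is still open
cust_id    <- 1:20
closed_mdm <- c("01", NA, NA, NA, "08", "07", NA, NA, "05", NA, NA, NA,
                "04", NA, NA, NA, NA, NA, NA, NA)
closed_sls <- c(NA, "08", NA, NA, "08", "07", NA, NA, NA, NA, "03", NA,
                NA, NA, "05", NA, NA, NA, NA, NA)

# 'ds' is a hypothetical name; keep the codes as character, not factor
ds <- data.frame(cust_id, closed_mdm, closed_sls, stringsAsFactors = FALSE)

# An account counts as closed if either column holds a closing code.
# is.na() only ever returns TRUE or FALSE, so the flag itself has no NA.
ds$closed <- !is.na(ds$closed_mdm) | !is.na(ds$closed_sls)

# Filter on the flag; the original codes and their NAs stay untouched
ds_closed <- ds[ds$closed, ]

table(ds$closed)
```

With this, ds$closed can be used directly as a filter, while closed_mdm and closed_sls keep their NA for any later tabulation or modelling. If a 0/1 coding is really needed, as.integer(ds$closed) converts the flag. Note also that the 5th try above codes the opposite of what was intended: ifelse(is.na(x), 1, 0) gives 1 for the accounts that are still open.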