See insert below.

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 6/24/16, 12:14 AM, "R-help on behalf of g.maub...@gmx.de"
<r-help-boun...@r-project.org on behalf of g.maub...@gmx.de> wrote:

>Hi Bert,
>
>many thanks for all your help and your comments. I learn a lot this way.
>
>My question was about is.na() at first sight, but the actual task
>looks like this:
>
>I have two variables in my customer data that signal if the customer
>account was closed by master data management or by sales. Say these
>variables are closed_mdm and closed_sls. They contain NA if the customer
>account is still open, or a closing code from "01" to "08" if the customer
>account was closed, indicating why.
>
>For my analysis I need a variable that combines the two variables
>closed_mdm and closed_sls to set a filter easily on those who are closed,
>no matter what the reason was or who closed the account.

Given that description, this would seem to do the job:

cust.id <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
             18, 19, 20)
closed.mdm <- c("01", NA, NA, NA, "08", "07", NA, NA, "05", NA, NA, NA,
                "04", NA, NA, NA, NA, NA, NA, NA)
closed.sls <- c(NA, "08", NA, NA, "08", "07", NA, NA, NA, NA, "03", NA,
                NA, NA, "05", NA, NA, NA, NA, NA)

df <- data.frame(cust.id, closed.mdm, closed.sls, stringsAsFactors=FALSE)

df$opcl <- ifelse( is.na(closed.mdm) & is.na(closed.sls) , 'open','closed')

Then use the opcl column to filter, e.g.,

subset(df, opcl=='open')

If you want to operate directly on one of the 'closed' columns, perhaps
these examples will help:

## does not work as intended: rows where closed.sls is NA come back as
## all-NA rows
df[ df$closed.sls == '08',]

## works (subset() drops rows where the condition is NA)
subset(df, closed.sls=='08')

## works
df[ !is.na(df$closed.sls) & df$closed.sls == '08',]

>
>As I always encounter problems when dealing with ifelse statements and NA
>I decided to merge these two variables to one variable containing 0 = not
>closed and 1 = closed. In my context this seems to be - at least to me -
>a reasonable approach.
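
If a numeric 0/1 flag is preferred over the 'open'/'closed' labels, a
minimal sketch could look like the following. It assumes the underscore
names from the question; the data frame name ds and the column name
closed are made up for illustration. Because is.na() never returns NA
itself, no ifelse() is needed.

```r
# Example vectors copied from the question.
closed_mdm <- c("01", NA, NA, NA, "08", "07", NA, NA, "05", NA,
                NA, NA, "04", NA, NA, NA, NA, NA, NA, NA)
closed_sls <- c(NA, "08", NA, NA, "08", "07", NA, NA, NA, NA,
                "03", NA, NA, NA, "05", NA, NA, NA, NA, NA)

# 'ds' and 'closed' are illustrative names, not from the original post.
ds <- data.frame(cust_id = 1:20, closed_mdm, closed_sls,
                 stringsAsFactors = FALSE)

# An account counts as closed if either column holds a closing code.
# is.na() always yields TRUE or FALSE, so the flag contains no NA.
ds$closed <- as.integer(!is.na(ds$closed_mdm) | !is.na(ds$closed_sls))

# Filter on the flag; no further NA handling is required here.
closed_accounts <- ds[ds$closed == 1, ]
```

The original closing codes stay untouched in closed_mdm and closed_sls,
so nothing is lost by adding the flag.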
>
>Replacement of missing values and merging the variables is the easiest
>way for me.
>
>-- cut --
>
>cust_id <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
>18, 19, 20)
>closed_mdm <- c("01", NA, NA, NA, "08", "07", NA, NA, "05", NA, NA, NA,
>"04", NA, NA, NA, NA, NA, NA, NA)
>closed_sls <- c(NA, "08", NA, NA, "08", "07", NA, NA, NA, NA, "03", NA,
>NA, NA, "05", NA, NA, NA, NA, NA)
>
># 1st try
>ds_temp1 <- data.frame(cust_id, closed_mdm, closed_sls)
>ds_temp1
>
>ds_temp1$closed <- closed_mdm | closed_sls # WRONG
>
># 2nd try
>closed_mdm_fac1 <- as.factor(closed_mdm)
>closed_sls_fac1 <- as.factor(closed_sls)
>
>ds_temp2 <- data.frame(cust_id, closed_mdm_fac1, closed_sls_fac1)
>ds_temp2
>
>ds_temp2$closed <- ds_temp$closed_mdm_fac1 | ds_temp$closed_sls_fac1 # WRONG
>
># 3rd try
>closed_mdm_num1 <- as.numeric(closed_mdm) # OK
>closed_sls_num1 <- as.numeric(closed_sls) # OK
>
>ds_temp3 <- data.frame(cust_id, closed_mdm_num1, closed_sls_num1)
>ds_temp3
>
>ds_temp3$closed <- ds_temp$closed_mdm_num1 | ds_temp$closed_sls_num1 # WRONG
>
># 4th try
>ds_temp4 <- ds_temp3
>ds_temp4
>
># Does not run due to not allowed NA in subscripts
>ds_temp4[is.na(ds_temp4$closed_mdm_num1), ds_temp4$closed_mdm_num1] <- 0
>ds_temp4[is.na(ds_temp4$closed_sls_num1), ds_temp4$closed_sls_num1] <- 0
>
># 5th try
>ds_temp4$closed_mdm_num1 <- ifelse(is.na(ds_temp4$closed_mdm_num1), 1, 0)
>ds_temp4$closed_sls_num1 <- ifelse(is.na(ds_temp4$closed_sls_num1), 1, 0)
>ds_temp4
>
>ds_temp4$closed <- ifelse(ds_temp4$closed_mdm_num1 == 1 |
>ds_temp4$closed_sls_num1 == 1, 1, 0)
>ds_temp4
>
>-- cut --
>
>Is there a better way to do it?
>
>Kind regards
>
>Georg
>
>
>> Sent: Thursday, 23 June 2016 at 23:55
>> From: "Bert Gunter" <bgunter.4...@gmail.com>
>> To: "David L Carlson" <dcarl...@tamu.edu>
>> Cc: "R Help" <r-help@r-project.org>
>> Subject: Re: [R] Subscripting problem with is.na()
>>
>> ...
>> actually, FWIW, I would say that this little discussion mostly
>> demonstrates why the OP's request is probably not a good idea in the
>> first place. Usually, NA's should be left as NA's to be dealt with
>> properly by R and packages. In biological measurements, for example,
>> NA's often mean "below the ability to reliably measure." Biologists
>> with whom I've worked over many years often want to convert these to 0
>> or omit the cases, both of which lead to biased estimates and/or
>> underestimates of variability and excess claims of "statistical
>> significance" (for those who belong to this religious persuasion). One
>> should never say never, but I suspect that there are relatively few
>> circumstances where the conversion the OP requested is actually wise.
>>
>> Feel free to ignore/reject such extraneous comments of course.
>>
>> Cheers,
>> Bert
>>
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Thu, Jun 23, 2016 at 12:14 PM, David L Carlson <dcarl...@tamu.edu> wrote:
>> > Good point. I did not think about factors. Also your example raises
>> > another issue since column c is logical, but gets silently converted
>> > to numeric.
>> > This would seem to get the job done assuming the conversion is
>> > intended for numeric columns only:
>> >
>> >> test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3))
>> >> sapply(test, class)
>> >         a         b         c
>> > "numeric"  "factor" "logical"
>> >> num <- sapply(test, is.numeric)
>> >> test[, num][is.na(test[, num])] <- 0
>> >> test
>> >   a    b  c
>> > 1 1    A NA
>> > 2 0    b NA
>> > 3 2 <NA> NA
>> >
>> > David C
>> >
>> > -----Original Message-----
>> > From: Bert Gunter [mailto:bgunter.4...@gmail.com]
>> > Sent: Thursday, June 23, 2016 1:48 PM
>> > To: David L Carlson
>> > Cc: Ivan Calandra; R Help
>> > Subject: Re: [R] Subscripting problem with is.na()
>> >
>> > Not in general, David:
>> >
>> > e.g.
>> >
>> >> test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3))
>> >
>> >> is.na(test)
>> >          a     b    c
>> > [1,] FALSE FALSE TRUE
>> > [2,]  TRUE FALSE TRUE
>> > [3,] FALSE  TRUE TRUE
>> >
>> >> test[is.na(test)]
>> > [1] NA NA NA NA NA
>> >
>> >> test[is.na(test)] <- 0
>> > Warning message:
>> > In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
>> >   invalid factor level, NA generated
>> >
>> >> test
>> >   a    b c
>> > 1 1    A 0
>> > 2 0    b 0
>> > 3 2 <NA> 0
>> >
>> >
>> > The problem is the default conversion to factors and the replacement
>> > operation for factors. So:
>> >
>> >> test <- data.frame(a=c(1,NA,2), b = I(c("A","b",NA_character_)), c= rep(NA,3))
>> >> class(test$b)
>> > [1] "AsIs" ## so NOT a factor
>> >
>> >> test[is.na(test)] <- 0 # now works as you describe
>> >> test
>> >   a b c
>> > 1 1 A 0
>> > 2 0 b 0
>> > 3 2 0 0
>> >
>> > Of course the OP (and you) probably had a data frame of all numerics
>> > in mind, so the problem doesn't arise. But I think one needs to make
>> > the distinction and issue clear.
>> >
>> > Cheers,
>> > Bert
>> >
>> >
>> > Bert Gunter
>> >
>> > "The trouble with having an open mind is that people keep coming along
>> > and sticking things into it."
>> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> >
>> >
>> > On Thu, Jun 23, 2016 at 8:46 AM, David L Carlson <dcarl...@tamu.edu> wrote:
>> >> The function is.na() returns a matrix when applied to a data.frame
>> >> so you can easily convert all the NAs to 0's:
>> >>
>> >>> ds_test
>> >>    var1 var2
>> >> 1     1    1
>> >> 2     2    2
>> >> 3     3    3
>> >> 4    NA   NA
>> >> 5     5    5
>> >> 6     6    6
>> >> 7     7    7
>> >> 8    NA   NA
>> >> 9     9    9
>> >> 10   10   10
>> >>> is.na(ds_test)
>> >>        var1  var2
>> >>  [1,] FALSE FALSE
>> >>  [2,] FALSE FALSE
>> >>  [3,] FALSE FALSE
>> >>  [4,]  TRUE  TRUE
>> >>  [5,] FALSE FALSE
>> >>  [6,] FALSE FALSE
>> >>  [7,] FALSE FALSE
>> >>  [8,]  TRUE  TRUE
>> >>  [9,] FALSE FALSE
>> >> [10,] FALSE FALSE
>> >>> ds_test[is.na(ds_test)] <- 0
>> >>> ds_test
>> >>    var1 var2
>> >> 1     1    1
>> >> 2     2    2
>> >> 3     3    3
>> >> 4     0    0
>> >> 5     5    5
>> >> 6     6    6
>> >> 7     7    7
>> >> 8     0    0
>> >> 9     9    9
>> >> 10   10   10
>> >>
>> >> -------------------------------------
>> >> David L Carlson
>> >> Department of Anthropology
>> >> Texas A&M University
>> >> College Station, TX 77840-4352
>> >>
>> >> -----Original Message-----
>> >> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ivan Calandra
>> >> Sent: Thursday, June 23, 2016 10:14 AM
>> >> To: R Help
>> >> Subject: Re: [R] Subscripting problem with is.na()
>> >>
>> >> Thank you Bert for this clarification. It is indeed an important point.
>> >>
>> >> Ivan
>> >>
>> >> --
>> >> Ivan Calandra, PhD
>> >> Scientific Mediator
>> >> University of Reims Champagne-Ardenne
>> >> GEGENAA - EA 3795
>> >> CREA - 2 esplanade Roland Garros
>> >> 51100 Reims, France
>> >> +33(0)3 26 77 36 89
>> >> ivan.calan...@univ-reims.fr
>> >> --
>> >> https://www.researchgate.net/profile/Ivan_Calandra
>> >> https://publons.com/author/705639/
>> >>
>> >> On 23/06/2016 at 17:06, Bert Gunter wrote:
>> >>> Sorry, Ivan, your statement is incorrect:
>> >>>
>> >>> "When you use a single bracket on a list with only one argument in
>> >>> between, then R extracts "elements", i.e. columns in the case of a
>> >>> data.frame. This explains your errors. "
>> >>>
>> >>> e.g.
>> >>>
>> >>>> ex <- data.frame(a = 1:3, b = letters[1:3])
>> >>>> a <- 1:3
>> >>>> identical(ex[1], a)
>> >>> [1] FALSE
>> >>>
>> >>>> class(ex[1])
>> >>> [1] "data.frame"
>> >>>> class(a)
>> >>> [1] "integer"
>> >>>
>> >>> Compare:
>> >>>
>> >>>> identical(ex[[1]], a)
>> >>> [1] TRUE
>> >>>
>> >>> Why? Single bracket extraction on a list results in a list; double
>> >>> bracket extraction results in the element of the list (a "column" in
>> >>> the case of a data frame, which is a specific kind of list). The
>> >>> relevant sections of ?Extract are:
>> >>>
>> >>> "Indexing by [ is similar to atomic vectors and selects a **list** of
>> >>> the specified element(s).
>> >>>
>> >>> Both [[ and $ select a **single element of the list**. "
>> >>>
>> >>>
>> >>> Hope this clarifies this often-confused issue.
>> >>>
>> >>>
>> >>> Cheers,
>> >>> Bert
>> >>> Bert Gunter
>> >>>
>> >>> "The trouble with having an open mind is that people keep coming along
>> >>> and sticking things into it."
>> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> >>>
>> >>>
>> >>> On Thu, Jun 23, 2016 at 7:34 AM, Ivan Calandra
>> >>> <ivan.calan...@univ-reims.fr> wrote:
>> >>>> My statement "Using a single bracket '[' on a data.frame does the same as
>> >>>> for matrices: you need to specify rows and columns" was not correct.
>> >>>>
>> >>>>
>> >>>> When you use a single bracket on a list with only one argument in between,
>> >>>> then R extracts "elements", i.e. columns in the case of a data.frame. This
>> >>>> explains your errors.
>> >>>>
>> >>>> But it is possible to use a single bracket on a data.frame with 2 arguments
>> >>>> (rows, columns) separated by a comma, as with matrices. This is the solution
>> >>>> you received.
>> >>>>
>> >>>> Ivan
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Ivan Calandra, PhD
>> >>>> Scientific Mediator
>> >>>> University of Reims Champagne-Ardenne
>> >>>> GEGENAA - EA 3795
>> >>>> CREA - 2 esplanade Roland Garros
>> >>>> 51100 Reims, France
>> >>>> +33(0)3 26 77 36 89
>> >>>> ivan.calan...@univ-reims.fr
>> >>>> --
>> >>>> https://www.researchgate.net/profile/Ivan_Calandra
>> >>>> https://publons.com/author/705639/
>> >>>>
>> >>>> On 23/06/2016 at 16:27, Ivan Calandra wrote:
>> >>>>> Dear Georg,
>> >>>>>
>> >>>>> You need to learn a bit more about the subsetting methods, depending on
>> >>>>> the object structure you're trying to subset.
>> >>>>>
>> >>>>> More specifically, when you run this: ds_test[is.na(ds_test$var1)]
>> >>>>> you get this error: "Error in `[.data.frame`(ds_test, is.na(ds_test$var1))
>> >>>>> : undefined columns selected"
>> >>>>>
>> >>>>> This means that R does not understand which column you're trying to
>> >>>>> select. But you're actually trying to select rows.
>> >>>>>
>> >>>>> Using a single bracket '[' on a data.frame does the same as for matrices:
>> >>>>> you need to specify rows and columns, like this:
>> >>>>> ds_test[is.na(ds_test$var1), ] ## notice the last comma
>> >>>>> ds_test[is.na(ds_test$var1), ] <- 0 ## works on all columns because you
>> >>>>>                                     ## didn't specify any after the comma
>> >>>>>
>> >>>>> If you want it only for "var1", then you need to specify the column:
>> >>>>> ds_test[is.na(ds_test$var1), "var1"] <- 0
>> >>>>>
>> >>>>> It's the same problem with your 2nd and 4th tries (4th one has other
>> >>>>> problems). Your 3rd try does not change ds_test at all.
>> >>>>>
>> >>>>> HTH,
>> >>>>> Ivan
>> >>>>>
>> >>>>> --
>> >>>>> Ivan Calandra, PhD
>> >>>>> Scientific Mediator
>> >>>>> University of Reims Champagne-Ardenne
>> >>>>> GEGENAA - EA 3795
>> >>>>> CREA - 2 esplanade Roland Garros
>> >>>>> 51100 Reims, France
>> >>>>> +33(0)3 26 77 36 89
>> >>>>> ivan.calan...@univ-reims.fr
>> >>>>> --
>> >>>>> https://www.researchgate.net/profile/Ivan_Calandra
>> >>>>> https://publons.com/author/705639/
>> >>>>>
>> >>>>> On 23/06/2016 at 15:57, g.maub...@weinwolf.de wrote:
>> >>>>>> Hi All,
>> >>>>>>
>> >>>>>> I would like to recode my NAs to 0. Using a single vector everything is
>> >>>>>> fine.
>> >>>>>>
>> >>>>>> But if I use a data.frame things go wrong:
>> >>>>>>
>> >>>>>> -- cut --
>> >>>>>>
>> >>>>>> var1 <- c(1:3, NA, 5:7, NA, 9:10)
>> >>>>>> var2 <- c(1:3, NA, 5:7, NA, 9:10)
>> >>>>>> ds_test <-
>> >>>>>> data.frame(var1, var2)
>> >>>>>>
>> >>>>>> test <- var1
>> >>>>>> test[is.na(test)] <- 0
>> >>>>>> test # NA recoded OK
>> >>>>>>
>> >>>>>> # First try
>> >>>>>> ds_test[is.na(ds_test$var1)] <- 0 # duplicate subscripts WRONG
>> >>>>>>
>> >>>>>> # Second try
>> >>>>>> ds_test[is.na("var1")] <- 0
>> >>>>>> ds_test$var1 # not recoded WRONG
>> >>>>>>
>> >>>>>> # Third try: to me the most intuitive approach
>> >>>>>> is.na(ds_test["var1"]) <- 0 # attempt to select less than one element in
>> >>>>>>                             # integerOneIndex WRONG
>> >>>>>>
>> >>>>>> # Fourth try
>> >>>>>> ds_test[is.na(var1)] <- 0 # duplicate subscripts for columns WRONG
>> >>>>>>
>> >>>>>> -- cut --
>> >>>>>> How can I do it correctly?
>> >>>>>>
>> >>>>>> Where could I have found something about it?
>> >>>>>>
>> >>>>>> Kind regards
>> >>>>>>
>> >>>>>> Georg
>> >>>>>>
>> >>>>>> ______________________________________________
>> >>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >>>>>> PLEASE do read the posting guide
>> >>>>>> http://www.R-project.org/posting-guide.html
>> >>>>>> and provide commented, minimal, self-contained, reproducible code.
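
For reference, the answers to the original question at the bottom of
this thread can be pulled together in one short sketch. It reuses
var1, var2 and ds_test from that question; the helper name na_rows is
made up for illustration. It shows the row/column form of [ with a
comma, the per-column alternative, and the [ versus [[ distinction.

```r
# Data from the original question.
var1 <- c(1:3, NA, 5:7, NA, 9:10)
var2 <- c(1:3, NA, 5:7, NA, 9:10)
ds_test <- data.frame(var1, var2)

# One logical value per ROW, so it belongs before the comma.
# 'na_rows' is an illustrative name, not from the thread.
na_rows <- is.na(ds_test$var1)

# Rows before the comma, column name after it: only var1 is changed.
ds_test[na_rows, "var1"] <- 0

# Alternative for one column: index the column vector itself.
ds_test$var2[is.na(ds_test$var2)] <- 0

# Single bracket keeps the data frame; double bracket returns the column.
is.data.frame(ds_test["var1"])   # TRUE
is.atomic(ds_test[["var1"]])     # TRUE
```

Whether recoding NA to 0 is sensible at all depends on what NA means in
the data, as noted earlier in the thread.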