FWIW: Yes, thanks for noting that. My own preference is to always propagate NAs and manually decide how to deal with them, but others may disagree.
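For reference, here is a minimal sketch of the choices discussed in this thread. It reuses the toy vectors from Petr's message. Logical indexing propagates NA into the result. which() keeps only the TRUE positions, so NA positions are silently dropped. An explicit `!is.na()` term in the condition makes the decision visible in the code. The name `keep` is only illustrative.

```r
# Petr's toy data: x has NAs at positions 2 and 5
x <- 1:10
x[c(2, 5)] <- NA
y <- letters[1:10]

# Logical indexing: an NA in the condition gives an NA element in the result
y[x < 5]

# which() returns only the TRUE positions, so the NA positions disappear
y[which(x < 5)]

# Deciding explicitly: keep elements only where x is known and below 5
keep <- !is.na(x) & x < 5
y[keep]

# The same choice for a data frame; subset() also treats an NA condition as FALSE
dat <- data.frame(x, y)
dat[keep, ]
subset(dat, x < 5)
```

The explicit `!is.na(x) & x < 5` form works because `FALSE & NA` is `FALSE` in R. The code then documents how missing values are handled instead of leaving it to the indexing function.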
Best,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Sun, Aug 8, 2021 at 11:30 PM PIKAL Petr <petr.pi...@precheza.cz> wrote:
>
> Hi Bert
>
> Yes, in this case which() is not necessary. But when NAs are involved,
> logical indexing is sometimes not the best choice, as NA propagates to the
> result, which may not be wanted.
>
> x <- 1:10
> x[c(2,5)] <- NA
> y <- letters[1:10]
> y[x<5]
> [1] "a" NA  "c" "d" NA
> y[which(x<5)]
> [1] "a" "c" "d"
> dat <- data.frame(x,y)
> dat[x<5,]
>       x    y
> 1     1    a
> NA   NA <NA>
> 3     3    c
> 4     4    d
> NA.1 NA <NA>
>
> dat[which(x<5),]
>   x y
> 1 1 a
> 3 3 c
> 4 4 d
>
> Both results are OK, but one has to consider this NA value propagation.
>
> Cheers
> Petr
>
> From: Bert Gunter <bgunter.4...@gmail.com>
> Sent: Friday, August 6, 2021 1:29 PM
> To: PIKAL Petr <petr.pi...@precheza.cz>
> Cc: Luigi Marongiu <marongiu.lu...@gmail.com>; r-help <r-help@r-project.org>
> Subject: Re: [R] Sanity check in loading large dataframe
>
> ... but remove the which() and use logical indexing ... ;-)
>
> Bert Gunter
>
> On Fri, Aug 6, 2021 at 12:57 AM PIKAL Petr <petr.pi...@precheza.cz> wrote:
> Hi
>
> You already got an answer from Avi. I often use dim(data) to inspect how many
> rows/columns I have.
> After that I check whether some columns contain all or many NA values.
>
> colSums(is.na(data))
> keep <- which(colSums(is.na(data)) < nnn)
> cleaned.data <- data[, keep]
>
> Cheers
> Petr
>
> > -----Original Message-----
> > From: R-help <r-help-boun...@r-project.org> On Behalf Of Luigi Marongiu
> > Sent: Friday, August 6, 2021 7:34 AM
> > To: Duncan Murdoch <murdoch.dun...@gmail.com>
> > Cc: r-help <r-help@r-project.org>
> > Subject: Re: [R] Sanity check in loading large dataframe
> >
> > Ok, so nothing to worry about. Still, are there other checks I can implement?
> > Thank you
> >
> > On Thu, 5 Aug 2021, 15:40 Duncan Murdoch, <murdoch.dun...@gmail.com> wrote:
> >
> > > On 05/08/2021 9:16 a.m., Luigi Marongiu wrote:
> > > > Hello,
> > > > I am using a large spreadsheet (over 600 variables).
> > > > I tried `str` to check the dimensions of the spreadsheet and I got
> > > > ```
> > > > > (str(df))
> > > > 'data.frame': 302 obs. of 626 variables:
> > > > $ record_id : int 1 1 1 1 1 1 1 1 1 1 ...
> > > > ....
> > > > $ v1_medicamento___aceta : int 1 NA NA NA NA NA NA NA NA NA ...
> > > > [list output truncated]
> > > > NULL
> > > > ```
> > > > I understand that `[list output truncated]` means that there are more
> > > > variables than str displays by default. So I increased the number of
> > > > listed variables with:
> > > > ```
> > > > > (str(df, list.len=1000))
> > > > 'data.frame': 302 obs. of 626 variables:
> > > > $ record_id : int 1 1 1 1 1 1 1 1 1 1 ...
> > > > ...
> > > > NULL
> > > > ```
> > > >
> > > > Does `NULL` mean that some of the variables are not closed (perhaps a
> > > > missing comma somewhere)? Is there a way to check the sanity of the
> > > > data and make sure that no separator is in the wrong place?
> > > > Thank you
> > >
> > > The NULL is the value returned by str(). Normally it is not printed,
> > > but when you wrap str in parens as (str(df, list.len=1000)), that
> > > forces the value to print.
> > >
> > > str() is unusual among R functions in that it prints to the console as it
> > > runs and returns nothing (strictly, an invisible NULL). Many other functions
> > > construct a value which is only displayed if you print it, but something like
> > >
> > >   x <- str(df, list.len=1000)
> > >
> > > will print the same as if there was no assignment, and then assign
> > > NULL to x.
> > >
> > > Duncan Murdoch
> >
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
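The checks suggested in this thread can be collected into one place. The sketch below uses a hypothetical helper, `check_df()`, which is not part of any package. It prints dim(), counts NAs per column with colSums(is.na()), and keeps only the columns with fewer than `max_na` NAs. The `max_na` argument stands in for the unspecified threshold `nnn` in Petr's code. The sketch also confirms that str() returns NULL. The final part addresses Luigi's separator question with base R's count.fields(), which was not suggested in the thread. It assumes a comma-separated file. The toy data frame and the temporary file are made up for illustration.

```r
# Hypothetical helper (not from any package): basic sanity checks on a data frame
check_df <- function(data, max_na) {
  # Dimensions: number of rows and columns actually read
  print(dim(data))
  # Number of NA values in each column
  na_per_col <- colSums(is.na(data))
  print(na_per_col)
  # Keep only columns with fewer than max_na NAs (stand-in for Petr's `nnn`)
  keep <- which(na_per_col < max_na)
  data[, keep, drop = FALSE]
}

# Made-up example: column b is entirely NA, column c has one NA
df <- data.frame(a = 1:4, b = NA, c = c(1, NA, 3, 4))
cleaned <- check_df(df, max_na = 2)
names(cleaned)  # "a" "c"

# str() prints as it runs; its return value is NULL
res <- str(cleaned)
is.null(res)  # TRUE

# Separator check (assumption: comma-separated input; illustrative temp file)
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,v1,v2", "1,2,3", "2,3", "3,4,5"), tmp)
n_fields <- count.fields(tmp, sep = ",")
# Lines whose field count differs from the header line
which(n_fields != n_fields[1])  # 3
```

count.fields() reports the number of fields on each line of the raw file, so a misplaced or missing separator appears as a line with an unexpected count. Running it before read.csv() or a similar reader is one way to catch such problems early.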