On Fri, Jun 25, 2010 at 5:24 PM, Joris Meys <jorism...@gmail.com> wrote:
> Just want to add that if you want to clean out the NA rows in a matrix
> or data frame, take a look at ?complete.cases. Can be handy to use
> with big datasets. I got curious, so I just ran the codes given here
> on a big dataset, before and after removing NA rows. I have to be
> honest, this is surely an illustration of the power of rowMeans. I'm
> amazed myself.

I was too...the documentation (?rowMeans) wasn't joking:

"These functions are equivalent to use of 'apply' with 'FUN = mean' or
'FUN = sum' with appropriate margins, but are a lot faster."

>
> DF <- data.frame(
>   A=rep(DF$A,10000),
>   B=rep(DF$B,10000)
> )
>
>> system.time(apply(DF,1,mean,na.rm=TRUE))
>    user  system elapsed
>   13.26    0.06   13.46
>
>> system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))
>    user  system elapsed
>    0.03    0.00    0.03
>
>> system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,
> + na.rm=TRUE)[,-1]))
> + )
>
> Timing stopped at: 227.84 1.03 249.31 -- I got impatient and pressed the
> escape
>
>> DF <- DF[complete.cases(DF),]
>> system.time(apply(DF,1,mean,na.rm=TRUE))
>    user  system elapsed
>    0.39    0.00    0.39
>
>> system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))
>    user  system elapsed
>    0.01    0.00    0.02
>
>> system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,
> + na.rm=TRUE)[,-1]))
> + )
>    user  system elapsed
>   10.01    0.07   13.40
>
> Cheers
> Joris
>
>
> On Sat, Jun 26, 2010 at 1:08 AM, emorway <emor...@engr.colostate.edu> wrote:
>>
>> Forum,
>>
>> Using the following data:
>>
>> DF<-read.table(textConnection("A B
>> 22.60 NA
>> NA NA
>> NA NA
>> NA NA
>> NA NA
>> NA NA
>> NA NA
>> NA NA
>> 102.00 NA
>> 19.20 NA
>> 19.20 NA
>> NA NA
>> NA NA
>> NA NA
>> 11.80 NA
>> 7.62 NA
>> NA NA
>> NA NA
>> NA NA
>> NA NA
>> NA NA
>> 75.00 NA
>> NA NA
>> 18.30 18.2
>> NA NA
>> NA NA
>> 8.44 NA
>> 18.00 NA
>> NA NA
>> 12.90 NA"),header=T)
>> closeAllConnections()
>>
>> The second column is a duplicate reading of the first column, and when two
>> values are available, I would like to average column 1 and 2 (example code
>> below). But if there is only one reading, I would like to retain it, but I
>> haven't found a good way to exclude NA's using the following code:
>>
>> t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean)[,-1]))
>>
>> Currently, row 24 is the only row with a returned value.
>> I'd like the
>> result to return column "A" if it is the only available value, and average
>> where possible. Of course, if both columns are NA, NA is the only possible
>> result.
>>
>> The result I'm after would look like this (row 24 is an avg):
>>
>> 22.60
>> NA
>> NA
>> NA
>> NA
>> NA
>> NA
>> NA
>> 102.00
>> 19.20
>> 19.20
>> NA
>> NA
>> NA
>> 11.80
>> 7.62
>> NA
>> NA
>> NA
>> NA
>> NA
>> 75.00
>> NA
>> 18.25
>> NA
>> NA
>> 8.44
>> 18.00
>> NA
>> 12.90
>>
>> This is a small example from a much larger data frame, so if you're
>> wondering what the deal is with list(), that will come into play for the
>> larger problem I'm trying to solve.
>>
>> Respectfully,
>> Eric
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/Average-2-Columns-when-possible-or-return-available-value-tp2269049p2269049.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Joris Meys
> Statistical consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Applied mathematics, biometrics and process control
>
> tel : +32 9 264 59 87
> joris.m...@ugent.be
> -------------------------------
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
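
For anyone reading this in the archive, here is a minimal sketch of the rowMeans approach applied to Eric's 30-row example. It is only an illustration under a few assumptions: the data frame is rebuilt by hand rather than through read.table(), and the names avg, result and both are made up for this note. One detail the timings do not show is that rowMeans(..., na.rm=TRUE) returns NaN, not NA, for rows where both readings are missing, so the sketch converts those back to NA to match the output Eric asked for.

```r
# Rebuild the example data from the original post (30 rows, two readings).
A <- c(22.60, rep(NA, 7), 102.00, 19.20, 19.20, rep(NA, 3), 11.80, 7.62,
       rep(NA, 5), 75.00, NA, 18.30, NA, NA, 8.44, 18.00, NA, 12.90)
B <- rep(NA_real_, 30)
B[24] <- 18.2
DF <- data.frame(A = A, B = B)

# Row-wise mean that ignores missing readings.
avg <- rowMeans(DF, na.rm = TRUE)

# Rows where both readings are NA come back as NaN, not NA;
# convert them so the result matches the requested output.
avg[is.nan(avg)] <- NA

# One-column matrix, the same shape used in the timings above.
result <- matrix(avg, ncol = 1)

# complete.cases() keeps only the rows with no NA in any column.
both <- DF[complete.cases(DF), ]
```

Note that on this data complete.cases() keeps only row 24, the one row with both readings. That would mean the second set of timings above ran on roughly 10,000 rows instead of 300,000, so part of the speed-up there probably comes from the smaller data frame and not only from the absence of NAs.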