Oh sorry -- my mistake with ave() -- I only checked the first row.

drop = F (short for drop = FALSE) is an optional argument to the function "[" which tells it to return the same kind of object it began with, rather than simplifying to a vector.
E.g.,

X = matrix(1:9, 3)

is.matrix(X)
TRUE

is.matrix(X[,2:3])
TRUE

is.matrix(X[,3])
FALSE # Just a regular vector

is.matrix(X[,3,drop = F])
TRUE

Aggregate wants a list in that second slot and data frames are
secretly also lists, so keeping it as a data frame gives the desired
list.

Michael

On Tue, Nov 15, 2011 at 7:07 AM, Rob Griffin <robgriffin...@hotmail.com> wrote:
> Thanks Michael,
> That second (aggregate) option worked perfectly - the first (cbind)
> generated averages for each row between the columns (rather than between
> rows for each column).
> I came so close with aggregate yesterday - it is only slightly different to
> one of my attempts (of admittedly very many attempts) to solve it so feels good
> that I was going along the right lines at some point!
>
> Could you possibly explain what this drop=F term is doing?
>
> Rob
> (A very grateful and relieved phd student).
>
> (also if anyone fancies helping me with another problem I posted yesterday:
> http://r.789695.n4.nabble.com/correlations-between-columns-for-each-row-td4039193.html
> )
>
>
> -----Original Message----- From: R. Michael Weylandt
> Sent: Tuesday, November 15, 2011 12:46 PM
> To: robgriffin247
> Cc: r-help@r-project.org
> Subject: Re: [R] averaging between rows with repeated data
>
> Good morning Rob,
>
> First off, thank you for providing a reproducible example. This is one
> of those little tasks that R is pretty great at, but there exist
> >> \infty ways to do so and it can be a little overwhelming for the
> beginner: here's one with the base function ave():
>
> cbind(ave(example[,2:4], example[,5]), id = example[,5])
>
> This splits example according to the fifth column (id) and averages
> the other values: we then stick another copy of the id back on the end
> and are good to go.
>
> The base function aggregate can do something similar:
>
> aggregate(example[,2:4], by = example[,5, drop = F], mean)
>
> Note that you need the little-publicized but super useful drop = F
> command to make this one work.
>
> There are other ways to do this with the plyr or doBy packages as
> well, but this should get you started.
>
> Hope it helps,
>
> Michael
>
> On Tue, Nov 15, 2011 at 5:52 AM, robgriffin247
> <robgriffin...@hotmail.com> wrote:
>>
>> *The situation (or an example at least!)*
>>
>> example<-data.frame(rep(letters[1:10]))
>> colnames(example)[1]<-("Letters")
>> example$numb1<-rnorm(10,1,1)
>> example$numb2<-rnorm(10,1,1)
>> example$numb3<-rnorm(10,1,1)
>>
>> example$id<-c("CG234","CG232","CG441","CG128","CG125","CG182","CG232","CG441","CG232","CG125")
>>
>> *this produces something like this:*
>>    Letters     numb1      numb2        numb3    id
>> 1        a 0.8139130 -0.9775570 -0.002996244 CG234
>> 2        b 0.8268700  0.4980661  1.647717998 CG232
>> 3        c 0.2384088  1.0249684  0.120663273 CG441
>> 4        d 0.8215922  0.5686534  1.591208307 CG128
>> 5        e 0.7865918  0.5411476  0.838300185 CG125
>> 6        f 2.2385522  1.2668070  1.268005020 CG182
>> 7        g 0.7403965 -0.6224205  1.374641549 CG232
>> 8        h 0.2526634  1.0282978 -0.110449844 CG441
>> 9        i 1.9333444  1.6667486  2.937252363 CG232
>> 10       j 1.6996701  0.5964623  1.967870617 CG125
>>
>> *The Problem:*
>> Some of these id's are repeated, I want to average the values for those rows
>> within each column but obviously they have different numbers in the numbers
>> column, and they also have different letters in the letters column, the
>> letters are not necessary for my analysis, only the duplicated id's and the
>> numb columns are important
>>
>> I also need to keep the existing dataframe so would like to build a new
>> dataframe that averages the repeated values and keeps their id - my actual
>> dataset is much more complex (271*13890) - but the solution to this can be
>> expanded out to my main data set because there is just more columns of
>> numbers and still only one alphanumeric id to keep in my example data, id
>> CG232 occurs 3 times, CG441 & CG125 occur twice, everything else once so the
>> new dataframe (from this example) there would be 3 number columns (numb1,
>> numb2, numb3) and an id the numb column values would be the averages of the
>> rows which had the same id
>>
>> so for example the new dataframe would contain an entry for CG125 which
>> would be something like this:
>>
>>  numb1  numb2 numb3    id
>> 1.2431 0.5688 1.403 CG125
>>
>> Just as a thought, all of the IDs start with CG so could I use then grep (?)
>> to delete CG and replace it with 0, that way duplicated ids could be
>> averaged as a number (they would be the same) but I still don’t know how to
>> produce the new dataframe with the averaged rows in it...
>>
>> I hope this is clear enough! email me if you need further detail or even
>> better, if you have a solution!!
>> also sorry to be posting my second question in under 24hours but I seem to
>> have become more than a little stuck – I was making such good progress with
>> R!
>>
>> Rob
>>
>> (also I'm sorry if this appears more than once on the mailing list - I'm
>> having some network & windows live issues so I'm not convinced previous
>> attempts to send this have worked, but have no way of telling if they are
>> just milling around in the internet somewhere as we speak and will decide to
>> come out of hiding later!)
>>
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/averaging-between-rows-with-repeated-data-tp4042513p4042513.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
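For anyone finding this thread in the archive, here is a minimal sketch that pulls the pieces together on data shaped like Rob's example. The set.seed(1) call and the names averaged and group_means are additions for illustration and do not appear in the original messages. The last step is only one possible way to use ave() here, on the assumption that it is applied to one numeric column at a time; it is not necessarily what was originally intended.

```r
# Rebuild a data frame shaped like Rob's example. set.seed() is an
# addition so that the random numbers are reproducible.
set.seed(1)
example <- data.frame(Letters = letters[1:10])
example$numb1 <- rnorm(10, 1, 1)
example$numb2 <- rnorm(10, 1, 1)
example$numb3 <- rnorm(10, 1, 1)
example$id <- c("CG234", "CG232", "CG441", "CG128", "CG125",
                "CG182", "CG232", "CG441", "CG232", "CG125")

# example[, 5] simplifies to a plain character vector, while
# drop = FALSE keeps a one-column data frame (which is also a list).
is.list(example[, 5])                 # FALSE
is.list(example[, 5, drop = FALSE])   # TRUE

# One row per distinct id, each numb column averaged within that id.
averaged <- aggregate(example[, 2:4],
                      by = example[, 5, drop = FALSE],
                      FUN = mean)

# ave() takes one numeric vector plus grouping variables, so apply it
# column by column. The result keeps all 10 rows, with every value
# replaced by the mean of its id group.
group_means <- sapply(example[, 2:4], function(col) ave(col, example$id))
```

aggregate() puts the grouping column first; if the id should come last, as in Rob's sketch of the desired output, something like averaged[, c("numb1", "numb2", "numb3", "id")] reorders the columns. The original example data frame is left untouched.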