On Sun, Jan 31, 2010 at 5:05 PM, Sunny Srivastava <research.b...@gmail.com> wrote:
> Dear R-Helpers,
> I have a data.frame (df) and the head of data.frame looks like
>
>      ProbeUID ControlType     ProbeName GeneName SystematicName
> 1665     1577           0 pSysX_50_22_1 pSysX_50       pSysX_50
> 5422     5147           0  pSysX_49_8_1 pSysX_49       pSysX_49
> 4042     3843           0 pSysX_51_18_1 pSysX_51       pSysX_51
> 3646     3466           0   sll1514_0_2  sll1514        sll1514
> 2946     2807           0   sll1514_0_1  sll1514        sll1514
> 624       582           0  pSysX_49_8_2 pSysX_49       pSysX_49
>
>      Description   logFC AveExpr       t    P.Value  adj.P.Val
> 1665     Unknown  4.3887  9.5662  61.038 1.0938e-08 9.4449e-05
> 5422     Unknown -3.5251  6.9103 -35.908 1.7596e-07 3.5912e-04
> 4042     Unknown  2.5302  8.7497  35.112 1.9786e-07 3.5912e-04
> 3646     Unknown  2.3457 11.1678  33.962 2.3549e-07 3.5912e-04
> 2946     Unknown  2.3151 11.3153  32.689 2.8751e-07 3.5912e-04
> 624      Unknown -3.6256  6.8986 -31.777 3.3333e-07 3.5912e-04
>
>           B
> 1665 9.8342
> 5422 8.1650
> 4042 8.0758
> 3646 7.9408
> 2946 7.7822
> 624  7.6622
>
> I want to "collapse" this data frame into a new data.frame so that the
> df$GeneName contains no duplicate GeneNames (for eg: sll1514) AND the
> df$logFC contains the average of df$logFC corresponding to these GeneNames
> (which had duplicate genenames).

library(plyr)
# one row per GeneName, with logFC averaged over that gene's rows
ddply(df, "GeneName", summarise, logFC = mean(logFC))

Hadley

-- 
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
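For readers without plyr installed, the same collapse can probably be done in base R with aggregate(). This is a minimal sketch, not part of the original reply: it swaps ddply() for stats::aggregate() and assumes a small stand-in data frame rebuilt from the six rows quoted above, keeping only the GeneName and logFC columns.

```r
# Hypothetical stand-in for the poster's df: only the six rows shown in the
# question, and only the two columns this example needs.
df <- data.frame(
  GeneName = c("pSysX_50", "pSysX_49", "pSysX_51", "sll1514", "sll1514", "pSysX_49"),
  logFC    = c(4.3887, -3.5251, 2.5302, 2.3457, 2.3151, -3.6256),
  stringsAsFactors = FALSE
)

# Base-R alternative to the ddply() call: group rows by GeneName and
# replace logFC with its mean within each group.
collapsed <- aggregate(logFC ~ GeneName, data = df, FUN = mean)
print(collapsed)
```

The result has one row per distinct GeneName, sorted by GeneName, and as with the summarise call, every column other than GeneName and logFC is dropped.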