Thanks for these various tips. Sarah, this is not homework, but a simplified dataset made specifically for this question.
Laura

2011/11/14 Dennis Murphy <djmu...@gmail.com>

> Groupwise data summarization is a very common task, and it is worth
> learning the various ways to do it in R. Josh showed you one way to
> use aggregate() from the base package and Michael showed you one way
> of using the plyr package to do the same; another way would be
>
> ddply(df, .(Patient, Region), summarise, max = max(Score), min =
> min(Score))
>
> to save on writing an explicit function. Similarly, if you have a
> version of R >= 2.11.0, the aggregate() function now has a nice
> formula interface, so Josh's code could also be written as
>
> aggregate(Score ~ Patient + Region, data = df, FUN = range)
>
> with a subsequent renaming of the variables as shown.
>
> Other packages that could perform this task with ease include the doBy
> package, the data.table package, the remix package, the Hmisc package
> and, if you are comfortable with SQL, the sqldf package. For relative
> novices, the doBy package is a very nice place to start because it
> comes with a well written vignette and the function names correspond
> well with the tasks they perform (e.g., summaryBy(), transformBy()).
> The plyr and data.table packages are more general and more powerful in
> terms of the types of tasks to which each is suited. Unlike
> aggregate() and doBy:::summaryBy(), these packages can process
> multivariable functions. As noted above, if you have an SQL
> background, sqldf operates on R data objects as though they were SQL
> tables, which is advantageous in complex data extraction tasks.
> Package remix is useful if you want to organize results into a tabular
> form that is reminiscent of SAS.
>
> HTH,
> Dennis
>
> On Mon, Nov 14, 2011 at 8:10 AM, B Laura <gm.spam2...@gmail.com> wrote:
> > dear R-team
> >
> > I need to find the min, max values for each patient from dataset and keep
> > the output of it as a dataframe with the following columns
> > - Patient nr
> > - Region (remains same per patient)
> > - Min score
> > - Max score
> >
> >
> >    Patient Region Score Time
> >  1       1      X    19   28
> >  2       1      X    20  126
> >  3       1      X    22  100
> >  4       1      X    25  191
> >  5       2      Y    12    1
> >  6       2      Y    12    2
> >  7       2      Y    25    4
> >  8       2      Y    26    7
> >  9       3      X     6    1
> > 10       3      X     6    4
> > 11       3      X    21   31
> > 12       3      X    22   68
> > 13       3      X    23   31
> > 14       3      X    24   38
> > 15       3      X    21   15
> > 16       3      X    22   24
> > 17       3      X    23   15
> > 18       3      X    24  243
> > 19       3      X    25   77
> > 20       4      Y     6    5
> > 21       4      Y    22   28
> > 22       4      Y    23   75
> > 23       4      Y    24   19
> > 24       5      Y    23    3
> > 25       5      Y    24    1
> > 26       5      Y    23   33
> > 27       5      Y    24   13
> > 28       5      Y    25   42
> > 29       5      Y    26   21
> > 30       5      Y    27    4
> > 31       6      Y    24    4
> > 32       6      Y    32    8
> >
> > So far I could find the min and max values for each patient, but the output
> > of it is not (yet) what I need.
> >
> >> Patient.nr = unique(Patient)
> >> aggregate(Score, list(Patient), max)
> >   Group.1  x
> > 1       1 25
> > 2       2 26
> > 3       3 25
> > 4       4 24
> > 5       5 27
> > 6       6 32
> >
> >> aggregate(Score, list(Patient), min)
> >   Group.1  x
> > 1       1 19
> > 2       2 12
> > 3       3  6
> > 4       4  6
> > 5       5 23
> > 6       6 24
> >
> > I would like to do same but writing this new information (min, max values)
> > in a dataframe with following columns
> > - Patient nr
> > - Region (remains same per patient)
> > - Min score
> > - Max score
> >
> > Can anybody help me with this?
> >
> > Thanks
> > Laura
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
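
For anyone finding this thread later, the formula-interface suggestion quoted above can be sketched in base R roughly as follows. This is only a minimal sketch under a few assumptions: the data frame is called df as in Dennis's examples, the output column names Min and Max are my own choice, and the Time column is left out because the summary does not need it. With FUN = range, aggregate() returns Score as a two-column matrix, so it is split into separate columns afterwards.

```r
# Patient, Region and Score copied from the dataset in the original question;
# Time is omitted because it is not used in the summary.
df <- data.frame(
  Patient = rep(1:6, times = c(4, 4, 11, 4, 7, 2)),
  Region  = rep(c("X", "Y", "X", "Y", "Y", "Y"), times = c(4, 4, 11, 4, 7, 2)),
  Score   = c(19, 20, 22, 25,
              12, 12, 25, 26,
              6, 6, 21, 22, 23, 24, 21, 22, 23, 24, 25,
              6, 22, 23, 24,
              23, 24, 23, 24, 25, 26, 27,
              24, 32)
)

# One row per Patient/Region pair; Score comes back as a matrix (min, max).
agg <- aggregate(Score ~ Patient + Region, data = df, FUN = range)

# Split the matrix into two ordinary columns (Min and Max are assumed names).
res <- data.frame(
  Patient = agg$Patient,
  Region  = agg$Region,
  Min     = agg$Score[, 1],
  Max     = agg$Score[, 2]
)

# aggregate() orders by Region first, so re-sort by patient number.
res <- res[order(res$Patient), ]
rownames(res) <- NULL
print(res)
```

The Min and Max columns should agree with the two separate aggregate() calls in the original question, now with Region kept alongside.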