Yes, that's exactly it!
Many thanks, it all comes back to me now! It's the darn do.call that I can
never remember somehow. I know I've reinvented this wheel several times --
you'd think I'd learn. Sigh.
Again, my thanks!
Regards,
Mike
On Nov 7, 2007 2:35 PM, jim holtman <[EMAIL PROTECTED]> wrote:
> Is this closer to what you would like?
>
> > x <- textConnection(" hostName user sys idle date time
> + 10142 fred 0.4 0.5 98.0 2007-11-01 02:02:18
> + 16886 barney 0.5 0.2 94.6 2007-10-25 19:12:12
> + 8795 fred 0.0 0.1 99.8 2007-10-30 05:08:22
> + 5261 fred 0.1 0.2 99.7 2007-10-25 07:20:32
> + 12427 barney 0.1 0.2 93.2 2007-10-19 14:34:10
> + 18067 barney 0.1 0.2 99.4 2007-10-27 10:34:08
> + 973 fred 0.0 0.2 99.8 2007-10-19 08:24:22
> + 5426 fred 0.2 0.3 99.5 2007-10-25 12:50:33
> + 7067 fred 0.1 0.2 99.4 2007-10-27 19:32:27
> + 13159 barney 0.1 0.4 84.3 2007-10-20 14:58:11
> + 17481 barney 1.2 2.0 92.6 2007-10-26 15:02:11
> + 21632 barney 0.1 0.1 99.6 2007-11-01 09:24:09
> + 206 fred 19.4 4.8 53.7 2007-10-18 06:50:34
> + 18151 barney 0.1 0.2 94.9 2007-10-27 13:22:09
> + 10662 fred 0.1 0.2 99.6 2007-11-01 19:22:27
> + 10376 fred 0.0 0.2 99.7 2007-11-01 09:50:24
> + 3630 fred 43.7 7.0 33.0 2007-10-23 00:58:27
> + 1118 fred 0.6 0.4 98.9 2007-10-19 13:14:23
> + 5122 fred 0.1 0.2 99.6 2007-10-25 02:42:21
> + 22117 barney 0.0 0.2 99.4 2007-11-02 01:34:04")
> > x.in <- read.table(x, header=TRUE, as.is=TRUE)
> > x.in$hour <- sapply(strsplit(x.in$time, ":"), '[', 1) # pick off the
> hour
> > x.by <- by(x.in, list(x.in$hour, x.in$hostName), function(.host){
> + data.frame(hostName=.host$hostName[1], hour=.host$hour[1],
> + user.mean=mean(.host$user),
> + sys.mean=mean(.host$sys),
> + idle.mean=mean(.host$idle),
> + user.max=max(.host$user),
> + sys.max=max(.host$sys),
> + idle.max=max(.host$idle))
> + })
> > do.call('rbind', x.by)
> hostName hour user.mean sys.mean idle.mean user.max sys.max idle.max
> 1 barney 01 0.00 0.20 99.40 0.0 0.2 99.4
> 2 barney 09 0.10 0.10 99.60 0.1 0.1 99.6
> 3 barney 10 0.10 0.20 99.40 0.1 0.2 99.4
> 4 barney 13 0.10 0.20 94.90 0.1 0.2 94.9
> 5 barney 14 0.10 0.30 88.75 0.1 0.4 93.2
> 6 barney 15 1.20 2.00 92.60 1.2 2.0 92.6
> 7 barney 19 0.50 0.20 94.60 0.5 0.2 94.6
> 8 fred 00 43.70 7.00 33.00 43.7 7.0 33.0
> 9 fred 02 0.25 0.35 98.80 0.4 0.5 99.6
> 10 fred 05 0.00 0.10 99.80 0.0 0.1 99.8
> 11 fred 06 19.40 4.80 53.70 19.4 4.8 53.7
> 12 fred 07 0.10 0.20 99.70 0.1 0.2 99.7
> 13 fred 08 0.00 0.20 99.80 0.0 0.2 99.8
> 14 fred 09 0.00 0.20 99.70 0.0 0.2 99.7
> 15 fred 12 0.20 0.30 99.50 0.2 0.3 99.5
> 16 fred 13 0.60 0.40 98.90 0.6 0.4 98.9
> 17 fred 19 0.10 0.20 99.50 0.1 0.2 99.6
> >
>
>
> On 11/7/07, Mike Nielsen <[EMAIL PROTECTED]> wrote:
> > R-Helpers,
> >
> > I'm sorry to have to ask this -- I've not used R very much in the last
> > 8 or 10 months, and I've gotten rusty.
> >
> > I have the following (ff2 is a subset of a much, much larger dataset):
> >
> > > ff2
> > hostName user sys idle obsTime
> > 10142 fred 0.4 0.5 98.0 2007-11-01 02:02:18
> > 16886 barney 0.5 0.2 94.6 2007-10-25 19:12:12
> > 8795 fred 0.0 0.1 99.8 2007-10-30 05:08:22
> > 5261 fred 0.1 0.2 99.7 2007-10-25 07:20:32
> > 12427 barney 0.1 0.2 93.2 2007-10-19 14:34:10
> > 18067 barney 0.1 0.2 99.4 2007-10-27 10:34:08
> > 973 fred 0.0 0.2 99.8 2007-10-19 08:24:22
> > 5426 fred 0.2 0.3 99.5 2007-10-25 12:50:33
> > 7067 fred 0.1 0.2 99.4 2007-10-27 19:32:27
> > 13159 barney 0.1 0.4 84.3 2007-10-20 14:58:11
> > 17481 barney 1.2 2.0 92.6 2007-10-26 15:02:11
> > 21632 barney 0.1 0.1 99.6 2007-11-01 09:24:09
> > 206 fred 19.4 4.8 53.7 2007-10-18 06:50:34
> > 18151 barney 0.1 0.2 94.9 2007-10-27 13:22:09
> > 10662 fred 0.1 0.2 99.6 2007-11-01 19:22:27
> > 10376 fred 0.0 0.2 99.7 2007-11-01 09:50:24
> > 3630 fred 43.7 7.0 33.0 2007-10-23 00:58:27
> > 1118 fred 0.6 0.4 98.9 2007-10-19 13:14:23
> > 5122 fred 0.1 0.2 99.6 2007-10-25 02:42:21
> > 22117 barney 0.0 0.2 99.4 2007-11-02 01:34:04
> >
> > > doit(ff2)
> > hostName hour user.mean sys.mean idle.mean user.max sys.max idle.max
> > 1 barney 01 0.00 0.20 99.40 0.0 0.2 99.4
> > 2 barney 09 0.10 0.10 99.60 0.1 0.1 99.6
> > 3 barney 10 0.10 0.20 99.40 0.1 0.2 99.4
> > 4 barney 13 0.10 0.20 94.90 0.1 0.2 94.9
> > 5 barney 14 0.10 0.30 88.75 0.1 0.4 93.2
> > 6 barney 15 1.20 2.00 92.60 1.2 2.0 92.6
> > 7 barney 19 0.50 0.20 94.60 0.5 0.2 94.6
> > 8 fred 00 43.70 7.00 33.00 43.7 7.0 33.0
> > 9 fred 02 0.25 0.35 98.80 0.4 0.5 99.6
> > 10 fred 05 0.00 0.10 99.80 0.0 0.1 99.8
> > 11 fred 06 19.40 4.80 53.70 19.4 4.8 53.7
> > 12 fred 07 0.10 0.20 99.70 0.1 0.2 99.7
> > 13 fred 08 0.00 0.20 99.80 0.0 0.2 99.8
> > 14 fred 09 0.00 0.20 99.70 0.0 0.2 99.7
> > 15 fred 12 0.20 0.30 99.50 0.2 0.3 99.5
> > 16 fred 13 0.60 0.40 98.90 0.6 0.4 98.9
> > 17 fred 19 0.10 0.20 99.50 0.1 0.2 99.6
> > > doit
> > function(x){
> > x.mean<-aggregate(x[,c("user","sys","idle")],
> > by=list(hostName=x$hostName,
> >
> > hour=strftime(as.POSIXlt(x$obsTime),"%H")),
> > mean)
> >
> > x.max<-aggregate(x[,c("user","sys","idle")],
> > by=list(hostName=x$hostName,
> >
> > hour=strftime(as.POSIXlt(x$obsTime),"%H")),
> > max)
> >
> > t1<-merge(x.mean,x.max
> ,by=c("hostName","hour"),suffixes=c(".mean",".max"))
> > return(t1)
> > }
> >
> > The point of the "doit" function is to make a new dataframe in which
> > the columns are summary statistics of certain columns in the argument.
> >
> > Is there a function similar to:
> >
> > magic.function(ff2[,c("user","system","idle")],
> > by=list(hostName=ff2$hostName,hour=strftime(as.POSIXlt
> (ff2$obsTime),"%H")),
> > function(x){c(mean.user=mean(x$user),
> > mean.system=mean(x$system),
> > mean.idle=mean(x$idle),
> > max.user=max(x$user),
> > max.system=max(x$system),
> > max.idle=max(x$idle))})
> >
> > ie. an "aggregate" that can cope with a non-scalar function and "do
> > what I mean"? My doit function gets more and more ugly the more
> > summary statistics I add, and I worry about the "merge" with hundreds
> > of thousands of rows.
> >
> > I'm almost sure I've seen a solution to what I know is a simple
> > problem, but I guess my search skills are as bad as my "R": I've
> > rummaged around the r-help archives and came up with nothing to show
> > for it.
> >
> >
> > Pointers would be gratefully received.
> >
> > Many thanks.
> > --
> > Regards,
> >
> > Mike Nielsen
> >
> > ______________________________________________
> > [email protected] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html>
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
>
--
Regards,
Mike Nielsen
[[alternative HTML version deleted]]
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.