Re: [R] Sparseby Problems

Dennis Murphy Wed, 21 Apr 2010 16:54:41 -0700

Hi:

I'm a big fan of the reshape package, but this time I think that the doBy and
plyr packages may better suit your needs. Since you mentioned wanting to get
the min/mean/max of several variables simultaneously, I took out line54 and
added some vectors of Gaussian(0, 1) random numbers for testing:


# mF is your data frame: drop column 5 (line54), add three N(0, 1) test columns
test <- data.frame(mF[, -5], x1 = rnorm(23), x2 = rnorm(23), x3 = rnorm(23))

### doBy approach:
# Create a function for doBy to use on a specific variable:

f <- function(x) {
  # returns a named vector; summaryBy() appends these names to each variable
  c(min = min(x, na.rm = TRUE), mean = mean(x, na.rm = TRUE),
    max = max(x, na.rm = TRUE))
}

library(doBy)
> summaryBy(x1 + x2 + x3 ~ Season, data = test, FUN = f)
  Season    x1.min    x1.mean   x1.max     x2.min     x2.mean    x2.max
1      1 -1.108496 -0.2590727 1.692468 -0.8958644 -0.00485722 0.6525678
2      2 -1.686261  0.4655741 2.097220 -0.9484292  0.37197098 2.6325965
3      3 -1.093520 -0.2049273 0.390061 -0.6886613  0.49534667 2.4263802
       x3.min     x3.mean    x3.max
1 -2.07369239 -0.05164301 1.6199843
2 -0.43556155  0.31221804 1.1939009
3 -0.04847558  0.15200570 0.4355102

The LHS of the formula consists of the variables you want summarized, the RHS
contains the grouping variable(s), the data supplied MUST be a data frame, and
FUN is the function you want applied to each variable. In this case, the
function returns a vector of the min, mean and max of the input variable.
Notice that the names given in the function are appended to the variable name,
separated by a dot. (A nice touch by the package author...)

If you have a number of variables to summarize in this fashion, doBy is well
suited to the task: the syntax is pretty straightforward.
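To make the summaryBy() behaviour described above concrete without needing doBy installed, here is a rough base-R sketch. The helper `summary_by_sketch()` and the `toy` data are hypothetical, made up for illustration; this is not how doBy is implemented, only the same idea: split by the grouping variable, apply FUN to each chosen variable, and let unlist() build the `variable.stat` names.

```r
# Summary function from above: named vector of min, mean and max
f <- function(x) {
  c(min = min(x, na.rm = TRUE), mean = mean(x, na.rm = TRUE),
    max = max(x, na.rm = TRUE))
}

# Hypothetical helper, NOT doBy's implementation: split the chosen variables
# by one grouping column, apply FUN to each variable, flatten the results.
summary_by_sketch <- function(data, vars, group, FUN) {
  pieces <- split(data[vars], data[[group]])
  rows <- lapply(pieces, function(d) {
    # unlist() joins list names and vector names with a dot: "a.min", ...
    unlist(lapply(d, FUN))
  })
  res <- data.frame(names(pieces), do.call(rbind, rows), row.names = NULL)
  names(res)[1] <- group
  res
}

# Made-up toy data for illustration
toy <- data.frame(g = c(1, 1, 2, 2), a = c(1, 3, 5, 9), b = c(2, 2, 4, 8))

f(c(4, NA, 1, 7))   # min 1, mean 4, max 7 (the NA is dropped)
summary_by_sketch(toy, c("a", "b"), "g", f)
```

The result has one row per group and columns g, a.min, a.mean, a.max, b.min, b.mean, b.max, the same wide layout summaryBy() gives.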

### plyr approach:
To accomplish the same task in plyr with ddply(), you've got to be a little
more clever: use numcolwise() in combination with each(). numcolwise() applies
the same function to each numeric variable in the input data frame; each()
applies the list of functions supplied as its arguments to a single input
variable. The call below is a composition of the two functions:

> ddply(test, .(Season), numcolwise(each(min, mean, max)))
  Season         x1          x2          x3
1      1 -1.1084957 -0.89586438 -2.07369239
2      1 -0.2590727 -0.00485722 -0.05164301
3      1  1.6924681  0.65256782  1.61998433
4      2 -1.6862610 -0.94842919 -0.43556155
5      2  0.4655741  0.37197098  0.31221804
6      2  2.0972202  2.63259653  1.19390094
7      3 -1.0935199 -0.68866127 -0.04847558
8      3 -0.2049273  0.49534667  0.15200570
9      3  0.3900610  2.42638021  0.43551022
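The each()/numcolwise() composition can be mimicked in base R to see what happens inside one Season group. `each_sketch()` and `numcolwise_sketch()` are hypothetical stand-ins, not plyr's code; unlike plyr's each(), this version takes named arguments instead of deriving names from the functions themselves.

```r
# Hypothetical base-R stand-ins for plyr::each() and plyr::numcolwise();
# they mimic the idea only, not plyr's actual implementation.
each_sketch <- function(...) {
  fns <- list(...)   # pass named arguments, e.g. min = min
  # apply every function to the same vector, one number per function
  function(x) vapply(fns, function(fn) fn(x), numeric(1))
}

numcolwise_sketch <- function(fun) {
  function(df) {
    num <- df[vapply(df, is.numeric, logical(1))]   # keep numeric columns only
    as.data.frame(lapply(num, fun))                 # one row per statistic
  }
}

stats3 <- numcolwise_sketch(each_sketch(min = min, mean = mean, max = max))

# Made-up toy data: the factor column g is skipped, x and y are summarized
toy <- data.frame(g = factor(c("a", "a")), x = c(1, 5), y = c(2, 4))
stats3(toy)
```

ddply() does this once per Season and stacks the three-row blocks, which is why the output above has nine rows and no column saying which statistic each row holds.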

To distinguish the measures in each row, add a character vector of stat names
and then rearrange the order of the columns to get something a little more
presentable:
> summ <- ddply(test, .(Season), numcolwise(each(min, mean, max)))
> summ$stat <- rep(c('Min', 'Mean', 'Max'), 3)   # add vector of names
> summ <- summ[, c(1, 5, 2:4)]   # column rearrangement
> summ
  Season stat         x1          x2          x3
1      1  Min -1.1084957 -0.89586438 -2.07369239
2      1 Mean -0.2590727 -0.00485722 -0.05164301
3      1  Max  1.6924681  0.65256782  1.61998433
4      2  Min -1.6862610 -0.94842919 -0.43556155
5      2 Mean  0.4655741  0.37197098  0.31221804
6      2  Max  2.0972202  2.63259653  1.19390094
7      3  Min -1.0935199 -0.68866127 -0.04847558
8      3 Mean -0.2049273  0.49534667  0.15200570
9      3  Max  0.3900610  2.42638021  0.43551022
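The labelling step above hard-codes the number of seasons (3) and the column positions (c(1, 5, 2:4)). A slightly more defensive variant, assuming each group still contributes exactly three rows in the order min, mean, max, might look like this; the `summ` data frame here is a made-up stand-in for the ddply() result, not the values shown above.

```r
# Made-up stand-in for the ddply() output: three rows per Season
summ <- data.frame(Season = rep(1:2, each = 3),
                   x1 = c(-1, 0, 1, -2, 0.5, 2))

# Label rows without hard-coding how many seasons there are
# (assumes every group contributes exactly three rows: min, mean, max)
summ$stat <- rep(c("Min", "Mean", "Max"), times = nrow(summ) / 3)

# Reorder columns by name rather than by position
first <- c("Season", "stat")
summ <- summ[, c(first, setdiff(names(summ), first))]
summ
```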

The two approaches give you two different layouts for the summaries (one row
per Season, or one row per Season and statistic); take your pick.
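Since the original question asked for three separate data frames (mean, min and max), base R's aggregate() can also return one per statistic. This is a swapped-in base-R alternative, not part of the doBy or plyr answers above, and the `toy` data are made up for illustration.

```r
# Made-up toy data standing in for 'test'
toy <- data.frame(Season = factor(c(1, 1, 2, 2)),
                  x1 = c(1, 3, 5, 9), x2 = c(2, 2, 4, 8))

vars <- c("x1", "x2")
fns <- list(min = min, mean = mean, max = max)

# One data frame per statistic; na.rm = TRUE is passed through to each FUN
by_stat <- lapply(fns, function(fn)
  aggregate(toy[vars], by = list(Season = toy$Season), FUN = fn, na.rm = TRUE))

by_stat$max
```

Each element of `by_stat` (`min`, `mean`, `max`) is a data frame with a Season column followed by one column per summarized variable.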

HTH,
Dennis


On Wed, Apr 21, 2010 at 10:16 AM, Ben Stewart <bpste...@uvic.ca> wrote:

> I've got a problem with the sparseby command (reshape library), and I have
> reached the peak of my R knowledge (it isn't really that high).
>
> I have a small data frame of 23 rows and 15 columns; here is a subset. The
> first four columns are factors and the rest are numeric (only one, line54,
> is provided).
>
>   bearID YEAR Season SEX      line54
> 5    1900    8      3   0  16.3923519
> 11   2270    5      1   0 233.7414014
> 12   2271    5      1   0 290.8207652
> 13   2271    5      2   0 244.7820844
> 15   2291    5      1   0   0.0000000
> 16   2291    5      2   0  14.5037795
> 17   2291    6      1   0   0.0000000
> 18   2293    5      2   0 144.7440752
> 19   2293    5      3   0   0.0000000
> 20   2293    6      1   0  16.0592270
> 21   2293    6      2   0  30.1383426
> 28   2298    5      1   0   0.9741067
> 29   2298    5      2   0   9.6641018
> 30   2298    6      2   0   8.6533828
> 31   2309    5      2   0  85.9781303
> 32   2325    6      1   0 110.8892153
> 35   2331    6      1   0  26.7335562
> 44   2390    7      2   0   7.1690620
> 45   2390    8      2   0  44.1109897
> 46   2390    8      3   0 503.9074898
> 47   2390    9      2   0   8.4393660
> 54   2416    7      3   0  48.6910907
> 58   2418    8      2   0   5.7951139
>
> Sparseby works fine when I try to calculate mean
>
> >sparseby(mF[1:5], mF$Season, mean)
>
>  mF$Season bearID YEAR Season SEX    line54
> 1         1     NA   NA     NA   0  84.90228
> 2         2     NA   NA     NA   0  54.90713
> 3         3     NA   NA     NA   0 142.24773
>
> But it goes nuts when looking for max or min
>
> > sparseby(mF[5:6], mF$Season, max)
>  mF$Season structure(c(2169.49621795108, 1885.22677689026, 2492.17544685464
> 1         1                                                         2169.496
> 2         2                                                         1885.227
> 3         3                                                         2492.175
>
> Any ideas? All I want is to create three data.frames: mean, min and max.
>
> Thanks,
>
> Ben Stewart
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
