At 20:31 on 11/12/2024, Sorkin, John wrote:
I am trying to use the aggregate function to run a function, catsbyday2, that 
produces the mean, minimum, maximum, and number of observations of the values 
in a dataframe, inJan2Test, by levels of the dataframe variable MyDay. The 
output should be in the form of a dataframe.

#my code:
# This function should process a data frame and return a data frame
# containing the mean, minimum, maximum, and number of observations
# in the data frame for each level of MyDay.
catsbyday2 <- function(df){
   # Create a matrix to hold the calculated values.
   xx <- matrix(nrow=1,ncol=4)
   # Give names to the columns.
   colnames(xx) <- c("Mean","min","max","Nobs")
   cat("This is the matrix that will hold the results\n",xx,"\n")

   # For each level of the indexing variable, MyDay, compute the
   # mean, minimum, maximum, and number of observations in the
   # dataframe passed to the function.
   xx[,1] <- mean(df)
   xx[,2] <- min(df)
   xx[,3] <- max(df)
   xx[,4] <- length(df)
   cat("These are the dimensions of the matrix in the function",dim(xx),"\n")
   print(xx)
   return(xx)
}

# Create data frame
inJan2Test <- data.frame(MyDay=rep(c(1,2,3),4),AveragePM2_5=c(10,20,30,
                                                               11,21,31,
                                                               12,22,32,
                                                               15,25,35))
str(inJan2Test)
cat("This is the data frame","\n")
inJan2Test

xx <- 
aggregate(inJan2Test[,"AveragePM2_5"],list(inJan2Test[,"MyDay"]),catsbyday2,simplify=FALSE)
xx
class(xx)
str(xx)
names(xx)

# Create a data frame in the format that I expect aggregate would return
examplar <- 
data.frame(mean=c(12,22,32),min=c(10,20,30),max=c(15,25,35),length=c(4,4,4))
examplar
str(examplar)


While the output is correct (the mean, min, etc. are correctly calculated), the 
format of the output is not what I want.

(1) Although the returned object appears to be a data frame, it does not appear 
to be a "normal" data frame (see the output of str(xx) below).
(2) The column names I define in the function are not part of the data frame 
that is created.
(3) The returned values on each row are separated by commas. I would expect 
them to be separated by spaces.
(4) When I run str() on the output it appears that the output dataframe 
contains a list.
str(xx)
'data.frame':   3 obs. of  2 variables:
  $ Group.1: num  1 2 3
  $ x      :List of 3
   ..$ : num [1, 1:4] 12 10 15 4
   .. ..- attr(*, "dimnames")=List of 2
   .. .. ..$ : NULL
   .. .. ..$ : chr [1:4] "Mean" "min" "max" "Nobs"
   ..$ : num [1, 1:4] 22 20 25 4
   .. ..- attr(*, "dimnames")=List of 2
   .. .. ..$ : NULL
   .. .. ..$ : chr [1:4] "Mean" "min" "max" "Nobs"
   ..$ : num [1, 1:4] 32 30 35 4
   .. ..- attr(*, "dimnames")=List of 2
   .. .. ..$ : NULL
   .. .. ..$ : chr [1:4] "Mean" "min" "max" "Nobs"

I want it to simply be a numeric dataframe:

mean  min  max  length
  12   10   15       4
  22   20   25       4
  32   30   35       4

which should produce the following str() output:

examplar <- 
data.frame(mean=c(12,22,32),min=c(10,20,30),max=c(15,25,35),length=c(4,4,4))
examplar
str(examplar)

'data.frame':   3 obs. of  4 variables:
  $ mean  : num  12 22 32
  $ min   : num  10 20 30
  $ max   : num  15 25 35
  $ length: num  4 4 4

John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical 
Center Geriatrics Research, Education, and Clinical Center;
PI Biostatistics and Informatics Core, University of Maryland School of 
Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;

Division of Gerontology and Palliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382



______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hello,

The code can be made much simpler. The summary statistics function becomes a one-liner that computes and returns a named vector. Because the function returns more than one value per group, aggregate stores the statistics in a matrix: the last column of the result is a matrix column, and if you print the result, agg, you will see the name AveragePM2_5 with the suffixes "mean", "min", "max" and "nobs" appended.
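
As a rough illustration of the difference between the two kinds of column, the sketch below recreates the inJan2Test data from the original post so that it runs on its own. It contrasts the list column produced with simplify = FALSE with the matrix column produced by the default simplify = TRUE. The names agg_list and agg_mat are only illustrative and are not part of the original code.

```r
# Recreate the example data from the original post.
inJan2Test <- data.frame(MyDay = rep(c(1, 2, 3), 4),
                         AveragePM2_5 = c(10, 20, 30, 11, 21, 31,
                                          12, 22, 32, 15, 25, 35))

# One-line summary function returning a named numeric vector.
catsbyday2 <- function(x) {
  c(mean = mean(x), min = min(x), max = max(x), nobs = length(x))
}

# simplify = FALSE keeps one result per group, so column x is a list.
agg_list <- aggregate(inJan2Test$AveragePM2_5,
                      list(MyDay = inJan2Test$MyDay),
                      catsbyday2, simplify = FALSE)
is.list(agg_list$x)

# The default (simplify = TRUE) binds the per-group vectors into one
# matrix, stored as a single column of the data frame.
agg_mat <- aggregate(AveragePM2_5 ~ MyDay, inJan2Test, FUN = catsbyday2)
is.matrix(agg_mat$AveragePM2_5)
dim(agg_mat)   # still only two columns: MyDay and the matrix column
```

In both cases the result has two columns, which is why neither form matches the plain numeric data frame asked for.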

You can solve this by extracting that matrix column from the result and cbind-ing it to the rest of the agg data.frame.




catsbyday2 <- function(x) {
  c(mean = mean(x), min = min(x), max = max(x), nobs = length(x))
}

agg <- aggregate(AveragePM2_5 ~ MyDay, inJan2Test, FUN = catsbyday2)

# The 2nd column is a matrix 3x4
str(agg)
#> 'data.frame':    3 obs. of  2 variables:
#>  $ MyDay       : num  1 2 3
#>  $ AveragePM2_5: num [1:3, 1:4] 12 22 32 10 20 30 15 25 35 4 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:4] "mean" "min" "max" "nobs"

# this solves it, the method cbind.data.frame is
# called since the 1st argument is a df
cbind(agg[-ncol(agg)], agg[[ncol(agg)]])
#>   MyDay mean min max nobs
#> 1     1   12  10  15    4
#> 2     2   22  20  25    4
#> 3     3   32  30  35    4


# a data.frame
agg[-ncol(agg)]
#>   MyDay
#> 1     1
#> 2     2
#> 3     3

# the matrix column
agg[[ncol(agg)]]
#>      mean min max nobs
#> [1,]   12  10  15    4
#> [2,]   22  20  25    4
#> [3,]   32  30  35    4
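
An alternative way to flatten the matrix column, offered only as a sketch and not as part of the code above, is to pass the aggregate result through do.call(data.frame, ...). In that case data.frame() expands the matrix into separate columns and prefixes each with the original column name, so the prefix is stripped afterwards. The data and function are recreated here so the snippet runs on its own, and the name flat is only illustrative.

```r
# Recreate the example data and the one-line summary function.
inJan2Test <- data.frame(MyDay = rep(c(1, 2, 3), 4),
                         AveragePM2_5 = c(10, 20, 30, 11, 21, 31,
                                          12, 22, 32, 15, 25, 35))
catsbyday2 <- function(x) {
  c(mean = mean(x), min = min(x), max = max(x), nobs = length(x))
}

agg <- aggregate(AveragePM2_5 ~ MyDay, inJan2Test, FUN = catsbyday2)

# data.frame() splits the matrix column into four ordinary columns,
# named AveragePM2_5.mean, AveragePM2_5.min, and so on.
flat <- do.call(data.frame, agg)

# Drop the "AveragePM2_5." prefix to get plain statistic names.
names(flat) <- sub("^AveragePM2_5\\.", "", names(flat))

str(flat)
```

Whether this or the cbind approach reads better is a matter of taste; both end with one numeric column per statistic.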



Hope this helps,

Rui Barradas

