On Wed, 11 Dec 2024, Sorkin, John writes:

> I am trying to use the aggregate function to run a function, catsbyday2, that
> produces the mean, minimum, maximum, and number of observations of the values
> in a dataframe, inJan2Test, by levels of the dataframe variable MyDay. The
> output should be in the form of a dataframe.
>
> #my code:
> # This function should process a data frame and return a data frame
> # containing the mean, minimum, maximum, and number of observations
> # in the data frame for each level of MyDay.
> catsbyday2 <- function(df){
>   # Create a matrix to hold the calculated values.
>   xx <- matrix(nrow=1,ncol=4)
>   # Give names to the columns.
>   colnames(xx) <- c("Mean","min","max","Nobs")
>   cat("This is the matrix that will hold the results\n",xx,"\n")
>
>   # For each level of the indexing variable, MyDay, compute the
>   # mean, minimum, maximum, and number of observations in the
>   # dataframe passed to the function.
>   xx[,1] <- mean(df)
>   xx[,2] <- min(df)
>   xx[,3] <- max(df)
>   xx[,4] <- length(df)
>   cat("These are the dimensions of the matrix in the function",dim(xx),"\n")
>   print(xx)
>   return(xx)
> }
>
> # Create data frame
> inJan2Test <- data.frame(MyDay=rep(c(1,2,3),4),AveragePM2_5=c(10,20,30,
>                                                                11,21,31,
>                                                                12,22,32,
>                                                                15,25,35))
> str(inJan2Test)
> cat("This is the data frame","\n")
> inJan2Test
>
> xx <-
> aggregate(inJan2Test[,"AveragePM2_5"],list(inJan2Test[,"MyDay"]),catsbyday2,simplify=FALSE)
> xx
> class(xx)
> str(xx)
> names(xx)
>
> # Create a data frame in the format that I expect aggregate would return
> examplar <-
> data.frame(mean=c(12,22,32),min=c(10,20,30),max=c(15,25,35),length=c(4,4,4))
> examplar
> str(examplar)
>
>
> While the output is correct (the mean, min, etc. are correctly calculated),
> the format of the output is not what I want.
>
> (1) Although the returned object appears to be a data frame, it does not
> appear to be a "normal" data frame. (see the output of
> (2) The column names I define in the function are not part of the data frame
> that is created.
> (3) The returned values on each row are separated by commas. I would expect
> them to be separated by spaces.
> (4) When I run str() on the output it appears that the output dataframe
> contains a list.
>> str(xx)
> 'data.frame': 3 obs. of 2 variables:
>  $ Group.1: num 1 2 3
>  $ x      :List of 3
>   ..$ : num [1, 1:4] 12 10 15 4
>   .. ..- attr(*, "dimnames")=List of 2
>   .. .. ..$ : NULL
>   .. .. ..$ : chr [1:4] "Mean" "min" "max" "Nobs"
>   ..$ : num [1, 1:4] 22 20 25 4
>   .. ..- attr(*, "dimnames")=List of 2
>   .. .. ..$ : NULL
>   .. .. ..$ : chr [1:4] "Mean" "min" "max" "Nobs"
>   ..$ : num [1, 1:4] 32 30 35 4
>   .. ..- attr(*, "dimnames")=List of 2
>   .. .. ..$ : NULL
>   .. .. ..$ : chr [1:4] "Mean" "min" "max" "Nobs"
>
> I want it to simply be a numeric dataframe:
>
>   mean min max length
>     12  10  15      4
>     22  20  25      4
>     32  30  35      4
>
> which should return the following str
>
> examplar <-
> data.frame(mean=c(12,22,32),min=c(10,20,30),max=c(15,25,35),length=c(4,4,4))
> examplar
> str(examplar)
>
> 'data.frame': 3 obs. of 4 variables:
>  $ mean  : num 12 22 32
>  $ min   : num 10 20 30
>  $ max   : num 15 25 35
>  $ length: num 4 4 4

You'll no doubt get answers that use 'aggregate', but for such
calculations I find 'tapply' much easier/clearer:

    res <- tapply(inJan2Test$AveragePM2_5,        ## what to compute on
                  inJan2Test$MyDay,               ## what to group by
                  function(x) c(mean = mean(x),   ## what to do for each group
                                min = min(x),
                                max = max(x),
                                length = length(x)))

The result will be a list of named vectors, one per level of MyDay,
which you can bind together:

    do.call(rbind, res)
    ##   mean min max length
    ## 1   12  10  15      4
    ## 2   22  20  25      4
    ## 3   32  30  35      4

(Though the result is a numeric matrix.  But that is only one
'as.data.frame' away from a data.frame, if it has to be one.)

kind regards
    Enrico

> John David Sorkin M.D., Ph.D.
> Professor of Medicine, University of Maryland School of Medicine;
> Associate Director for Biostatistics and Informatics, Baltimore VA Medical
> Center Geriatrics Research, Education, and Clinical Center;
> PI Biostatistics and Informatics Core, University of Maryland School of
> Medicine Claude D. Pepper Older Americans Independence Center;
> Senior Statistician University of Maryland Center for Vascular Research;
>
> Division of Gerontology and Palliative Care,
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> Cell phone 443-418-5382

-- 
Enrico Schumann
Lucerne, Switzerland
https://enricoschumann.net
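
As a sketch of the 'as.data.frame' step mentioned in the reply above, the following rebuilds the example data from the original post, repeats the 'tapply' call, and converts the bound matrix into a plain numeric data frame. The name 'res_df' and the extra 'MyDay' column that carries the group labels are assumptions added for illustration; they are not part of the original reply.

```r
## Example data, as given in the original post.
inJan2Test <- data.frame(
    MyDay        = rep(c(1, 2, 3), 4),
    AveragePM2_5 = c(10, 20, 30,
                     11, 21, 31,
                     12, 22, 32,
                     15, 25, 35))

## One named vector of summaries per level of MyDay.
res <- tapply(inJan2Test$AveragePM2_5,
              inJan2Test$MyDay,
              function(x) c(mean   = mean(x),
                            min    = min(x),
                            max    = max(x),
                            length = length(x)))

## Bind the per-group vectors into a numeric matrix, then convert
## that matrix to a data frame ('res_df' is a hypothetical name).
res_df <- as.data.frame(do.call(rbind, res))

## Assumed extra step: keep the group labels as an ordinary column
## instead of as row names.
res_df <- cbind(MyDay = as.numeric(rownames(res_df)), res_df)
rownames(res_df) <- NULL

str(res_df)
```

Every column of 'res_df' is then an ordinary numeric vector, so 'str' should report plain 'num' columns rather than a list column.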