I am trying to use the aggregate function to run a function, catsbyday2, that produces the mean, minimum, maximum, and number of observations of the values in a data frame, inJan2Test, by levels of the data frame variable MyDay. The output should be in the form of a data frame.
#my code:

# This function should process a data frame and return a data frame
# containing the mean, minimum, maximum, and number of observations
# in the data frame for each level of MyDay.
catsbyday2 <- function(df){
  # Create a matrix to hold the calculated values.
  xx <- matrix(nrow=1,ncol=4)
  # Give names to the columns.
  colnames(xx) <- c("Mean","min","max","Nobs")
  cat("This is the matrix that will hold the results\n",xx,"\n")
  # For each level of the indexing variable, MyDay, compute the
  # mean, minimum, maximum, and number of observations in the
  # dataframe passed to the function.
  xx[,1] <- mean(df)
  xx[,2] <- min(df)
  xx[,3] <- max(df)
  xx[,4] <- length(df)
  cat("These are the dimensions of the matrix in the function",dim(xx),"\n")
  print(xx)
  return(xx)
}

# Create data frame
inJan2Test <- data.frame(MyDay=rep(c(1,2,3),4),
                         AveragePM2_5=c(10,20,30, 11,21,31, 12,22,32, 15,25,35))
str(inJan2Test)
cat("This is the data frame","\n")
inJan2Test

xx <- aggregate(inJan2Test[,"AveragePM2_5"],list(inJan2Test[,"MyDay"]),catsbyday2,simplify=FALSE)
xx
class(xx)
str(xx)
names(xx)

# Create a data frame in the format that I expect aggregate would return
examplar <- data.frame(mean=c(12,22,32),min=c(10,20,30),max=c(15,25,35),length=c(4,4,4))
examplar
str(examplar)

While the output is correct (the mean, minimum, etc. are correctly calculated), the format of the output is not what I want.

(1) Although the returned object appears to be a data frame, it does not appear to be a "normal" data frame. (see the output of
(2) The column names I define in the function are not part of the data frame that is created.
(3) The returned values on each row are separated by commas. I would expect them to be separated by spaces.
(4) When I run str() on the output, it appears that the output data frame contains a list.

> str(xx)
'data.frame': 3 obs. of 2 variables:
 $ Group.1: num 1 2 3
 $ x :List of 3
  ..$ : num [1, 1:4] 12 10 15 4
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:4] "Mean" "min" "max" "Nobs"
  ..$ : num [1, 1:4] 22 20 25 4
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:4] "Mean" "min" "max" "Nobs"
  ..$ : num [1, 1:4] 32 30 35 4
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:4] "Mean" "min" "max" "Nobs"

I want it to simply be a numeric data frame:

mean min max length
  12  10  15      4
  22  20  25      4
  32  30  35      4

which should return the following str:

examplar <- data.frame(mean=c(12,22,32),min=c(10,20,30),max=c(15,25,35),length=c(4,4,4))
examplar
str(examplar)
'data.frame': 3 obs. of 4 variables:
 $ mean : num 12 22 32
 $ min : num 10 20 30
 $ max : num 15 25 35
 $ length: num 4 4 4

John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical Center Geriatrics Research, Education, and Clinical Center;
PI Biostatistics and Informatics Core, University of Maryland School of Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;
Division of Gerontology and Palliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382
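
One possible way to get the flat numeric layout described above, offered as a sketch rather than a definitive answer: have the summary function return a named numeric vector instead of a 1x4 matrix, let aggregate() simplify the result (its default), and then spread the resulting matrix-valued column into ordinary columns with data.frame(). The names summarize_day, agg, and flat are hypothetical and chosen only for illustration; the data are the same as in the post.

```r
# Hypothetical replacement for catsbyday2: returns a named numeric vector.
summarize_day <- function(x) {
  c(mean = mean(x), min = min(x), max = max(x), length = length(x))
}

# Same test data as in the post.
inJan2Test <- data.frame(
  MyDay = rep(c(1, 2, 3), 4),
  AveragePM2_5 = c(10, 20, 30, 11, 21, 31, 12, 22, 32, 15, 25, 35)
)

# With the default simplify = TRUE, aggregate() stores the four statistics
# in a single matrix-valued column named AveragePM2_5 (one row per MyDay).
agg <- aggregate(AveragePM2_5 ~ MyDay, data = inJan2Test, FUN = summarize_day)

# Passing the matrix to data.frame() splits it into one numeric column per
# statistic, named after the elements of the vector returned by the function.
flat <- data.frame(MyDay = agg$MyDay, agg$AveragePM2_5)
str(flat)
```

Under these assumptions, flat should be an ordinary data frame with one row per level of MyDay and numeric columns MyDay, mean, min, max, and length. Dropping the MyDay column (flat[, -1]) would give the layout of examplar.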