Greetings, I'm having a problem with something that I think is very simple. I'd like to use the 'sapply' and 'by' functions together in one call so that I can (for example) get regression coefficients from multiple models within each level of a grouping variable. I think that I'm missing something that is probably obvious to experienced users.
Here's a simple (trivial) example of what I'd like to do:

new <- data.frame(Outcome.1=rnorm(10),Outcome.2=rnorm(10),sex=rep(0:1,5),Pred=rnorm(10))
fxa <- function(x,data) { lm(x~Pred,data=data)$coef }
sapply(new[,1:2],fxa,new) # this yields the intercept and Pred coefficient for each outcome, from separate models
fxb <- function(x) { lm(Outcome.1~Pred,data=x)$coef }
by(new,new$sex,fxb) # yields the coefficients for Outcome.1 for each sex

## I'd like to combine 'sapply' and 'by' to get the regression coefficients for
## Outcome.1 and Outcome.2 for each sex, rather than running fxb a second time to
## predict 'Outcome.2', or subsetting the data by sex before I run the function.
## But the following doesn't work:

by(new,new$sex,FUN=function(x)sapply(x[,1:2],fxa,new))

Error in model.frame.default(formula = x ~ Pred, data = data, drop.unused.levels = TRUE) :
  variable lengths differ (found for 'Pred')

## I understand the error message: the length of 'Pred' is 10 while the length of
## each sex group is 5. But I'm not sure how to write the 'by' call correctly so
## that it uses 'sapply' inside it.

Could someone please point me in the right direction?

Thanks very much in advance,
David S Freedman, CDC (Atlanta USA)
[definitely not the well-known statistician, David A Freedman, in Berkeley]
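
A minimal sketch of one way around the error described above, not a definitive answer. It assumes the mismatch arises because the full data frame 'new' (10 rows) is passed to fxa inside 'by', while the outcome columns come from the 5-row subset 'x'. Passing the subset itself as the data argument keeps the lengths consistent. The set.seed() call and the object names 'res' and 'alt' are additions for illustration only and are not in the original post.

```r
set.seed(1)  # assumption: seed added only to make the sketch reproducible
new <- data.frame(Outcome.1 = rnorm(10), Outcome.2 = rnorm(10),
                  sex = rep(0:1, 5), Pred = rnorm(10))

# Same helper as in the post: regress one outcome vector on Pred.
fxa <- function(x, data) {
  lm(x ~ Pred, data = data)$coef
}

# Inside by(), 'x' is the subset of 'new' for one sex, so pass 'x'
# (not the full 'new') as the data argument of fxa.
res <- by(new, new$sex, FUN = function(x) sapply(x[, 1:2], fxa, data = x))
res

# Alternative without sapply: lm() accepts a matrix response, which fits
# both outcomes at once within each sex group.
alt <- by(new, new$sex,
          FUN = function(d) coef(lm(cbind(Outcome.1, Outcome.2) ~ Pred, data = d)))
alt
```

Each element of 'res' (for example res[["0"]]) should be a matrix with one column per outcome and rows for the intercept and the Pred slope. The second form avoids the helper function entirely, at the cost of hard-coding the outcome names in the formula.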