Thanks Josh. I built on your example and ended up with the code below. If you
or anyone else sees any issues, please let me know. It would be great if there
were a slicker way to get these kinds of summary stats in R, but this gets the
job done.
library(doBy)
library(Hmisc)  # provides wtd.var()
# takes data frame z with columns response and weights; returns weighted mean,
# weighted SE, and N
msenw = function(z){
  N = length(na.omit(z)$response)   # rows with no missing values in any column
  i = which(!is.na(z$response))     # rows with an observed response
  return(
    c( W.M = weighted.mean(z$response, z$weights, na.rm=TRUE),
       W.SE = sqrt(wtd.var(z$response, weights = z$weights)) /
              sqrt(sum(z$weights[i])),
       N = N ) )
}
## make up some data (easier)
mydata <- data.frame(response = rnorm(100),
group = rep(1:5, each = 20), weights = runif(100, 0, 1))
xy <- by(mydata, mydata$group, msenw)
data.frame( group = names(c(xy)), do.call(rbind, xy) )
## can be extended to other data using:
xy <- by(data.frame(response = mydata$response, weights = mydata$weights),
mydata$group, msenw)
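
For anyone who would rather not depend on Hmisc, a minimal base-R sketch of the
same per-group summary might look like the following. The name wsummary() is
hypothetical, and the variance formula used here,
sum(w * (x - m)^2) / (sum(w) - 1), is an assumption about the weighting
convention; other conventions will give different standard errors, so treat
this as a starting point rather than a definitive implementation.

```r
## Sketch only: weighted mean, weighted SE, and N per group in base R.
## wsummary() is a hypothetical helper, not part of doBy or Hmisc.
wsummary <- function(z) {
  ok <- !is.na(z$response) & !is.na(z$weights)  # keep complete pairs only
  x <- z$response[ok]
  w <- z$weights[ok]
  m <- sum(w * x) / sum(w)                      # weighted mean
  ## Assumed variance convention; only sensible when sum(w) > 1
  v <- sum(w * (x - m)^2) / (sum(w) - 1)
  c(W.M = m, W.SE = sqrt(v) / sqrt(sum(w)), N = length(x))
}

set.seed(1)  # make the made-up data reproducible
mydata <- data.frame(response = rnorm(100),
                     group = rep(1:5, each = 20),
                     weights = runif(100, 0, 1))

## split() hands each group's rows (response and weights together) to wsummary()
res <- do.call(rbind, lapply(split(mydata, mydata$group), wsummary))
data.frame(group = rownames(res), res)
```

Because split() passes the whole data frame for each group, the weights stay
aligned with the responses, which is the same reason the by() approach works.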
Solomon Messing
www.stanford.edu/~messing
On Jan 16, 2011, at 11:16 PM, Joshua Wiley wrote:
> Dear Solomon,
>
> On Sun, Jan 16, 2011 at 10:27 PM, Solomon Messing
> <[email protected]> wrote:
>> Dear Soren and R users:
>>
>> I am trying to use the summaryBy function with weights. Is this possible?
>> An example that illustrates what I am trying to do follows:
>>
>> library(doBy)
>> ## make up some data
>> response = rnorm(100)
>> group = c(rep(1,20), rep(2,20), rep(3,20), rep(4,20), rep(5,20))
>> weights = runif(100, 0, 1)
>> mydata = data.frame(response,group,weights)
>>
>> ## run summaryBy without weights:
>> summaryBy(response~group, data = mydata, FUN = mean)
>>
>> ## attempt to run summaryBy with weights, throws error
>> summaryBy(x~group, data = mydata, FUN = weighted.mean, w=weights )
>>
>> ## throws the error:
>> # Error in tapply(lh.data[, lh.var[vv]], rh.string.factor, function(x) { :
>> # arguments must have same length
>>
>> My guess is that summaryBy is not giving weighted.mean() each group of
>> weights, but instead is passing all of the weights in the data set each time
>> it calls weighted.mean().
>
> Yes, of course. It has no way of knowing that the weights should also
> be broken down by group; they are not in the formula.
>
>> Do you know if there is some way to get summaryBy to pass weights to
>> weighted.mean() only for each group?
>
> Ideally there would be a way to pass more than one variable to a
> function (e.g., response and weights) or just an entire object
> (mydata) broken down by group. Then you would just make a wrapper
> function to pass the right values to the x and w arguments of
> weighted.mean. Instead here is a somewhat hacked version:
>
> library(doBy)
> ## make up some data (easier)
> mydata <- data.frame(response = rnorm(100),
> group = rep(1:5, each = 20), weights = runif(100, 0, 1))
>
> ## manually compute weighted mean
> tmp <- summaryBy(response*weights ~ group, data = mydata, FUN = sum)
> tmp[,2] <- tmp[,2]/with(mydata, tapply(weights, group, sum))
> tmp ## weighted means
>
> ## here's the 'problem', if you will, even with +, they are passed
> ## one at a time
> summaryBy(response + weights ~ group, data = mydata, FUN = str)
> summaryBy(mydata ~ group, data = mydata, FUN = str)
>
> ## here is an option using by():
> xy <- by(mydata, mydata$group, function(z) weighted.mean(z$response,
> z$weights))
> xy
> ## if you don't like the formatting....
> data.frame(group = names(c(xy)), weighted.mean = c(xy))
>
> HTH,
>
> Josh
>
>>
>> I suspect this functionality would be a tremendous benefit to R users who
>> regularly work with weighted data, such as myself.
>>
>> Thanks,
>>
>> Solomon Messing
>> www.stanford.edu/~messing
>>
>> PS I know this basic example can be done using lapply(split(...)) approach
>> referenced here:
>>
>> http://www.mail-archive.com/[email protected]/msg12349.html
>>
>> but for more complex tasks the lapply approach will mean writing a lot of
>> extra code to run everything and then to get things formatted as nicely as
>> summaryBy() was designed to do.
>>
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [email protected] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/