Re: [R] Weighted descriptives by levels of another variable

Andrew Miles Mon, 16 Nov 2009 07:45:10 -0800

Thanks! Using the plyr package and the approach you outlined seems to work well for relatively simple functions (like wtd.mean), but so far I haven't had much success in using it with more complex descriptive functions like describe {Hmisc}. I'll take a look later, though, and see if I can figure out why.

At any rate, ddply() looks like it will simplify writing a function that will allow for weighting data and subdividing it, but still give comprehensive summary statistics (i.e. not just the mean or quantiles, but all in one). I'll post it to the list once I have the time to write it up.
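A rough sketch of the kind of function described above, written in base R rather than with ddply() so that it runs without extra packages. The names wtd_summary() and wtd_describe_by() are hypothetical, and the statistics chosen (n, sum of weights, weighted mean, weighted SD treating the weights as frequency weights, min, weighted median, max) are an assumption about what "comprehensive" should cover. The point that matters is that each variable and its weights are subset with the same row index.

```r
# Hypothetical helper: weighted summary of one numeric vector x with weights w.
# Assumes w are frequency-style weights (SD divisor is sum(w) - 1).
wtd_summary <- function(x, w) {
  ok <- !is.na(x) & !is.na(w)          # drop rows missing either value
  x <- x[ok]
  w <- w[ok]
  sw <- sum(w)
  m <- sum(w * x) / sw                 # weighted mean
  o <- order(x)
  cw <- cumsum(w[o]) / sw              # cumulative weight share, sorted by x
  med <- x[o][which(cw >= 0.5)[1]]     # smallest x reaching half the weight
  c(n = length(x), sum.w = sw, mean = m,
    sd = sqrt(sum(w * (x - m)^2) / (sw - 1)),  # undefined if sum(w) <= 1
    min = min(x), median = med, max = max(x))
}

# Hypothetical wrapper: one matrix (variables x statistics) per level of `by`.
# `vars`, `by` and `w` are column names in `data`.
wtd_describe_by <- function(data, vars, by, w) {
  groups <- split(seq_len(nrow(data)), data[[by]])  # row indices per group
  lapply(groups, function(idx) {
    # the same idx subsets both the variable and its weights
    t(sapply(vars, function(v) wtd_summary(data[[v]][idx], data[[w]][idx])))
  })
}
```

Each list element is named after a group level, so a call such as wtd_describe_by(df, c("x", "y"), "g", "w") returns one small table per level of g.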

I also took a stab at using the svyby function in the survey package, but received the following error message when I ran:


> svyby(cbind(educ, age), female, svynlsy, svymean)
Error in `[.survey.design2`(design, byfactor %in% byfactor[i], ) :
  (subscript) logical subscript too long
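For comparison, the usage shown in the survey package's own examples passes both the analysis variables and the grouping variable to svyby() as one-sided formulas, which are evaluated inside the design object, rather than as bare vectors such as cbind(educ, age) and female. Whether that is the cause of the error above is an assumption, since the svynlsy design was not posted, so this sketch uses the apiclus1 example data that ships with survey. With the poster's data the analogous call would presumably be svyby(~educ + age, ~female, svynlsy, svymean), assuming educ, age and female are columns of the data used to build svynlsy.

```r
library(survey)
data(api)  # loads apiclus1 and the other API example data frames

# one-stage cluster sample of California schools, as in the survey examples
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# weighted means of two variables within each school type (stype);
# both arguments are formulas, so svyby() subsets variables and weights together
res <- svyby(~api99 + api00, ~stype, dclus1, svymean)
print(res)
```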
__________________________________________________________

In addition to using the survey package (and the svyby function), I've found that many of the 'weighted' functions, such as wtd.mean, work well with the plyr package.  For example,

library(plyr)   # ddply()
library(Hmisc)  # wtd.mean(), cut2()
wtdmean <- function(df) wtd.mean(df$obese, df$sampwt)
ddply(mydata, ~cut2(age, c(2, 6, 12, 16)), wtdmean)
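The same grouped weighted mean can also be computed without plyr. This base-R sketch runs on a small made-up data frame that reuses the column names from the example (age, obese, sampwt); cut() with right = FALSE and open-ended outer breaks stands in for cut2(age, c(2, 6, 12, 16)) and should produce the same left-closed age bands.

```r
# toy data for illustration only; column names follow the example above
mydata <- data.frame(
  age    = c(1, 3, 5, 8, 10, 14, 15, 18),
  obese  = c(0, 1, 0, 1, 1, 0, 1, 1),
  sampwt = c(1, 2, 1, 1, 3, 2, 2, 1)
)

# left-closed age bands: [-Inf,2), [2,6), [6,12), [12,16), [16,Inf)
band <- cut(mydata$age, breaks = c(-Inf, 2, 6, 12, 16, Inf), right = FALSE)

# split whole rows by band so each group keeps its own weights,
# then take the weighted mean of obese within each band
res <- sapply(split(mydata, band),
              function(d) weighted.mean(d$obese, d$sampwt))
res
```

Because split() is applied to the whole data frame, each piece carries the matching slice of sampwt, which is the same reason the ddply() version works.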

hth, david freedman


Andrew Miles-2 wrote:


I've noticed that R has a number of very useful functions for
obtaining descriptive statistics on groups of variables, including
summary {stats}, describe {Hmisc}, and describe {psych}, but none that
I have found is able to provide weighted descriptives of subsets of a
data set (e.g. descriptives of age for both males and females, where
accurate results require the use of sampling weights).

Does anybody know of a function that does this?

What I've looked at already:

I have looked at describe.by {psych}, which will give descriptives by
levels of another variable (e.g. mean ages of males and females), but
does not accept sample weights.

I have also looked at describe {Hmisc} which allows for weights, but
has no functionality for subdivision.

I tried using a by() function with describe {Hmisc}:

by(cbind(my, variables, here), division.variable, describe,
weights=weight.variable)

but found that this returns an error message stating that the
variables to be described and the weights variable are not the same
length:

Error in describe.vector(xx, nam[i], exclude.missing = exclude.missing,  :
  length of weights must equal length of x
In addition: Warning message:
In present & !is.na(weights) :
  longer object length is not a multiple of shorter object length

This happens because the by() function passes a subset of the
variables to be described to describe(), but passes the weights
argument through unchanged. describe() therefore receives the weights
variable from whatever data set is attached, in its original (i.e. not
subsetted) form. Here is an example using the ChickWeight dataset that
comes in the "datasets" package.

library(Hmisc)
data(ChickWeight)
attach(ChickWeight)
# this gives descriptive data on the variables "Time" and "Chick" by
# levels of "Diet"
by(cbind(Time, Chick), Diet, describe)
# trying to add weights, however, does not work for the reasons
# described above
wgt <- rnorm(length(Chick), 12, 1)
by(cbind(Time, Chick), Diet, describe, weights = wgt)
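One workaround that follows from this explanation is to hand by() the whole data frame, weights included, so that every subset carries its own slice of the weights. This sketch stays in base R and uses weighted.mean() in place of describe(), since only the subsetting pattern matters here; it summarizes Time and weight (Chick is a factor, so it has no weighted mean), and the wgt column is the same kind of made-up rnorm() weights used above. The same pattern should carry over to describe {Hmisc}, e.g. describe(d[c("Time", "weight")], weights = d$wgt) inside the function, though that variant is an untested assumption.

```r
cw <- as.data.frame(ChickWeight)  # plain data.frame copy; no attach() needed
set.seed(1)                       # make the illustrative weights reproducible
cw$wgt <- rnorm(nrow(cw), mean = 12, sd = 1)  # hypothetical sampling weights

res <- by(cw, cw$Diet, function(d) {
  # d holds only the rows for one Diet level, so d$wgt is subset in step
  sapply(d[c("Time", "weight")], weighted.mean, w = d$wgt)
})
res
```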

Again, my question is: does anybody know of a function that combines
the ability to provide weighted descriptives with the ability to
subdivide by the levels of some other variable?


Andrew Miles
Department of Sociology
Duke University


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
