Re: [R] Package survey: singularities in linear regression models

Thomas Lumley Sun, 05 May 2013 14:06:27 -0700

Ok, that's more helpful.

The problem is with replicate-weight designs, and it's because svyglm()
uses the fitted coefficients from the point estimate as starting values for
fitting the replicates.  And even if that is changed, the computation of
the replicate variance doesn't like all the replicates having NAs.


A workaround is to use withReplicates() and to extract only the non-NA
coefficients

> withReplicates(design, quote(coef(glm(achievement ~ origin * group,
weights=.weights))[-12]))
                                       theta      SE
(Intercept)                        446.55058  8.3059
originFrance                         4.44506 15.1254
originGB                            82.74938 15.1038
originNetherlands                   40.31827 14.9648
originother                         67.83235  9.1836
originSweden                        30.45314 16.3332
groupschooltype2                     0.89859 10.2257
originFrance:groupschooltype2        6.73524 23.2032
originGB:groupschooltype2          -40.73301 18.1516
originNetherlands:groupschooltype2  15.62253 19.8704
originother:groupschooltype2       -34.59610 11.9646

   -thomas





On Sat, May 4, 2013 at 2:59 AM, Sebastian Weirich <
sebastian.weir...@iqb.hu-berlin.de> wrote:

>  Well, I have uploaded the data in the public folder of my dropbox. Due
> to data confidentiality, I haved to change the labels. To load the data:
>
> con <- url( 
> "http://dl.dropboxusercontent.com/u/101865137/datEx.rda";<http://dl.dropboxusercontent.com/u/101865137/datEx.rda>)
> print(load(con))
>
> # The replicate weights were created according to the jackknife (JK2)
> procedure in the same way as implemented in WesVar.
> # According to 100 JK zones, 100 replicate weights result. The replicate
> weights are labelled "totwgtM_1" to "totwgtM_100"
> # The regression I want to specify is achievement on group and origin.
> Both predictors are factors.
>
> library(survey)
> design   <- svrepdesign(data = datEx[, c("origin", "group",
> "achievement")], weights = datEx[ ,"pweight"],
>         type="JKn", scale = 1, rscales = 1, repweights =
> datEx[,grep("^totwgtM_", colnames(datEx))], combined.weights = TRUE, mse =
> TRUE)
>
> # This works
> mod1     <- svyglm(formula = achievement ~ origin + group, design =
> design, return.replicates = FALSE, family = gaussian(link="identity"))
>
> # I get the error message when specifying the interaction
> mod2     <- svyglm(formula = achievement ~ origin * group, design =
> design, return.replicates = FALSE, family = gaussian(link="identity"))
>
> # The output of the conventional glm() function reports singularities for
> one coefficient of the interaction
> mod3     <- glm(formula = achievement ~ origin * group, data = datEx,
> family = gaussian(link = "identity"))
>
> Thanks again,
> Sebastian
>
> --
> Sebastian Weirich, Dipl.-Psych.
>
> Institut zur Qualitätsentwicklung im Bildungswesen
> Humboldt-Universität zu Berlin
> Sitz: Hannoversche Straße 19, 10115 Berlin
> Postadresse: Unter den Linden 6, 10099 Berlin
>
> Tel: +49-(0)30-2093-46512
>
> Am 02.05.2013 22:02, schrieb Thomas Lumley:
>
> On Fri, May 3, 2013 at 2:27 AM, Sebastian Weirich <
> sebastian.weir...@iqb.hu-berlin.de> wrote:
>
>> Hello,
>>
>> I want to specify a linear regression model in which the metric outcome
>> is predicted by two factors and their interaction. glm() computes effects
>> for each factor level and the levels of the interaction. In the case of
>> singularities glm() displays "NA" for the corresponding coefficients.
>> However, svyglm() aborts with an error message. Is there a possibility that
>> svyglm() provides output for coefficients without singularities like glm()?
>>
>>
>  It's not true that svyglm() aborts with an error message whenever there
> are singularities, eg
>
>  > svyglm(enroll~stype+I(stype),design=dclus1)
> 1 - level Cluster Sampling design
> With (15) clusters.
> svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)
>
>  Call:  svyglm(formula = enroll ~ stype + I(stype), design = dclus1)
>
>  Coefficients:
> (Intercept)       stypeH       stypeM    I(stype)H    I(stype)M
>       432.9        697.4        464.9           NA           NA
>
>  Degrees of Freedom: 182 Total (i.e. Null);  12 Residual
> Null Deviance:    24830000
> Residual Deviance: 15120000 AIC: 2599
>
>
>  So, perhaps you could show us what you actually did, and what actually
> happened, as the posting guidelines request.
>
>      -thomas
>
>  --
> Thomas Lumley
> Professor of Biostatistics
> University of Auckland
>
>
>


-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Package survey: singularities in linear regression models

Reply via email to