Re: [R] question about glm vs. loglin()

Michael Friendly Fri, 16 Sep 2011 06:21:27 -0700

Hi Yana

Trying to interpret associations in complex loglinear models from tablesof parameter estimates is like trying to extract sunlight from acucumber. You have to squeeze very hard, and then are usually unhappywith the quality of the sunlight.

Instead, you can visualize the associations (fitted counts) or theresiduals from the model (pattern of lack of fit) with mosaic andrelated displays from the vcd package.


See the vcdExtra package for a tutorial vignette, but as a first step, try

library(vcdExtra)
wiscmod0 <- glm( counts ~ S + E +P , data=wisconsin, family=poisson)
mosaic(wiscmod0, ~ S+E+P)

wiscmod1 <- glm( counts ~ S + E +P + S:E + S:P + E:P, data=wisconsin,family=poisson)

mosaic(wiscmod1, ~ S+E+P)

?mosaic
for further options.

vignette("vcd-tutorial", package="vcdExtra")

hth
-Michael


On 9/15/2011 4:33 PM, Yana Kane-Esrig wrote:

Dear R gurus,

I am looking for a way to fit a predictive model for a contingency table which has 
counts. I found that glm( family=poisson) is very good for figuring out which of several 
alternative models I should select. But once I select a model it is hard to present and 
interpret it, especially when it has interactions, because everything is done 
"relative to reference cell". This makes it confusing for the audience.


I found that loglin() gives what might be much easier to interpret output as 
far as coefficients estimates are concerned because they are laid out in a nice 
table and are provided for all the values of all the factors. But I need to be
able to explain what the coefficients really mean. For that, I need to
understand how they are used in a formula to compute a fitted value.

If loglin() has fitted a model (see example below) what would be a formula that 
it would use to computer predicted count for,
say, the cell with S = H, E=H, P=No in a sample that has a total of 4991 
observations?Â  In other words, how did it arrive at the number 270.01843 in 
the upper left hand corner of $fit?


I see that loglin() computes exactly the same predictions (fitted values) as
glm( counts ~ S + E +P + S:E + S:P + E:P, data=wisconsin, family=poisson) see 
below)Â  but it gives different values of the estimates for parameters. So I 
figure the formula it uses to compute
the fitted values is not the same as what is used in Poisson
regression.

If there is a better way to fit this type of model and provide easy to 
understand and interpret / present coefficient summary, please let me know.

Just in case, I provided the original data at the very bottom.Â
Â


YZK



#################### use loglin() ###################################


loglin.3 = loglin(wisconsin.table,
margin = list( c(1,2), c(1,3), c(2,3) ), fit=T, param=T)
loglin.3

loglin.3

$lrt
[1] 1.575469

$pearson
[1] 1.572796

$df
[1]
  3

$margin
$margin[[1]]
[1] "S" "E"

$margin[[2]]
[1] "S" "P"

$margin[[3]]
[1] "E" "P"


$fit
, , P = No

Â Â Â  E
SÂ Â Â Â Â Â Â Â Â Â Â  HÂ Â Â Â Â Â Â Â  L
Â  HÂ  270.01843 148.98226
Â  LÂ  228.85782 753.14127
Â  LM 331.04036 625.95942
Â  UM 373.08339 420.91704

, , P = Yes

Â Â Â  E
SÂ Â Â Â Â Â Â Â Â Â Â  HÂ Â Â Â Â Â Â Â  L
Â  HÂ  795.97572Â  30.02330
Â  LÂ  137.14648Â  30.85410
Â  LM 301.96657Â  39.03387
Â  UM 467.91123Â  36.08873


$param
$param$`(Intercept)`
[1] 5.275394

$param$S
Â Â Â Â Â Â Â Â  HÂ Â Â Â Â Â Â Â Â
  LÂ Â Â Â Â Â Â Â  LMÂ Â Â Â Â Â Â Â  UM
-0.1044289 -0.1734756Â  0.1286741Â  0.1492304
#I think this says that we had a lot of S = LM and S= UM kids in our sample and 
relatively few S= L kids

$param$E
Â Â Â Â Â Â Â  HÂ Â Â Â Â Â Â Â  L
Â 0.501462 -0.501462
#I think this says that more kids had E=H than E=L
# sum(wisconsin$counts[wisconsin$E=="L"]) [1] 2085
# sum(wisconsin$counts[wisconsin$E=="H"]) [1] 2906

$param$P
Â Â Â Â Â Â Â  NoÂ Â Â Â Â Â Â  Yes
Â 0.5827855 -0.5827855

$param$S.E
Â Â Â  E
SÂ Â Â Â Â Â Â Â Â Â Â Â  HÂ Â Â Â Â Â Â Â Â  L
Â  HÂ Â  0.4666025 -0.4666025Â  #kids in S=H were
  more likely to get E=H than E=L
Â  LÂ  -0.4263050Â  0.4263050Â  #kids in S=L were more likely to get E=L than 
E=H
Â  LM -0.1492516Â  0.1492516
Â  UMÂ  0.1089541 -0.1089541

$param$S.P
Â Â Â  P
SÂ Â Â Â Â Â Â Â Â Â Â Â  NoÂ Â Â Â Â Â Â Â  Yes
Â  HÂ  -0.45259177Â  0.45259177
Â  LÂ Â  0.34397315 -0.34397315
Â  LMÂ  0.13390947 -0.13390947
Â  UM -0.02529085Â  0.02529085

$param$E.P
Â Â  P
EÂ Â Â Â Â Â Â Â Â  NoÂ Â Â Â Â Â  Yes
Â  H -0.670733Â  0.670733Â  #kids with E=H were more likely to have P=Yes than 
kids with E=L
Â  LÂ  0.670733 -0.670733


############### use glm () ########################################

summary(glm2)

Call:
glm(formula = counts ~ S + E + P + S:E + S:P + E:P, family = poisson,
Â Â Â  data = wisconsin)

Deviance Residuals:
Â Â Â Â Â Â  1Â Â Â Â Â Â Â Â  2Â Â Â Â Â Â Â Â  3Â Â Â Â Â Â Â Â  4Â Â Â Â Â Â 
Â Â  5Â Â Â Â Â Â Â Â  6Â Â Â Â Â Â Â Â  7Â Â Â Â Â Â Â Â  8Â
-0.15119Â Â  0.27320Â Â  0.04135Â  -0.05691Â  -0.04446Â Â  0.04719Â Â  0.32807Â 
 -0.24539Â
Â Â Â Â Â Â  9Â Â Â Â Â Â Â  10Â Â Â Â Â Â Â  11Â Â Â Â Â Â Â  12Â Â Â Â Â Â Â  
13Â Â Â Â Â Â Â  14Â Â Â Â Â Â Â  15Â Â Â Â Â Â Â  16Â
Â 0.73044Â  -0.35578Â  -0.16639Â Â  0.05952Â Â  0.15116Â  -0.04217Â  -0.75147Â 
Â  0.14245Â

Coefficients:
Â Â Â Â Â Â Â Â Â Â Â  Estimate Std. Error z value Pr(>|z|)Â Â Â
(Intercept)Â  5.59850Â Â Â  0.05886Â  95.116Â<  2e-16 ***
SLÂ Â Â Â Â Â Â Â Â  -0.16542Â Â Â  0.08573Â  -1.930Â  0.05366 .Â
SLMÂ Â Â Â Â Â Â Â Â  0.20372Â Â Â  0.07841Â Â  2.598Â  0.00937 **
SUMÂ Â Â Â Â Â Â Â Â  0.32331Â Â Â  0.07664Â Â  4.219 2.46e-05 ***
ELÂ Â Â Â Â Â Â Â Â  -0.59471Â Â Â  0.09234Â  -6.441 1.19e-10 ***
PYesÂ Â Â Â Â Â Â Â  1.08107Â Â Â  0.06731Â  16.060Â<  2e-16 ***
SL:ELÂ Â Â Â Â Â Â  1.78588Â Â Â  0.11444Â  15.606Â<  2e-16 ***
SLM:ELÂ Â Â Â Â Â  1.23178Â Â Â  0.10987Â  11.211Â<  2e-16 ***
SUM:ELÂ Â Â Â Â Â  0.71532Â Â Â  0.11136Â Â  6.424 1.33e-10 ***
SL:PYesÂ Â Â Â  -1.59311Â Â Â  0.11527 -13.820Â<  2e-16 ***
SLM:PYesÂ Â Â  -1.17298Â Â Â  0.09803 -11.965Â<  2e-16 ***
SUM:PYesÂ Â Â  -0.85460Â Â Â  0.09259Â  -9.230Â<  2e-16 ***
EL:PYesÂ Â Â Â  -2.68292Â Â Â  0.09867 -27.191Â<  2e-16 ***
---
Signif. codes:Â  0 â€˜***â€™ 0.001 â€˜**â€™ 0.01 â€˜*â€™ 0.05 â€˜.â€™ 0.1 â€˜ 
â€™ 1

(Dispersion parameter for poisson family taken to be 1)

Â Â Â  Null deviance: 3211.0014Â  on 15Â  degrees of freedom
Residual deviance:Â Â Â  1.5755Â  onÂ  3Â  degrees of freedom
AIC: 141.39

################ Original data ############################


#data from Wisconsin that classifies 4991 high school seniors according to
socio-economic status S= (low, lower middle, upper middle, and high),
# the degree of parental encouragement they receive E= (low and high)
# and whether or not they have plans to attend college P(no, yes).

#s= social stratum, E=parental encouragement P= college plans

#S= social stratum, E=parental encouragement P= college plans

S=c("L", "L", "LM", "LM", "UM", "UM", "H", "H")
S=c(S,S)

E = rep ( c("L", "H"), 8)

P=Â  c (rep("No", 8), rep("Yes",8))

counts = c(749, 233, 627, 330, 420, 374, 153, 266,
35,133,38,303,37,467,26,800)




wisconsin = data.frame(S, E, P, counts)

wisconsin

Â Â Â  S EÂ Â  P counts
1Â Â  L LÂ  NoÂ Â Â  749
2Â Â  L HÂ  NoÂ Â Â  233
3Â  LM LÂ  NoÂ Â Â  627
4Â  LM HÂ  NoÂ Â Â  330
5Â  UM LÂ  NoÂ Â Â  420
6Â  UM HÂ  NoÂ Â Â  374
7Â Â  H LÂ  NoÂ Â Â  153
8Â Â  H HÂ  NoÂ Â Â  266
9Â Â  L L YesÂ Â Â Â  35
10Â  L H YesÂ Â Â  133
11 LM L YesÂ Â Â Â  38
12 LM H YesÂ Â Â  303
13 UM L YesÂ Â Â Â  37
14 UM H YesÂ Â Â  467
15Â  H L YesÂ Â Â Â  26
16Â  H H YesÂ Â Â  800
        [[alternative HTML version deleted]]



--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] question about glm vs. loglin()

Reply via email to