Re: [R] question about glm vs. loglin()

David Winsemius Thu, 15 Sep 2011 15:08:14 -0700


On Sep 15, 2011, at 4:33 PM, Yana Kane-Esrig wrote:

Dear R gurus,
I am looking for a way to fit a predictive model for a contingencytable which has counts. I found that glm( family=poisson) is verygood for figuring out which of several alternative models I shouldselect. But once I select a model it is hard to present andinterpret it, especially when it has interactions, becauseeverything is done "relative to reference cell". This makes itconfusing for the audience.


?predict.glm

The default for "type" is not the only option. Please re-read the helppage and all will be revealed.

That is the level of answer appropriate for what appears to be mostprobably homework. With that in mind let me suggest you refer to yourtextbooks index under the term parametrization for several of thequestions below.

I found that loglin() gives what might be much easier to interpretoutput as far as coefficients estimates are concerned because theyare laid out in a nice table and are provided for all the values ofall the factors. But I need to be

able to explain what the coefficients really mean. For that, I need to
understand how they are used in a formula to compute a fitted value.

If loglin() has fitted a model (see example below) what would be aformula that it would use to computer predicted count for,say, the cell with S = H, E=H, P=No in a sample that has a total of4991 observations? In other words, how did it arrive at the number270.01843 in the upper left hand corner of $fit?

I see that loglin() computes exactly the same predictions (fittedvalues) asglm( counts ~ S + E +P + S:E + S:P + E:P, data=wisconsin,family=poisson) see below) but it gives different values of theestimates for parameters. So I figure the formula it uses to compute

the fitted values is not the same as what is used in Poisson
regression.

If there is a better way to fit this type of model and provide easyto understand and interpret / present coefficient summary, pleaselet me know.


Just in case, I provided the original data at the very bottom.



YZK



#################### use loglin() ###################################


loglin.3 = loglin(wisconsin.table,
margin = list( c(1,2), c(1,3), c(2,3) ), fit=T, param=T)
loglin.3

loglin.3

$lrt
[1] 1.575469

$pearson
[1] 1.572796

$df
[1]
3

$margin
$margin[[1]]
[1] "S" "E"

$margin[[2]]
[1] "S" "P"

$margin[[3]]
[1] "E" "P"


$fit
, , P = No

    E
S            H         L
  H  270.01843 148.98226
  L  228.85782 753.14127
  LM 331.04036 625.95942
  UM 373.08339 420.91704

, , P = Yes

    E
S            H         L
  H  795.97572  30.02330
  L  137.14648  30.85410
  LM 301.96657  39.03387
  UM 467.91123  36.08873


$param
$param$`(Intercept)`
[1] 5.275394

$param$S
         H
L         LM         UM
-0.1044289 -0.1734756  0.1286741  0.1492304

#I think this says that we had a lot of S = LM and S= UM kids in oursample and relatively few S= L kids


$param$E
        H         L
 0.501462 -0.501462
#I think this says that more kids had E=H than E=L
# sum(wisconsin$counts[wisconsin$E=="L"]) [1] 2085
# sum(wisconsin$counts[wisconsin$E=="H"]) [1] 2906

$param$P
        No        Yes
 0.5827855 -0.5827855

$param$S.E
    E
S             H          L
  H   0.4666025 -0.4666025  #kids in S=H were
more likely to get E=H than E=L

L -0.4263050 0.4263050 #kids in S=L were more likely to get E=Lthan E=H

  LM -0.1492516  0.1492516
  UM  0.1089541 -0.1089541

$param$S.P
    P
S             No         Yes
  H  -0.45259177  0.45259177
  L   0.34397315 -0.34397315
  LM  0.13390947 -0.13390947
  UM -0.02529085  0.02529085

$param$E.P
   P
E          No       Yes

H -0.670733 0.670733 #kids with E=H were more likely to haveP=Yes than kids with E=L

  L  0.670733 -0.670733


############### use glm () ########################################

summary(glm2)

Call:
glm(formula = counts ~ S + E + P + S:E + S:P + E:P, family = poisson,
    data = wisconsin)

Deviance Residuals:

1 2 3 4 5 67 8-0.15119 0.27320 0.04135 -0.05691 -0.04446 0.047190.32807 -0.245399 10 11 12 13 1415 160.73044 -0.35578 -0.16639 0.05952 0.15116 -0.04217-0.75147 0.14245


Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  5.59850    0.05886  95.116  < 2e-16 ***
SL          -0.16542    0.08573  -1.930  0.05366 .
SLM          0.20372    0.07841   2.598  0.00937 **
SUM          0.32331    0.07664   4.219 2.46e-05 ***
EL          -0.59471    0.09234  -6.441 1.19e-10 ***
PYes         1.08107    0.06731  16.060  < 2e-16 ***
SL:EL        1.78588    0.11444  15.606  < 2e-16 ***
SLM:EL       1.23178    0.10987  11.211  < 2e-16 ***
SUM:EL       0.71532    0.11136   6.424 1.33e-10 ***
SL:PYes     -1.59311    0.11527 -13.820  < 2e-16 ***
SLM:PYes    -1.17298    0.09803 -11.965  < 2e-16 ***
SUM:PYes    -0.85460    0.09259  -9.230  < 2e-16 ***
EL:PYes     -2.68292    0.09867 -27.191  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 3211.0014  on 15  degrees of freedom
Residual deviance:    1.5755  on  3  degrees of freedom
AIC: 141.39

################ Original data ############################

#data from Wisconsin that classifies 4991 high school seniorsaccording to

socio-economic status S= (low, lower middle, upper middle, and high),
# the degree of parental encouragement they receive E= (low and high)
# and whether or not they have plans to attend college P(no, yes).

#s= social stratum, E=parental encouragement P= college plans

#S= social stratum, E=parental encouragement P= college plans

S=c("L", "L", "LM", "LM", "UM", "UM", "H", "H")
S=c(S,S)

E = rep ( c("L", "H"), 8)

P=  c (rep("No", 8), rep("Yes",8))

counts = c(749, 233, 627, 330, 420, 374, 153, 266,
35,133,38,303,37,467,26,800)




wisconsin = data.frame(S, E, P, counts)

wisconsin

    S E   P counts
1   L L  No    749
2   L H  No    233
3  LM L  No    627
4  LM H  No    330
5  UM L  No    420
6  UM H  No    374
7   H L  No    153
8   H H  No    266
9   L L Yes     35
10  L H Yes    133
11 LM L Yes     38
12 LM H Yes    303
13 UM L Yes     37
14 UM H Yes    467
15  H L Yes     26
16  H H Yes    800
        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] question about glm vs. loglin()

Reply via email to