I found that loglin() gives what might be much easier to interpret
output as far as coefficients estimates are concerned because they
are laid out in a nice table and are provided for all the values of
all the factors. But I need to be
able to explain what the coefficients really mean. For that, I need to
understand how they are used in a formula to compute a fitted value.
If loglin() has fitted a model (see example below) what would be a
formula that it would use to computer predicted count for,
say, the cell with S = H, E=H, P=No in a sample that has a total of
4991 observations? In other words, how did it arrive at the number
270.01843 in the upper left hand corner of $fit?
I see that loglin() computes exactly the same predictions (fitted
values) as
glm( counts ~ S + E +P + S:E + S:P + E:P, data=wisconsin,
family=poisson) see below) but it gives different values of the
estimates for parameters. So I figure the formula it uses to compute
the fitted values is not the same as what is used in Poisson
regression.
If there is a better way to fit this type of model and provide easy
to understand and interpret / present coefficient summary, please
let me know.
Just in case, I provided the original data at the very bottom.
YZK
#################### use loglin() ###################################
loglin.3 = loglin(wisconsin.table,
margin = list( c(1,2), c(1,3), c(2,3) ), fit=T, param=T)
loglin.3
loglin.3
$lrt
[1] 1.575469
$pearson
[1] 1.572796
$df
[1]
3
$margin
$margin[[1]]
[1] "S" "E"
$margin[[2]]
[1] "S" "P"
$margin[[3]]
[1] "E" "P"
$fit
, , P = No
E
S H L
H 270.01843 148.98226
L 228.85782 753.14127
LM 331.04036 625.95942
UM 373.08339 420.91704
, , P = Yes
E
S H L
H 795.97572 30.02330
L 137.14648 30.85410
LM 301.96657 39.03387
UM 467.91123 36.08873
$param
$param$`(Intercept)`
[1] 5.275394
$param$S
H
L LM UM
-0.1044289 -0.1734756 0.1286741 0.1492304
#I think this says that we had a lot of S = LM and S= UM kids in our
sample and relatively few S= L kids
$param$E
H L
0.501462 -0.501462
#I think this says that more kids had E=H than E=L
# sum(wisconsin$counts[wisconsin$E=="L"]) [1] 2085
# sum(wisconsin$counts[wisconsin$E=="H"]) [1] 2906
$param$P
No Yes
0.5827855 -0.5827855
$param$S.E
E
S H L
H 0.4666025 -0.4666025 #kids in S=H were
more likely to get E=H than E=L
L -0.4263050 0.4263050 #kids in S=L were more likely to get E=L
than E=H
LM -0.1492516 0.1492516
UM 0.1089541 -0.1089541
$param$S.P
P
S No Yes
H -0.45259177 0.45259177
L 0.34397315 -0.34397315
LM 0.13390947 -0.13390947
UM -0.02529085 0.02529085
$param$E.P
P
E No Yes
H -0.670733 0.670733 #kids with E=H were more likely to have
P=Yes than kids with E=L
L 0.670733 -0.670733
############### use glm () ########################################
summary(glm2)
Call:
glm(formula = counts ~ S + E + P + S:E + S:P + E:P, family = poisson,
data = wisconsin)
Deviance Residuals:
1 2 3 4 5 6
7 8
-0.15119 0.27320 0.04135 -0.05691 -0.04446 0.04719
0.32807 -0.24539
9 10 11 12 13 14
15 16
0.73044 -0.35578 -0.16639 0.05952 0.15116 -0.04217
-0.75147 0.14245
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 5.59850 0.05886 95.116 < 2e-16 ***
SL -0.16542 0.08573 -1.930 0.05366 .
SLM 0.20372 0.07841 2.598 0.00937 **
SUM 0.32331 0.07664 4.219 2.46e-05 ***
EL -0.59471 0.09234 -6.441 1.19e-10 ***
PYes 1.08107 0.06731 16.060 < 2e-16 ***
SL:EL 1.78588 0.11444 15.606 < 2e-16 ***
SLM:EL 1.23178 0.10987 11.211 < 2e-16 ***
SUM:EL 0.71532 0.11136 6.424 1.33e-10 ***
SL:PYes -1.59311 0.11527 -13.820 < 2e-16 ***
SLM:PYes -1.17298 0.09803 -11.965 < 2e-16 ***
SUM:PYes -0.85460 0.09259 -9.230 < 2e-16 ***
EL:PYes -2.68292 0.09867 -27.191 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 3211.0014 on 15 degrees of freedom
Residual deviance: 1.5755 on 3 degrees of freedom
AIC: 141.39
################ Original data ############################
#data from Wisconsin that classifies 4991 high school seniors
according to
socio-economic status S= (low, lower middle, upper middle, and high),
# the degree of parental encouragement they receive E= (low and high)
# and whether or not they have plans to attend college P(no, yes).
#s= social stratum, E=parental encouragement P= college plans
#S= social stratum, E=parental encouragement P= college plans
S=c("L", "L", "LM", "LM", "UM", "UM", "H", "H")
S=c(S,S)
E = rep ( c("L", "H"), 8)
P= c (rep("No", 8), rep("Yes",8))
counts = c(749, 233, 627, 330, 420, 374, 153, 266,
35,133,38,303,37,467,26,800)
wisconsin = data.frame(S, E, P, counts)
wisconsin
S E P counts
1 L L No 749
2 L H No 233
3 LM L No 627
4 LM H No 330
5 UM L No 420
6 UM H No 374
7 H L No 153
8 H H No 266
9 L L Yes 35
10 L H Yes 133
11 LM L Yes 38
12 LM H Yes 303
13 UM L Yes 37
14 UM H Yes 467
15 H L Yes 26
16 H H Yes 800
[[alternative HTML version deleted]]
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.