On Jul 2, 2012, at 4:13 PM, YTP wrote:
I am not sure that method works. It appears to be doing something close, but
returns too many slope coefficients, since I think it is returning
interaction terms of degree smaller and greater than what was passed to it.
No. It _was_ doing what you asked for. I think you don't understand  
the expansion.
Here is a small example of degree 2:

X = data.frame(cbind(c(1,2,3), c(4,5,6)))
X
 X1 X2
1  1  4
2  2  5
3  3  6

y = c(0,1,0)

mymodel2 = glm(y ~ poly(X$X1,2)*poly(X$X2,2),
family=binomial(link="logit"))

# We see slope coefficients returned for interaction terms of degree 1, 3, and 4.
Degree 3 and 4? That is not how I would have described the results (each
poly() basis contains only first- and second-degree terms, as set by the
'degree' argument; the products of those terms in the interactions are what
you are reading as degree 3 and 4), but the fact remains: you cannot estimate
more than three parameters from a dataset of only 3 points. That is basic math.
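
You can count the parameters that formula actually asks for by looking at the
design matrix it generates; a quick check on the same three-row toy data:

# The crossed poly() bases expand to 1 + 2 + 2 + 2*2 = 9 columns,
# but there are only 3 rows, so most of the coefficients cannot be estimated.
mm <- model.matrix(~ poly(X$X1, 2) * poly(X$X2, 2))
dim(mm)        # 3 rows, 9 columns
colnames(mm)
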
It seems you have demonstrated that these weapons are "too sharp" to be
wielded safely in your hands, so maybe you should step back a few paces and
reconsider what you are really trying to accomplish. Is the goal really curve
fitting with N^2 polynomial parameters? And do you _really_ want to be
describing to your audience the interpretation of models created with a logit
link that have high-degree polynomial terms? Or are you instead interested in
flexible regression for purposes of description or exploratory analysis, such
as is provided by loess (one form of local regression) or perhaps GAMs with
smoothing terms?
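
If exploratory smoothing is what you are really after, a minimal sketch of
that route (with made-up data, and assuming the mgcv package is available for
the GAM) would be something like:

# Simulated data purely for illustration
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
d$y <- sin(d$x1) + 0.5 * d$x2 + rnorm(200, sd = 0.2)

lo <- loess(y ~ x1 + x2, data = d)         # local regression
library(mgcv)
gm <- gam(y ~ s(x1) + s(x2), data = d)     # smooth terms instead of polynomials
summary(gm)
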
If you do want to stay with the polynomial fit and look at the surface it
implies, fit it to a dataset large enough to support it and plot only the
predictions, along these lines:

test <- data.frame(y = rnorm(1000), X1 = rnorm(1000), X2 = rnorm(1000))
testmod <- glm(y ~ poly(X1, 2) * poly(X2, 2), data = test)  # refit on data that can support it
xy <- seq(-2, 2, by = 0.1)
persp(x = xy, y = xy,
      z = matrix(predict(testmod, newdata = expand.grid(X1 = xy, X2 = xy)),
                 nrow = length(xy)),
      ticktype = "detailed")

This has the advantage that the polynomial basis is hidden and you only end up looking at the predictions. I also like working with Frank Harrell's 'rms' package, because his perimeter() function restricts plotted estimates to regions with sufficient data, and restricted cubic splines have less tendency to blow up near the boundaries of the data.
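
A rough sketch of that rms approach, from memory (so treat the function and
argument names here as my recollection rather than gospel), using the
simulated 'test' data above:

library(rms)
dd <- datadist(test); options(datadist = "dd")
f <- ols(y ~ rcs(X1, 4) * rcs(X2, 4), data = test)  # restricted cubic splines
bplot(Predict(f, X1, X2),
      perim = perimeter(test$X1, test$X2))  # suppress estimates outside the data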

# Back to the three-row example: summary() shows that most of the terms could not be estimated
summary(mymodel2)
                               Estimate Std. Error z value Pr(>|z|)
(Intercept)                   -7.855e+00  4.588e+04       0        1
poly(X$X1, 2)1                 6.059e-15  7.946e+04       0        1
poly(X$X1, 2)2                -3.848e+01  7.946e+04       0        1
poly(X$X2, 2)1                        NA         NA      NA       NA
poly(X$X2, 2)2                        NA         NA      NA       NA
poly(X$X1, 2)1:poly(X$X2, 2)1         NA         NA      NA       NA
poly(X$X1, 2)2:poly(X$X2, 2)1         NA         NA      NA       NA
poly(X$X1, 2)1:poly(X$X2, 2)2         NA         NA      NA       NA
poly(X$X1, 2)2:poly(X$X2, 2)2         NA         NA      NA       NA

# Data used in the model

mymodel2$model
 y poly(X$X1, 2).1 poly(X$X1, 2).2 poly(X$X2, 2).1 poly(X$X2, 2).2
1 0   -7.071068e-01    4.082483e-01   -7.071068e-01    4.082483e-01
2 1    4.350720e-18   -8.164966e-01    4.350720e-18   -8.164966e-01
3 0    7.071068e-01    4.082483e-01    7.071068e-01    4.082483e-01


That is the orthogonal polynomial basis poly() constructs by default, not the raw powers.
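
A quick way to convince yourself of that:

P <- poly(1:3, 2)
zapsmall(crossprod(P))   # identity matrix: the columns are orthonormal
zapsmall(colSums(P))     # zeros: each column is also orthogonal to the intercept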


whereas you wrote that you instead expected the data to look like

1  4   16
4  10  25
9  18  36
?poly    # see the 'raw' argument

You can get something like that with poly(X1, degree=2, raw=TRUE)

> poly(1:3, degree=2, raw=TRUE)
     1 2
[1,] 1 1
[2,] 2 4
[3,] 3 9
attr(,"degree")
[1] 1 2
attr(,"class")
[1] "poly"   "matrix"
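
And if what you wanted was the whole crossed expansion in raw powers (your
3 x 3 matrix above looks like the X1^2, X1*X2 and X2^2 columns of it),
something like this should give it:

# Raw-power version of the crossed basis; interaction columns are
# elementwise products such as X1^2 * X2^2
model.matrix(~ poly(X$X1, degree = 2, raw = TRUE) *
               poly(X$X2, degree = 2, raw = TRUE))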


--

David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
