On Mar 22, 2010, at 2:00 PM, rkevinbur...@charter.net wrote:

> Thanks to Marc Schultz I found the documentation on the "factors" attribute 
> under ?term.object. It states:

<cough>   ;-)

> factors: A matrix of variables by terms showing which variables appear
>          in which terms.  The entries are 0 if the variable does not
>          occur in the term, 1 if it does occur and should be coded by
>          contrasts, and 2 if it occurs and should be coded via dummy
>          variables for all levels (as when an intercept or lower-order
>          term is missing).  If there are no terms other than an
>          intercept and offsets, this is ‘numeric(0)’.


The key is 'dummy variables for *all* levels'. In other words, your example 
below with 12 months would be represented by 12 individual binary (0/1) 
columns. Under the default treatment contrasts there would instead be 11 
binary (0/1) columns, because the base (reference) level is not included in 
the resulting model matrix.
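
As a rough sketch of that 12-versus-11 distinction: the data frame 'd' and its 
'month' column below are made up for illustration (they are not the TSA data 
from your example), and the default treatment contrasts are assumed.

```r
# Hypothetical 12-level factor standing in for the months
d <- data.frame(month = factor(month.abb, levels = month.abb))

# With an intercept, the factor is coded by contrasts:
# the reference level (Jan) gets no column of its own
mm_contr <- model.matrix(~ month, data = d)

# Without an intercept, every level gets its own 0/1 column
mm_dummy <- model.matrix(~ month - 1, data = d)

# Count the columns that belong to the 'month' term in each matrix
n_contr <- sum(attr(mm_contr, "assign") == 1)
n_dummy <- sum(attr(mm_dummy, "assign") == 1)
c(contrasts = n_contr, dummies = n_dummy)
```

The first count is 11 and the second is 12; colnames() on either matrix shows 
which levels received a column.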

I have not spent a lot of time on this internal R/S model design point, but as 
a rather simple example, a '2' will appear for a variable in an interaction 
term when the main effects term for the other variable in that interaction is 
missing from the formula:

> attr(terms(y ~ x + z), "factors")
  x z
y 0 0
x 1 0
z 0 1

> attr(terms(y ~ x + x:z), "factors")
  x x:z
y 0   0
x 1   2
z 0   1


Compare the second example above with the more common:

> attr(terms(y ~ x * z), "factors")
  x z x:z
y 0 0   0
x 1 0   1
z 0 1   1

which is of course equivalent to:

> attr(terms(y ~ x + z + x:z), "factors")
  x z x:z
y 0 0   0
x 1 0   1
z 0 1   1
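
If you would rather check these comparisons programmatically than by eye, 
something along these lines should work. It is only a sketch; the object names 
(f_star, f_long, f_nomain) are arbitrary, and terms() does not need x, y or z 
to exist as data.

```r
# "factors" matrices for the two equivalent formulas
f_star <- attr(terms(y ~ x * z), "factors")
f_long <- attr(terms(y ~ x + z + x:z), "factors")

# The two codings should agree entry by entry
all(f_star == f_long)

# Formula with the main effect for z left out
f_nomain <- attr(terms(y ~ x + x:z), "factors")

# Row/column positions of any entries coded as 2 (dummies for all levels)
which(f_nomain == 2, arr.ind = TRUE)
```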


The difference in the encodings will be reflected in the model matrix. See 
?model.matrix and play around with the examples there, including adding 
interaction terms. For example, model.matrix(~ a + a:b, dd), where 'dd' is the 
data frame created in those examples.
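
A minimal sketch of that experiment, with 'dd' built the same way as in the 
?model.matrix examples and default treatment contrasts assumed:

```r
# Balanced two-way layout, as constructed in the ?model.matrix examples
dd <- data.frame(a = gl(3, 4), b = gl(4, 1, 12))

# Main effect for 'a' only: inside a:b, 'a' is flagged with a 2
f_ab <- attr(terms(~ a + a:b), "factors")
f_ab
mm_ab <- model.matrix(~ a + a:b, dd)
colnames(mm_ab)   # interaction columns involve every level of 'a'

# Both main effects present: contrasts are used throughout
mm_full <- model.matrix(~ a * b, dd)
colnames(mm_full) # interaction columns omit the reference level of 'a'
```

Comparing the two sets of column names shows where the '2' makes a difference: 
the a:b columns in the first matrix include the first level of 'a', while 
those in the second do not.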

This discussion leads into the complex issue of the internal representation of 
R (and S) models. If you really want to dig deeper, then you should get a copy 
of "Statistical Models in S" by Chambers and Hastie 1993 (aka "The White Book") 
and specifically note the rule described on the bottom of page 38 therein, 
perhaps pre-reading the entire chapter leading up to that particular point.

HTH,

Marc


> So now this brings up another question. It seems that the attribute is a 
> two-dimensional array. When I print it out in 'R' after fitting the formula 
> prestige ~ income + education I get:
> 
>          income education
> prestige       0         0
> income         1         0
> education      0         1
> 
> This matrix says to me that 'income' occurs in the term 'income' etc. So it 
> seems that this matrix will always be a diagonal matrix with an added row of 
> zeros containing the response term. If the formula is such that the response 
> is a function of one or more of the dependent variables then of course it 
> will be something other than a row of zeros. So far OK?
> 
> My problem in understanding comes with using a formula that contains R 
> factors. I am using the following (from the TSA package)  for an example:
> 
> l <- lm(tempdub ~ season(tempdub))
> attr(l$terms, "factors")
> 
>                season(tempdub)
> tempdub                       0
> season(tempdub)               1
> 
> The function 'season' produces a factor (in this case with 12 levels, one for 
> each month). But the factor attribute still has a '1' and not a '2' 
> indicating that the variable should be coded as a dummy variable (factor).
> 
> Please help my misunderstanding.
> 
> Thank you.
> 
> Kevin Burton
