On Mar 22, 2010, at 2:00 PM, rkevinbur...@charter.net wrote:

> Thanks to Marc Schultz I found the documentation on the "factors" attribute
> under ?terms.object. It states:

<cough>  ;-)

> factors: A matrix of variables by terms showing which variables appear
>          in which terms.  The entries are 0 if the variable does not
>          occur in the term, 1 if it does occur and should be coded by
>          contrasts, and 2 if it occurs and should be coded via dummy
>          variables for all levels (as when an intercept or lower-order
>          term is missing).  If there are no terms other than an
>          intercept and offsets, this is ‘numeric(0)’.

The key is 'dummy variables for *all* levels'. In other words, your example
below of 12 months would be represented by 12 individual binary (0/1)
encodings, rather than, for example using default treatment contrasts, 11
individual binary (0/1) encodings, where the base or reference level is not
included in the resultant model matrix.

I have not spent a lot of time on this internal R/S model design point, but
in rather simple cases as an example, a '2' will appear in the presence of
interaction terms lacking the main effects term for the second factor:

    > attr(terms(y ~ x + z), "factors")
      x z
    y 0 0
    x 1 0
    z 0 1

    > attr(terms(y ~ x + x:z), "factors")
      x x:z
    y 0   0
    x 1   2
    z 0   1

Compare the second example above with the more common:

    > attr(terms(y ~ x * z), "factors")
      x z x:z
    y 0 0   0
    x 1 0   1
    z 0 1   1

which is of course equivalent to:

    > attr(terms(y ~ x + z + x:z), "factors")
      x z x:z
    y 0 0   0
    x 1 0   1
    z 0 1   1

The difference in the encodings will be reflected in the model matrix. See
?model.matrix and play around with the examples there, including adding
interaction terms. For example, model.matrix( ~ a + a:b, dd), etc.

This discussion leads into the complex issue of the internal representation
of R (and S) models. If you really want to dig deeper, then you should get a
copy of "Statistical Models in S" by Chambers and Hastie 1993 (aka "The White
Book") and specifically note the rule described on the bottom of page 38
therein, perhaps pre-reading the entire chapter leading up to that particular
point.

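As a rough sketch of the model.matrix() experiment suggested above: the data frame dd here is an assumption, a toy balanced two-way layout (factor a with 3 levels, factor b with 4) modelled on the one in the ?model.matrix examples, and the object names f_full, f_nested, mm_full and mm_nested are purely illustrative. It compares the "factors" attribute and the resulting column names with and without the main effect for b.

```r
# Assumed toy data: a has 3 levels, b has 4 levels, balanced 2-way layout.
dd <- data.frame(a = gl(3, 4), b = gl(4, 1, 12))

# Both main effects present: every variable in a:b is flagged 1 (contrasts).
f_full <- attr(terms(~ a * b), "factors")

# Main effect for b missing: a is flagged 2 within a:b (dummies for all levels).
f_nested <- attr(terms(~ a + a:b), "factors")

print(f_full)
print(f_nested)

# The flags show up in how the interaction columns are built and named.
mm_full   <- model.matrix(~ a * b, dd)
mm_nested <- model.matrix(~ a + a:b, dd)

print(colnames(mm_full))    # interaction columns use contrasts for both a and b
print(colnames(mm_nested))  # interaction columns include every level of a

# Same number of columns either way; only the parameterization differs.
print(c(full = ncol(mm_full), nested = ncol(mm_nested)))
```

In the nested version the interaction block should contain a column for each level of a crossed with the non-reference levels of b, which is the 'dummy variables for all levels' coding that the 2 signals.
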
HTH,

Marc

> So now this brings up another question. It seems that the attribute is a
> two-dimensional array. When I print it out in 'R'
>
> Fitting the formula prestige ~ income + education I get:
>
>           income education
> prestige       0         0
> income         1         0
> education      0         1
>
> This matrix says to me that 'income' occurs in the term 'income' etc. So it
> seems that this matrix will always be a diagonal matrix with an added row of
> zeros containing the response term. If the formula is such that the response
> is a function of one or more of the dependent variables then of course it
> will be something other than a row of zeros. So far OK?
>
> My problem in understanding comes with using a formula that contains R
> factors. I am using the following (from the TSA package) for an example:
>
> l <- lm(tempdub ~ season(tempdub))
> attr(l$terms, "factors")
>
>                 season(tempdub)
> tempdub                       0
> season(tempdub)               1
>
> The function 'season' produces a factor (in this case with 12 levels, one for
> each month). But the factor attribute still has a '1' and not a '2'
> indicating that the variable should be coded as a dummy variable (factor).
>
> Please help my misunderstanding.
>
> Thank you.
>
> Kevin Burton

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.