On 30/12/15 02:33, Frank van Berkum wrote:
Hi all, My problem is the following. Suppose I have a dataset with
observations Y and explanatory variables X1, ..., Xn, and suppose one
of these explanatory variables is geographical area (of which there
are ten, j=1,...,10). For some observations I know the area, but for
others it is unknown and therefore record as NA. I want to estimate a
model of the form Y[i] ~ Poisson( lambda[i] ) with log(lambda[i]) =
constant + \sum_j I[!is.na(area[i])] * I[area[i]==j] * beta[j] In
words: we estimate a constant for all observations and a factor for
each area. If it is unknown what the area is, we only include the
constant. When estimating this model using glm(), the records with
is.na(area[i]) are 'deleted' from the dataset, and this I do not
want. I had hoped that the model as described above could be
estimated using the function I() (interpret as), but so far my
attempts have not succeeded. Any help on how to approach this is
kindly appreciated.
As Deep Thought was heard to remark: "Hmmmmm. Tricky."
After pondering for a while it seems to me that you want to make NA the
reference level of the factor "geographical area".
I.e. convert the NA values to a level of that factor and make it the
*first* level.
E.g.:
set.seed(42)
GA <- factor(sample(LETTERS[1:10],200,TRUE))
GA[sample(1:200,10)] <- NA
tmp <- as.character(GA)
tmp[is.na(tmp)] <- "unknown"
GA <- factor(tmp,levels=c("unknown",LETTERS[1:10]))
y <- rpois(200,20) # Artificial response.
fit <- glm(y ~ GA,family=poisson)
Note that if you set
z <- predict(fit)
(this gives predictions on the scale of the linear predictor)
then the entries of z corresponding to "unknown" are equal to the
intercept coefficient of fit.
I can't quite get my head around whether this is *really* accomplishing
what you want --- but it's a thought.
cheers,
Rolf Turner
--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.