Re: [R] GLM: What is a good way for dealing with new factor levels in the test set?

thuksu Mon, 04 May 2015 11:21:46 -0700

For anyone who is looking for an answer to this in the future...

I went for "imputation".  It's a way of filling in missing values based
on what you see elsewhere in the data.


Myself, I simply replaced each new level with a random sample of that
categorical variable from the rest of the test set.  Some may argue that
this is erroneous, since I don't know anything about the new levels in the
test set and should throw them away.  However, my results are going to be
aggregated later, and this lets me do some central limit theorem hand waving.
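
For anyone who wants to see the idea spelled out, here is a minimal sketch of
the sampling step.  It is written in plain Python rather than R, purely as an
illustration and not as the exact code I ran.  The function name
impute_unseen_levels and the toy data are hypothetical, and it assumes that
"the rest of the test set" means the test rows whose level was already seen
in training, so that a replacement can never be another unseen level.

```python
import random


def impute_unseen_levels(train_levels, test_values, seed=None):
    """Replace test values whose level was not seen in training.

    Each unseen value is replaced by a random draw (with replacement) from
    the test values that do have a known level, so replacements follow the
    empirical distribution of known levels in the test set.
    """
    rng = random.Random(seed)
    known = set(train_levels)

    # Donor pool: test-set values whose level the model already knows.
    donors = [v for v in test_values if v in known]
    if not donors:
        raise ValueError("no test rows with a known level to sample from")

    # Keep known values as they are; sample a donor for each unseen one.
    return [v if v in known else rng.choice(donors) for v in test_values]


# Toy data (hypothetical): "d" and "e" never appeared in the training set.
train_levels = ["a", "b", "c"]
test_values = ["a", "b", "d", "a", "e", "c"]

imputed = impute_unseen_levels(train_levels, test_values, seed=1)
print(imputed)
```

In R the same thing can presumably be done with sample() on the known rows
of the test set, followed by factor() with the training levels.  Note that
the replacements are random, so fix a seed if you need reproducible
predictions.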



--
View this message in context: 
http://r.789695.n4.nabble.com/GLM-What-is-a-good-way-for-dealing-with-new-factor-levels-in-the-test-set-tp4706621p4706772.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
