Folks:

I believe this discussion would be better moved to a statistical discussion forum, like stats.stackexchange.com, as it appears to be all about statistical issues, not R. I do not understand how you can possibly expect to predict behavior in new categories for which you have no prior information, but perhaps I do not understand, or there are appropriate ways to do this in your subject matter area that discussion on a statistical forum would uncover. If you find any, you might then come back to R (see CRAN's task views: http://cran.r-project.org/web/views/, or simply search using a search engine) to see whether and how such methodology is implemented in R.
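To make the point concrete, here is a small sketch with made-up data (the names train_pad, fit_pad and test_new and the random values are purely illustrative). As far as I know, lm() and glm() drop unused factor levels when they build the model frame, so padding the training factor with extra levels, as suggested further down in the thread, does not give the fitted model a coefficient for those levels, and predict() still stops on the new category:

```r
# Made-up training data: only levels a-e are observed, but the factor is
# padded with "g" and "i", as in the level-padding suggestion in the thread.
set.seed(42)
A <- factor(rep(letters[1:5], each = 4))
train_pad <- data.frame(x = factor(A, levels = c(letters[1:5], "g", "i")),
                        y = rnorm(20))
levels(train_pad$x)    # the padded level set, including "g" and "i"

# lm() builds its model frame with unused levels dropped, so the fit only
# records (and only has coefficients for) the levels actually observed.
fit_pad <- lm(y ~ x, data = train_pad)
fit_pad$xlevels$x

# A new observation in the unseen category "g": there is no estimate for it,
# so predict() signals an error instead of returning a number.
test_new <- data.frame(x = factor("g", levels = levels(train_pad$x)))
res <- tryCatch(predict(fit_pad, newdata = test_new), error = function(e) e)
inherits(res, "error")
conditionMessage(res)
```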
Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
Clifford Stoll

On Tue, Jan 13, 2015 at 8:59 AM, HelponR <suncert...@gmail.com> wrote:
> Thanks for your reply. But I cannot control the data.
> I am dealing with real-world stream data. It is very normal that the test
> data (when you apply the model to do prediction) have new values that are
> not seen in the training data.
> If I code it myself, I would give a random guess or just an intercept for
> such a situation. But it seems most R packages return an error and exit.
>
> On Mon, Jan 12, 2015 at 6:08 PM, Richard M. Heiberger <r...@temple.edu>
> wrote:
>
>> You need to define the levels of the training set to include all
>> levels that you might see.
>> Something like this
>>
>> > A <- factor(letters[1:5])
>> > B <- factor(letters[c(1,3,5,7,9)])
>> > A
>> [1] a b c d e
>> Levels: a b c d e
>> > B
>> [1] a c e g i
>> Levels: a c e g i
>> > training <- factor(A, levels=unique(c(levels(A), levels(B))))
>> > training
>> [1] a b c d e
>> Levels: a b c d e g i
>>
>> In the future please "provide commented, minimal, self-contained,
>> reproducible code."
>>
>> On Mon, Jan 12, 2015 at 9:00 PM, HelponR <suncert...@gmail.com> wrote:
>> > It looks like gbm and glm both have this issue.
>> >
>> > I wonder if any R package is immune to this?
>> >
>> > In reality, it is very normal that test data has values unseen in the
>> > training data. It looks like I have to give up R?
>> >
>> > Thanks!
>> >
>> > ______________________________________________
>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
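For what it is worth, the "just an intercept" fallback mentioned in the thread can be coded by hand around predict(). This is a minimal sketch, not something taken from any package: predict_with_fallback() is a hypothetical helper, the data are made up, and using the overall training mean as the fallback (which is what an intercept-only lm would predict) is an assumption. Whether such a fallback is statistically sensible is exactly the question raised at the top of this message.

```r
# Hypothetical helper (not from any package): predict from a fitted lm/glm,
# but return `fallback` for rows whose level of factor `var` was never seen
# in training, instead of stopping with a "new level" error.
predict_with_fallback <- function(fit, newdata, var, fallback, ...) {
  known <- fit$xlevels[[var]]                    # levels used in the fit
  seen  <- as.character(newdata[[var]]) %in% known
  out   <- rep(fallback, nrow(newdata))          # default for unseen levels
  if (any(seen)) {
    nd <- newdata[seen, , drop = FALSE]
    # Re-encode with exactly the training levels so model.frame() accepts it.
    nd[[var]] <- factor(as.character(nd[[var]]), levels = known)
    out[seen] <- predict(fit, newdata = nd, ...)
  }
  out
}

# Made-up example: levels a, b, c in training; "z" only appears in the test set.
set.seed(1)
train_fb <- data.frame(f = factor(rep(letters[1:3], each = 10)),
                       y = rnorm(30))
fit_fb <- lm(y ~ f, data = train_fb)

test_fb <- data.frame(f = c("a", "c", "z"))
p <- predict_with_fallback(fit_fb, test_fb, "f", fallback = mean(train_fb$y))
p
```

For a glm, extra arguments such as type = "response" are passed through to predict(), so the fallback and the model predictions should be given on the same scale.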