Re: [R] any r package can handle factor levels not in the test set

2015-01-13 Thread William Dunlap
I think it would be nice if predict methods returned NA in appropriate spots instead of aborting when a categorical predictor contains levels not found in the training set. It should not be that hard to implement, as the 'xlevels' component of the model is already being used to put factor levels i
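
A minimal sketch of the NA-returning behaviour described above, done by hand outside of predict(). It assumes a fitted object that stores an 'xlevels' component (lm and glm fits do; other packages may not). The helper name mask_unseen_levels is hypothetical and not part of any package.

```r
# Hypothetical helper (not part of base R): replace factor values that the
# fitted model never saw with NA, so that predict() yields NA for those rows
# instead of stopping with a "factor has new levels" error.
mask_unseen_levels <- function(model, newdata) {
  for (v in names(model$xlevels)) {
    x <- as.character(newdata[[v]])
    x[!(x %in% model$xlevels[[v]])] <- NA          # unseen level -> NA
    newdata[[v]] <- factor(x, levels = model$xlevels[[v]])
  }
  newdata
}

# Toy data (made up for illustration): level "z" occurs only in the test set.
train <- data.frame(y = c(1, 2, 3, 4, 5, 6),
                    g = factor(c("a", "a", "b", "b", "c", "c")))
test  <- data.frame(g = factor(c("a", "c", "z")))

fit  <- lm(y ~ g, data = train)
pred <- predict(fit, newdata = mask_unseen_levels(fit, test))
print(pred)
```

The row with the unseen level comes back as NA, and the caller can then decide what to do with it (drop it, fall back to a default, and so on).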

Re: [R] any r package can handle factor levels not in the test set

2015-01-13 Thread Bert Gunter
Folks: I believe this discussion would be better moved to a statistical discussion forum, like stats.stackexchange.com, as it appears to be all about statistical issues, not R. I do not understand how you can possibly expect to predict behavior in new categories for which you have no prior informa

Re: [R] any r package can handle factor levels not in the test set

2015-01-13 Thread HelponR
Thanks for your reply. But I cannot control the data. I am dealing with real-world stream data. It is very normal that the test data (when you apply the model to do prediction) has new values that were not seen in the training data. If I code myself, I would give a random guess or just an intercept for such

Re: [R] any r package can handle factor levels not in the test set

2015-01-12 Thread Richard M. Heiberger
You need to define the levels of the training set to include all levels that you might see. Something like this

> A <- factor(letters[1:5])
> B <- factor(letters[c(1,3,5,7,9)])
> A
[1] a b c d e
Levels: a b c d e
> B
[1] a c e g i
Levels: a c e g i
> training <- factor(A, levels=unique(c(levels(A)
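
The snippet above is cut off in the archive, so the following is only a sketch of the general idea (give both factors the same, combined set of levels), not a reconstruction of the original message. The names all_levels and test are assumptions introduced here.

```r
# Two factors with partly different levels, as in the message above.
A <- factor(letters[1:5])
B <- factor(letters[c(1, 3, 5, 7, 9)])

# Combined level set: everything seen in either factor.
all_levels <- unique(c(levels(A), levels(B)))

# Re-declare both factors against the combined level set.
training <- factor(A, levels = all_levels)
test     <- factor(B, levels = all_levels)

levels(training)
levels(test)
```

This only makes the level sets agree. Whether a particular modelling function then gives a usable prediction for a level that has no rows in the training data is a separate question and depends on that function.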

[R] any r package can handle factor levels not in the test set

2015-01-12 Thread HelponR
It looks like gbm and glm both have this issue. I wonder if any R package is immune to this? In reality, it is very normal for test data to contain values unseen in the training data. It looks like I have to give up R? Thanks!