After fitting and pruning an rpart model, it is often the case that one or more of the original predictors is not used by any of the splits of the final tree. It seems logical, therefore, that values for these "unused" predictors would not be needed for prediction. But when predict() is called on such models, all predictors seem to be required. Why is that, and can it be easily circumvented?
Consider this example:

> model <- rpart(Mileage ~ Weight + Disp. + HP, car.test.frame)
> model
n= 60

node), split, n, deviance, yval
      * denotes terminal node

1) root 60 1354.58300 24.58333
  2) Disp.>=134 35 154.40000 21.40000
    4) Weight>=3087.5 22 61.31818 20.40909 *
    5) Weight< 3087.5 13 34.92308 23.07692 *
  3) Disp.< 134 25 348.96000 29.04000
    6) Disp.>=97.5 16 101.75000 27.12500 *
    7) Disp.< 97.5 9 84.22222 32.44444 *

> newdata <- data.frame(Disp.=car.test.frame$Disp., Weight=car.test.frame$Weight)
> predict(model, newdata=newdata)
Error in eval(expr, envir, enclos) : object 'HP' not found

In this model, Disp. and Weight were used in splits, but HP was not. Thus I expected to be able to perform predictions by providing values for just Disp. and Weight, but predict() failed when I tried that, complaining that HP was not also provided.

Thanks for any help you can provide. My apologies if I simply do not understand how this works.

Best regards,
Jason
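A possible workaround, offered as a sketch rather than a definitive answer: as far as I can tell, predict() on an rpart object rebuilds a model frame from the formula stored with the fit, so every variable named in the formula has to exist in newdata even if no split in the final tree consults it. Assuming that is the cause, supplying the unused predictors as NA columns should be enough. The names used, needed, unused, pred_partial and pred_full below are my own, and the NA_real_ fill assumes the unused predictors are numeric, as HP is here.

```r
library(rpart)  # car.test.frame ships with the rpart package

model <- rpart(Mileage ~ Weight + Disp. + HP, data = car.test.frame)

# Variables that appear in a primary split of the final tree;
# "<leaf>" is the label rpart gives to terminal nodes.
used <- setdiff(unique(as.character(model$frame$var)), "<leaf>")

# Every predictor named in the model formula.
needed <- all.vars(delete.response(terms(model)))

# Predictors that the formula names but no primary split uses.
unused <- setdiff(needed, used)

newdata <- data.frame(Disp.  = car.test.frame$Disp.,
                      Weight = car.test.frame$Weight)

# Assumption: the unused predictors are numeric, so NA_real_ keeps the
# column type consistent with the data the model was fitted on.
for (v in unused) newdata[[v]] <- NA_real_

pred_partial <- predict(model, newdata = newdata)
pred_full    <- predict(model, newdata = car.test.frame)

# Compare values only; the two results carry different row names.
all.equal(unname(pred_partial), unname(pred_full))
```

This relies on the default na.action of predict() for rpart models passing missing values through instead of dropping rows. A predictor that is absent from the primary splits can still act as a surrogate when a primary split variable is missing, so the NA fill is only equivalent to the full data when the variables the tree does use are complete.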