With R it is always possible to shoot yourself squarely in the foot, as you seem keen to do, but R does at least often make it difficult.
When you predict, you need to have values for ALL variables used in the model. Just leaving out the coefficients corresponding to absent predictors is equivalent to assuming that those coefficients are zero, and there is no basis whatever for so assuming. (In this constructed example things are different because the missing variable is a nonsense variable and the coefficient should be roughly zero, as it is, but in general that is not going to be the case.)

So you need to supply some value for each of the missing predictors if you are going to use the standard prediction tools. An obvious plug is the mean of that variable in the training data, though more sophisticated alternatives would often be available.

Here is a suggestion for your case.

## fit some linear model to random data
x <- matrix(rnorm(100*3), 100, 3)
y <- sample(1:2, 100, replace = TRUE)
mydata <- data.frame(y, x)

library(splines)  ## missing from your code.
mymodel <- lm(y ~ ns(X1, df = 3) + X2 + X3, data = mydata)
summary(mymodel)

## create new data with 1 missing input
mynewdata <- within(data.frame(matrix(rnorm(100*2), 100, 2)),
                    ## add in an X3
                    X3 <- mean(mydata$X3))
mypred <- predict(mymodel, mynewdata)

________________________________________
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Axel Urbiz [axel.ur...@gmail.com]
Sent: 12 February 2011 11:51
To: R-help@r-project.org
Subject: [R] Predictions with missing inputs

Dear users,

I'll appreciate your help with this (hopefully) simple problem.

I have a model object which was fitted to inputs X1, X2, X3. Now, I'd like to use this object to make predictions on a new data set where only X1 and X2 are available (just use the estimated coefficients for these variables in making predictions and ignoring the coefficient on X3). Here's my attempt but, of course, didn't work.

#fit some linear model to random data
x=matrix(rnorm(100*3),100,3)
y=sample(1:2,100,replace=TRUE)
mydata <- data.frame(y,x)
mymodel <- lm(y ~ ns(X1, df=3) + X2 + X3, data=mydata)
summary(mymodel)

#create new data with 1 missing input
mynewdata <- data.frame(matrix(rnorm(100*2),100,2))
mypred <- predict(mymodel, mynewdata)

Thanks in advance for your help!

Axel.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
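
P.S. On the point that leaving out a coefficient amounts to assuming it is zero: the sketch below is one way to check this numerically. It is an illustration under stated assumptions, not part of the original exchange. The model is simplified to plain linear terms (no spline), X3 is deliberately shifted to have mean 5 and given a real effect so that the difference shows up, and the names fit, newdata, pred_drop, pred_zero and pred_mean are made up for the example.

```r
## Sketch only: simulated data with a predictor X3 that genuinely matters.
set.seed(1)                                    # arbitrary seed, for reproducibility
mydata <- data.frame(matrix(rnorm(100 * 3), 100, 3))   # columns X1, X2, X3
mydata$X3 <- mydata$X3 + 5                     # assumption: give X3 a non-zero mean
mydata$y  <- 1 + 2 * mydata$X1 - mydata$X2 + 3 * mydata$X3 + rnorm(100)

fit <- lm(y ~ X1 + X2 + X3, data = mydata)
b   <- coef(fit)

## new data where only X1 and X2 are available
newdata <- data.frame(matrix(rnorm(10 * 2), 10, 2))

## "ignore the X3 coefficient", computed by hand
pred_drop <- unname(b["(Intercept)"] + b["X1"] * newdata$X1 + b["X2"] * newdata$X2)

## the same thing via predict(), with X3 set to zero
pred_zero <- unname(predict(fit, transform(newdata, X3 = 0)))

## plugging in the training mean of X3 instead
pred_mean <- unname(predict(fit, transform(newdata, X3 = mean(mydata$X3))))

## dropping the coefficient and setting X3 = 0 agree
stopifnot(isTRUE(all.equal(pred_drop, pred_zero)))

## the mean plug shifts every prediction by the constant b["X3"] * mean(X3)
shift <- unname(b["X3"]) * mean(mydata$X3)
stopifnot(isTRUE(all.equal(pred_mean - pred_zero, rep(shift, nrow(newdata)))))

print(shift)   # size of the error introduced by silently treating X3 as 0
```

In a linear model the two approaches differ only by that constant shift, so the damage done by dropping the term depends on how large the coefficient and the typical value of the missing predictor are. With interactions or nonlinear terms involving the missing variable the difference would no longer be a simple constant.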