Hello, I have tried reading the documentation and googling for the answer, but after reviewing the search results I end up more confused than before.
My problem is apparently simple. I fit a glm model (2^k experiment), and then I would like to predict the response variable (Throughput) for unseen factor levels. When I try to predict, I get the following error:

> throughput.pred <- predict(throughput.fit,experiments,type="response")
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
  factor 'No_databases' has new level(s) 200, 400, 600, 800, 1000

Of course these are new factor levels; that is exactly what I am trying to achieve, i.e. extrapolate the values of Throughput. Can anyone please advise? Below I include all details.

Thanks in advance,
Best regards,
Giovanni

> # define the extreme (factors and levels)
> experiments <- expand.grid(No_databases = seq(1000,100,by=-200),
+                            Partitioning = c("sharding", "replication"),
+                            No_middlewares = seq(500,100,by=-100),
+                            Queue_size = c(100))
> experiments$No_databases <- as.factor(experiments$No_databases)
> experiments$Partitioning <- as.factor(experiments$Partitioning)
> experiments$No_middlewares <- as.factor(experiments$No_middlewares)
> experiments$Queue_size <- as.factor(experiments$Queue_size)
>
> str(experiments)
'data.frame':   50 obs. of  4 variables:
 $ No_databases  : Factor w/ 5 levels "200","400","600",..: 5 4 3 2 1 5 4 3 2 1 ...
 $ Partitioning  : Factor w/ 2 levels "sharding","replication": 1 1 1 1 1 2 2 2 2 2 ...
 $ No_middlewares: Factor w/ 5 levels "100","200","300",..: 5 5 5 5 5 5 5 5 5 5 ...
 $ Queue_size    : Factor w/ 1 level "100": 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "out.attrs")=List of 2
  ..$ dim     : Named int  5 2 5 1
  .. ..- attr(*, "names")= chr  "No_databases" "Partitioning" "No_middlewares" "Queue_size"
  ..$ dimnames:List of 4
  .. ..$ No_databases  : chr  "No_databases=1000" "No_databases= 800" "No_databases= 600" "No_databases= 400" ...
  .. ..$ Partitioning  : chr  "Partitioning=sharding" "Partitioning=replication"
  .. ..$ No_middlewares: chr  "No_middlewares=500" "No_middlewares=400" "No_middlewares=300" "No_middlewares=200" ...
  .. ..$ Queue_size    : chr "Queue_size=100"
> head(experiments)
  No_databases Partitioning No_middlewares Queue_size
1         1000     sharding            500        100
2          800     sharding            500        100
3          600     sharding            500        100
4          400     sharding            500        100
5          200     sharding            500        100
6         1000  replication            500        100
> # or
> throughput.fit <- glm(log(Throughput)~(No_databases*No_middlewares)+Partitioning+Queue_size,
+                       data=throughput)
> summary(throughput.fit)

Call:
glm(formula = log(Throughput) ~ (No_databases * No_middlewares) +
    Partitioning + Queue_size, data = throughput)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.5966  -0.6612  -0.1944   0.5548   3.2136

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)
(Intercept)                     5.74701    0.09127  62.970  < 2e-16 ***
No_databases4                   0.43309    0.10985   3.943 8.66e-05 ***
No_middlewares2                -1.99374    0.11035 -18.067  < 2e-16 ***
No_middlewares4                -1.23004    0.10969 -11.214  < 2e-16 ***
Partitioningreplication         0.33291    0.06181   5.386 9.15e-08 ***
Queue_size100                   0.15850    0.06181   2.564   0.0105 *
No_databases4:No_middlewares2   2.71525    0.15262  17.791  < 2e-16 ***
No_databases4:No_middlewares4   1.94191    0.15226  12.754  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.8921778)

    Null deviance: 2175.58  on 936  degrees of freedom
Residual deviance:  828.83  on 929  degrees of freedom
AIC: 2562.2

Number of Fisher Scoring iterations: 2

> throughput.pred <- predict(throughput.fit,experiments,type="response")
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
  factor 'No_databases' has new level(s) 200, 400, 600, 800, 1000

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
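
A possible reading of the error, sketched on simulated data because the throughput data frame itself is not shown in the post. A factor term gets one coefficient per level seen during fitting, so predict() has nothing to apply to a level such as 200. If the counts are kept numeric instead, the model estimates a slope and can be evaluated at unseen values. The names train, new, fit.f and fit.n, the level values, and the simulated coefficients are all assumptions made for illustration; they are not taken from the original data.

```r
# Simulated stand-in for the 'throughput' data (assumption: not the real data).
set.seed(1)
train <- expand.grid(No_databases = c(1, 4),
                     No_middlewares = c(1, 2, 4),
                     run = 1:20)
train$Throughput <- exp(5 + 0.1 * train$No_databases +
                        0.2 * train$No_middlewares +
                        rnorm(nrow(train), sd = 0.3))

# Values of No_databases that were never observed during fitting.
new <- expand.grid(No_databases = c(2, 8),
                   No_middlewares = c(1, 2, 4))

# Factor coding, as in the post: each observed level has its own coefficient.
train.f <- transform(train,
                     No_databases = factor(No_databases),
                     No_middlewares = factor(No_middlewares))
new.f <- transform(new,
                   No_databases = factor(No_databases),
                   No_middlewares = factor(No_middlewares))
fit.f <- glm(log(Throughput) ~ No_databases * No_middlewares, data = train.f)

# predict() stops because levels "2" and "8" have no coefficient.
err <- try(predict(fit.f, new.f), silent = TRUE)
print(inherits(err, "try-error"))

# Numeric coding: the predictors enter as slopes, so unseen values are allowed.
fit.n <- glm(log(Throughput) ~ No_databases * No_middlewares, data = train)

# The response is log(Throughput), so back-transform with exp().
pred <- exp(predict(fit.n, new))
print(cbind(new, Throughput = pred))
```

Since the model is fitted to log(Throughput), predict() returns values on the log scale even with type="response", which is why exp() is applied above. Whether a trend estimated from a few small values can be trusted far outside that range is a separate modelling question that the sketch does not address.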