Hi Hadley, This is related to how the terms are constructed. If you look at terms(m1) versus terms(m2), you see that in the case of m1 the knots are added to the attribute predvars. Contrary, when using splines::ns() this doesn't happen. Compare:
mf <- model.frame(claim ~ ns(wind, df = 5), data = dat) mt <- terms(mf) with mf2 <- model.frame(claim ~ splines::ns(wind, df = 5), data = dat) mt2 <- terms(mf) The culprit is actually in makepredictcall.ns, specifically: if (as.character(call)[1L] != "ns") return(call) Obviously the call starting with predict::ns() will cause the function to return the call unaltered. In the case of ns() it processes the knot information. And that difference causes the difference in predictions. So it is a bug imho. Especially since I don't understand the logic. The variable has class 'ns', so makepredictcall() dispatches correctly. That extra check in the function makepredictcall.ns seems unnecessary to me and might be simply removed. Cheers Joris On Fri, Apr 27, 2018 at 3:25 PM, Hadley Wickham <h.wick...@gmail.com> wrote: > Hi all, > > Very surprising (to me!) and mystifying result from predict.glm(): the > predictions vary depending on whether or not I use ns() or > splines::ns(). Reprex follows: > > library(splines) > > set.seed(12345) > dat <- data.frame(claim = rbinom(1000, 1, 0.5)) > mns <- c(3.4, 3.6) > sds <- c(0.24, 0.35) > dat$wind <- exp(rnorm(nrow(dat), mean = mns[dat$claim + 1], sd = > sds[dat$claim + 1])) > dat <- dat[order(dat$wind), ] > > m1 <- glm(claim ~ ns(wind, df = 5), data = dat, family = binomial) > m2 <- glm(claim ~ splines::ns(wind, df = 5), data = dat, family = binomial) > > # The model coefficients are the same > unname(coef(m1)) > #> [1] 0.5194712 -0.8687737 -0.6803954 4.0838947 2.3908674 4.1564128 > unname(coef(m2)) > #> [1] 0.5194712 -0.8687737 -0.6803954 4.0838947 2.3908674 4.1564128 > > # But the predictions are not! > newdf <- data.frame(wind = seq(min(dat$wind), max(dat$wind), length = 5)) > unname(predict(m1, newdata = newdf)) > #> [1] 0.51947119 0.03208719 2.82548847 3.90883496 4.06743266 > unname(predict(m2, newdata = newdf)) > #> [1] 0.5194712 -0.5666554 -0.1731268 2.8134844 3.9295814 > > Is this a bug? > > (Motivating issue from this ggplot2 bug report: > https://github.com/tidyverse/ggplot2/issues/2426) > > Thanks! > > Hadley > > > > -- > http://hadley.nz > > ______________________________________________ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > -- Joris Meys Statistical consultant Department of Data Analysis and Mathematical Modelling Ghent University Coupure Links 653, B-9000 Gent (Belgium) <https://maps.google.com/?q=Coupure+links+653,%C2%A0B-9000+Gent,%C2%A0Belgium&entry=gmail&source=g> ----------- Biowiskundedagen 2017-2018 http://www.biowiskundedagen.ugent.be/ ------------------------------- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php [[alternative HTML version deleted]] ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel