Dear Bert,

OK and thank you.
@Fox, John <j...@mcmaster.ca>: I would be grateful for an offline reply.

Best,
Ashim

On Thu, Nov 28, 2019 at 11:43 AM Bert Gunter <bgunter.4...@gmail.com> wrote:

> Statistical questions are generally off topic on this list. Try
> stats.stackexchange.com instead.
>
> But FWIW, I recommend that you work with someone with expertise in time
> series analysis, as your efforts to shake and bake your own methodology
> seem rather unwise to me.
>
> Cheers,
> Bert
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>
> On Wed, Nov 27, 2019 at 9:47 PM Ashim Kapoor <ashimkap...@gmail.com> wrote:
>
>> Dear Peter and John,
>>
>> Many thanks for your prompt replies.
>>
>> Here is what I was trying to do: build a statistical model of a given
>> time series using the Box-Jenkins methodology. The series has 93 data
>> points and appears to have an upward trend, so before analysing the ACF
>> and PACF I need to detrend it. I wanted to find out what order of
>> polynomial to fit without overfitting. For this I want to use
>> orthogonal polynomials (I recall reading online that orthogonal
>> polynomials can help prevent overfitting); this seems to me to be a
>> poor man's cross-validation.
>>
>> So my plan is to keep increasing the degree of the orthogonal
>> polynomial until the coefficient of the highest-degree term becomes
>> insignificant.
>>
>> Note: if I do NOT use orthogonal polynomials, I will overfit the data
>> set, and I don't think that is a good way to detect the true order of
>> the polynomial.
>>
>> Also, now that I have detrended the series and built an ARIMA model of
>> the residuals, I want to forecast. For this I need to use the original
>> polynomials and their coefficients.
>>
>> I hope I was clear and that my methodology is OK.
>>
>> I have another query. If I use cross-validation to determine the order
>> of the polynomial, I don't get a clear answer. See here:
>>
>> library(boot)
>> mydf <- data.frame(cbind(gdp, x))
>> d <- c(
>>   cv.glm(data = mydf, glm(gdp ~ x), K = 10)$delta[1],
>>   cv.glm(data = mydf, glm(gdp ~ poly(x, 2)), K = 10)$delta[1],
>>   cv.glm(data = mydf, glm(gdp ~ poly(x, 3)), K = 10)$delta[1],
>>   cv.glm(data = mydf, glm(gdp ~ poly(x, 4)), K = 10)$delta[1],
>>   cv.glm(data = mydf, glm(gdp ~ poly(x, 5)), K = 10)$delta[1],
>>   cv.glm(data = mydf, glm(gdp ~ poly(x, 6)), K = 10)$delta[1])
>> print(d)
>> ## [1] 2.178574e+13 7.303031e+11 5.994783e+11 4.943586e+11 4.596648e+11
>> ## [6] 4.980159e+11
>>
>> # Here it chooses 5 (though 4 and 5 are similar).
>>
>> d1 <- c(
>>   cv.glm(data = mydf, glm(gdp ~ 1 + x), K = 10)$delta[1],
>>   cv.glm(data = mydf, glm(gdp ~ 1 + x + x^2), K = 10)$delta[1],
>>   cv.glm(data = mydf, glm(gdp ~ 1 + x + x^2 + x^3), K = 10)$delta[1],
>>   cv.glm(data = mydf, glm(gdp ~ 1 + x + x^2 + x^3 + x^4), K = 10)$delta[1],
>>   cv.glm(data = mydf, glm(gdp ~ 1 + x + x^2 + x^3 + x^4 + x^5), K = 10)$delta[1],
>>   cv.glm(data = mydf, glm(gdp ~ 1 + x + x^2 + x^3 + x^4 + x^5 + x^6), K = 10)$delta[1])
>> print(d1)
>> ## [1] 2.149647e+13 2.253999e+13 2.182175e+13 2.177170e+13 2.198675e+13
>> ## [6] 2.145754e+13
>>
>> # Here it chooses 1 or 6.
>>
>> Query: why does it choose 1? Is this just round-off noise, or sampling
>> noise introduced by cross-validation when it builds the K folds? Or is
>> it due to the ill-conditioned model matrix?
>>
>> Best Regards,
>> Ashim.
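A note on the d1 puzzle above: in an R model formula, ^ is a formula
operator (crossing), not arithmetic, so gdp ~ 1 + x + x^2 collapses to
gdp ~ x. All six models in d1 are therefore the same straight line, and
their six CV errors differ only through the random K-fold splits, which
would explain why all six values sit near the degree-1 value of d.
Literal powers need I() or poly(..., raw = TRUE). A minimal sketch, using
simulated data since gdp itself is not shown in the thread:

x   <- 1:93
gdp <- 1e6 + 500 * x + 20 * x^2 + rnorm(93, sd = 1000)  # hypothetical series
m_caret <- lm(gdp ~ 1 + x + x^2)   # ^ is parsed by the formula language
m_lin   <- lm(gdp ~ x)             # plain linear fit
m_quad  <- lm(gdp ~ x + I(x^2))    # a genuine quadratic, via I()
length(coef(m_caret))                      # 2: the x^2 term was silently dropped
all.equal(fitted(m_caret), fitted(m_lin))  # TRUE: the two models are identical
length(coef(m_quad))                       # 3: intercept, x, and x^2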
>> On Wed, Nov 27, 2019 at 10:41 PM Fox, John <j...@mcmaster.ca> wrote:
>>
>> > Dear Ashim,
>> >
>> > Orthogonal polynomials are used because they tend to produce more
>> > accurate numerical computations, not because their coefficients are
>> > interpretable, so I wonder why you're interested in the coefficients.
>> >
>> > The regressors produced are orthogonal to the constant regressor and
>> > are orthogonal to each other (and in fact are orthonormal), as it's
>> > simple to demonstrate:
>> >
>> > ------- snip -------
>> >
>> > > x <- 1:93
>> > > y <- 1 + x + x^2 + x^3 + x^4 + rnorm(93)
>> > > (m <- lm(y ~ poly(x, 4)))
>> >
>> > Call:
>> > lm(formula = y ~ poly(x, 4))
>> >
>> > Coefficients:
>> > (Intercept)  poly(x, 4)1  poly(x, 4)2  poly(x, 4)3  poly(x, 4)4
>> >    15574516    172715069     94769949     27683528      3429259
>> >
>> > > X <- model.matrix(m)
>> > > head(X)
>> >   (Intercept) poly(x, 4)1 poly(x, 4)2 poly(x, 4)3 poly(x, 4)4
>> > 1           1  -0.1776843   0.2245083  -0.2572066  0.27935949
>> > 2           1  -0.1738216   0.2098665  -0.2236579  0.21862917
>> > 3           1  -0.1699589   0.1955464  -0.1919525  0.16390514
>> > 4           1  -0.1660962   0.1815482  -0.1620496  0.11487597
>> > 5           1  -0.1622335   0.1678717  -0.1339080  0.07123722
>> > 6           1  -0.1583708   0.1545171  -0.1074869  0.03269145
>> >
>> > > zapsmall(crossprod(X))  # X'X
>> >             (Intercept) poly(x, 4)1 poly(x, 4)2 poly(x, 4)3 poly(x, 4)4
>> > (Intercept)          93           0           0           0           0
>> > poly(x, 4)1           0           1           0           0           0
>> > poly(x, 4)2           0           0           1           0           0
>> > poly(x, 4)3           0           0           0           1           0
>> > poly(x, 4)4           0           0           0           0           1
>> >
>> > ------- snip -------
>> >
>> > If for some not immediately obvious reason you're interested in the
>> > regression coefficients, why not just use a "raw" polynomial:
>> >
>> > ------- snip -------
>> >
>> > > (m1 <- lm(y ~ poly(x, 4, raw = TRUE)))
>> >
>> > Call:
>> > lm(formula = y ~ poly(x, 4, raw = TRUE))
>> >
>> > Coefficients:
>> >             (Intercept)  poly(x, 4, raw = TRUE)1  poly(x, 4, raw = TRUE)2
>> >                  1.5640                   0.8985                   1.0037
>> > poly(x, 4, raw = TRUE)3  poly(x, 4, raw = TRUE)4
>> >                  1.0000                   1.0000
>> >
>> > ------- snip -------
>> >
>> > These coefficients are simply interpretable, but the model matrix is
>> > more poorly conditioned:
>> >
>> > ------- snip -------
>> >
>> > > X1 <- model.matrix(m1)
>> > > head(X1)
>> >   (Intercept) poly(x, 4, raw = TRUE)1 poly(x, 4, raw = TRUE)2
>> > 1           1                       1                       1
>> > 2           1                       2                       4
>> > 3           1                       3                       9
>> > 4           1                       4                      16
>> > 5           1                       5                      25
>> > 6           1                       6                      36
>> >   poly(x, 4, raw = TRUE)3 poly(x, 4, raw = TRUE)4
>> > 1                       1                       1
>> > 2                       8                      16
>> > 3                      27                      81
>> > 4                      64                     256
>> > 5                     125                     625
>> > 6                     216                    1296
>> >
>> > > round(cor(X1[, -1]), 2)
>> >                         poly(x, 4, raw = TRUE)1 poly(x, 4, raw = TRUE)2
>> > poly(x, 4, raw = TRUE)1                    1.00                    0.97
>> > poly(x, 4, raw = TRUE)2                    0.97                    1.00
>> > poly(x, 4, raw = TRUE)3                    0.92                    0.99
>> > poly(x, 4, raw = TRUE)4                    0.87                    0.96
>> >                         poly(x, 4, raw = TRUE)3 poly(x, 4, raw = TRUE)4
>> > poly(x, 4, raw = TRUE)1                    0.92                    0.87
>> > poly(x, 4, raw = TRUE)2                    0.99                    0.96
>> > poly(x, 4, raw = TRUE)3                    1.00                    0.99
>> > poly(x, 4, raw = TRUE)4                    0.99                    1.00
>> >
>> > ------- snip -------
>> >
>> > The two parametrizations are equivalent, however, in that they
>> > represent the same regression surface and so, e.g., produce the same
>> > fitted values:
>> >
>> > ------- snip -------
>> >
>> > > all.equal(fitted(m), fitted(m1))
>> > [1] TRUE
>> >
>> > ------- snip -------
>> >
>> > Because one is usually not interested in the individual coefficients
>> > of a polynomial, there usually isn't a reason to prefer one
>> > parametrization to the other on the grounds of interpretability, so
>> > why do you need to interpret the regression equation?
>> >
>> > I hope this helps,
>> > John
>> >
>> > -----------------------------
>> > John Fox, Professor Emeritus
>> > McMaster University
>> > Hamilton, Ontario, Canada
>> > Web: http://socserv.mcmaster.ca/jfox
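The equivalence John demonstrates also covers the forecasting step raised
earlier in the thread: predict() on an lm fitted with poly() re-evaluates
the same basis at new x values, so the raw polynomial coefficients are not
needed to extrapolate the trend. A small sketch continuing his simulated
example (the variable names below are illustrative):

x <- 1:93
y <- 1 + x + x^2 + x^3 + x^4 + rnorm(93)
m  <- lm(y ~ poly(x, 4))              # orthogonal basis
m1 <- lm(y ~ poly(x, 4, raw = TRUE))  # raw powers
new <- data.frame(x = 94:100)         # future time points
all.equal(predict(m, new), predict(m1, new))  # TRUE, up to rounding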
>> > > On Nov 27, 2019, at 10:17 AM, Ashim Kapoor <ashimkap...@gmail.com> wrote:
>> > >
>> > > Dear Petr,
>> > >
>> > > Many thanks for the quick response.
>> > >
>> > > I also read this:
>> > > https://en.wikipedia.org/wiki/Discrete_orthogonal_polynomials
>> > >
>> > > And I read in ?poly:
>> > >
>> > >     The orthogonal polynomial is summarized by the coefficients, which
>> > >     can be used to evaluate it via the three-term recursion given in
>> > >     Kennedy & Gentle (1980, pp. 343-4), and used in the 'predict' part
>> > >     of the code.
>> > >
>> > > I don't have access to the book mentioned.
>> > >
>> > > Out of curiosity, what is the name of the discrete orthogonal
>> > > polynomial used by R? With respect to what discrete measure is it
>> > > orthogonal?
>> > >
>> > > Many thanks,
>> > > Ashim
>> > >
>> > > On Wed, Nov 27, 2019 at 6:11 PM PIKAL Petr <petr.pi...@precheza.cz> wrote:
>> > >
>> > >> You could get an answer quickly by searching the net:
>> > >>
>> > >> https://stackoverflow.com/questions/39031172/how-poly-generates-orthogonal-polynomials-how-to-understand-the-coefs-ret/39051154#39051154
>> > >>
>> > >> Cheers
>> > >> Petr
>> > >>
>> > >>> -----Original Message-----
>> > >>> From: R-help <r-help-boun...@r-project.org> On Behalf Of Ashim Kapoor
>> > >>> Sent: Wednesday, November 27, 2019 12:55 PM
>> > >>> To: R Help <r-help@r-project.org>
>> > >>> Subject: [R] Orthogonal polynomials used by R
>> > >>>
>> > >>> Dear All,
>> > >>>
>> > >>> I have created a time trend by doing x <- 1:93 because I have a time
>> > >>> series with 93 data points. Next I did:
>> > >>>
>> > >>> y <- lm(series ~ poly(x, 4))$residuals
>> > >>>
>> > >>> to detrend series.
>> > >>>
>> > >>> I chose 4 as the order of my polynomial using cross-validation and
>> > >>> by checking the absence of trend in the residuals, so I think I
>> > >>> have not overfit this series.
>> > >>>
>> > >>> I wish to document the formula of poly(x, 4). I am not able to find
>> > >>> it in ?poly.
>> > >>>
>> > >>> Can someone please tell me what the formula for the orthogonal
>> > >>> polynomial used by R is?
>> > >>>
>> > >>> Thank you,
>> > >>> Ashim
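On the naming and formula questions above: poly() stores its summary as
attr(, "coefs"), a list with alpha (data-dependent centering constants)
and norm2 (squared norms, preceded by 1 and n). The resulting columns are
orthonormal with respect to the empirical (counting) measure on the
observed x values; for equally spaced x they are, up to normalization,
essentially the discrete Chebyshev (Gram) polynomials. A hedged sketch of
the three-term recursion that alpha and norm2 drive, in the spirit of the
Kennedy & Gentle scheme cited in ?poly (evalpoly is an illustrative
helper, not part of R):

p  <- poly(1:93, 4)
cf <- attr(p, "coefs")   # list(alpha = ..., norm2 = ...)

# evalpoly: reconstruct the basis from the stored coefficients.
# f_j(x) = (x - alpha[j]) * f_{j-1}(x) - (norm2[j+1] / norm2[j]) * f_{j-2}(x),
# with f_0(x) = 1; the degree-j column is then scaled by sqrt(norm2[j+2]).
evalpoly <- function(xnew, cf, degree) {
  f <- matrix(0, length(xnew), degree + 1)
  f[, 1] <- 1                                  # f_0(x) = 1
  for (j in seq_len(degree)) {
    fprev <- if (j == 1) 0 else f[, j - 1]
    f[, j + 1] <- (xnew - cf$alpha[j]) * f[, j] -
      (cf$norm2[j + 1] / cf$norm2[j]) * fprev
  }
  # drop f_0 and normalise each column by its stored squared norm
  sweep(f[, -1, drop = FALSE], 2, sqrt(cf$norm2[-(1:2)]), "/")
}

all.equal(evalpoly(1:93, cf, 4), unclass(p), check.attributes = FALSE)  # TRUE

predict(p, xnew) performs the same evaluation internally, which is what
lm-based prediction on new data relies on.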
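Finally, to close the loop on the detrend-then-ARIMA plan described in the
thread, a hedged end-to-end sketch with simulated data. The trend degree
and the ARIMA order c(1, 0, 0) are placeholders; in practice they would
come from the order-selection and ACF/PACF steps discussed above:

set.seed(42)
x      <- 1:93
trend  <- 0.01 * x^2                          # hypothetical upward trend
series <- 100 + trend + arima.sim(list(ar = 0.6), n = 93)

fit_trend <- lm(series ~ poly(x, 2))          # detrend (degree illustrative)
res       <- residuals(fit_trend)
fit_arma  <- arima(res, order = c(1, 0, 0))   # placeholder order

h        <- 7                                 # forecast horizon
new      <- data.frame(x = 94:(93 + h))
fc_trend <- predict(fit_trend, newdata = new)     # trend extrapolation
fc_res   <- predict(fit_arma, n.ahead = h)$pred   # ARMA forecast of residuals
forecast <- fc_trend + fc_res                 # combined point forecast

The point of the sketch is that the trend part of the forecast comes from
predict() on the poly-based fit, so no raw polynomial coefficients are
needed at any stage.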