On Mon, 2011-05-23 at 09:58 -0400, Ravi Varadhan wrote: > Why does this not find a better solution? > > > x <- seq(0,2*pi, length=1000) > > x <- cbind(x/(2*pi), sin(x)) > > > fit1 <- principal.curve(x, plot = TRUE, trace = TRUE, maxit = 100, > + start = cbind(sort(x[,1]), rep(1, nrow(x)))) > Starting curve---distance^2: 1499.5 > Iteration 1---distance^2: 3.114789 > Iteration 2---distance^2: 10.04492 > Iteration 3---distance^2: 11.89215 > Iteration 4---distance^2: 12.43235 > Iteration 5---distance^2: 12.68524 > Iteration 6---distance^2: 12.84443 > Iteration 7---distance^2: 12.93624 > Iteration 8---distance^2: 12.99118 > Iteration 9---distance^2: 13.01280 > Iteration 10---distance^2: 13.02867 > Iteration 11---distance^2: 13.03867 > > > > You see that the projection distance is minimal at iteration 1, but > the algorithm settles for an inferior projection (i..e. a greater > projection distance).
I hadn't quite noticed that - I was too busy watching the algorithm converge to some solution instead. Hastie and Stuetzle make no claims about the algorithm converging or indeed even that each step leads to a decrease in the criterion (sum of squared distances to the curve). It would appear that a smoother curve than the one identified in iter 1 is preferable when one considers the self-consistency check; smoothing in each dimension using the distance along the current curve as a predictor performs local averaging and the new principal curve identified starts cutting the corners off the sine wave. gradually the new curve derived by local averaging and the current curve are sufficiently similar that the curve is declared self-consistent and the algorithm stops. I can achieve a better fit using my own wrappers around principal.curve() and the smooth.spline() function used in the local averaging, but only if I supply some good starting values. For my wrapper and the starting configuration I suggested, I get a very close solution to the sine wave original but it is a bit smoother - the rapid change around the peak/troughs is smoothed out a little bit. GCV is being used at each iteration in my wrappers to find the optimal smoothness for smooths in both dimensions that describe the curve; the solution fits the data almost as well as the best curve, but uses fewer degrees of freedom in doing so. As Henrik suggests, playing with the parameters of the smooth.spline fitting engine helps - as that is the main thing my wrapper function makes easier. These data seem quite difficult for the algorithm to get good answers - starting from the first or second principal component gives poor results and a highly complex fitted curve that traverses back and forth across the gaps between the peaks/troughs in the sine wave several times. G > If I do not provide any starting values, I get this: > > > fit1 <- principal.curve(x, plot = TRUE, trace = TRUE, maxit = 100) > Starting curve---distance^2: 29692.03 > Iteration 1---distance^2: 20.31220 > Iteration 2---distance^2: 19.45939 > Iteration 3---distance^2: 19.26387 > Iteration 4---distance^2: 19.20626 > Iteration 5---distance^2: 19.18666 > Iteration 6---distance^2: 19.18059 > > > > This is even worse. It seems like the algorithm is quite sensitive to > starting values. Is this behavior expected or is there some flaw in the > algorithm? > > Ravi. > > ________________________________________ > From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf > Of Gavin Simpson [gavin.simp...@ucl.ac.uk] > Sent: Monday, May 23, 2011 8:27 AM > To: guy33 > Cc: r-help@r-project.org > Subject: Re: [R] Problem with Princurve > > On Thu, 2011-05-19 at 06:43 -0700, guy33 wrote: > > Hey all, > > > > I can't seem to get the princurve package to produce correct results, even > > in the simplest cases. For example, if you just generate a 1 period > > noiseless sine wave, and ask for the principal curve and plot, the returned > > curve is clearly wrong (doesn't follow the sine wave). Here's my code: > > > > library(princurve) > > x <- runif(1000,0,2*pi); x <- cbind(x/(2*pi), sin(x)) > > fit1 <- principal.curve(x, plot = TRUE) > > > > Anyone have any suggestions? If you run this code, do you get the correct > > principal curve? > > How about specifying some useful starting points? > > fit1 <- principal.curve(x, plot = TRUE, trace = TRUE, maxit = 100, > start = cbind(sort(x[,1]), rep(1, nrow(x)))) > > And we need a few more iterations before convergence here. Starting from > the first principal component for example might give useful starting > points. > > HTH > > G > > > Any help would be really appreciated! > > -guy33 > > > > -- > > View this message in context: > > http://r.789695.n4.nabble.com/Problem-with-Princurve-tp3535721p3535721.html > > Sent from the R help mailing list archive at Nabble.com. > > > > ______________________________________________ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > -- > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% > Dr. Gavin Simpson [t] +44 (0)20 7679 0522 > ECRC, UCL Geography, [f] +44 (0)20 7679 0565 > Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk > Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ > UK. WC1E 6BT. [w] http://www.freshwaters.org.uk > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.