On Fri, 2011-05-06 at 11:20 -0500, Gene Leynes wrote: > Hmmm.... > > After reading that email four times, I think I see what you mean. > > Checking for variables within particular scopes is probably one of the most > challenging things in R, and I would guess in other languages too. In R > it's compounded by situations when you're writing a function to accept > variables as either a stand alone global variable, or as an element of a > data.frame or list.
Dear Gene, > I think this is a new problem, and I'll switch to the lengthier > data.frame[,'x'] syntax in gam for now. No, No, No, No, No!!!!! If there is one thing you **should** take from this thread is that there is no need to perform subsetting like that in a model formula. Why would you want (or prefer) to do: gam(dat$y ~ s(dat$x)) or gam(dat[, "y"] ~ s(dat[, "x"])) when gam(y ~ s(x), dat) will suffice? > By the way, about the $ accessors. I can see why some people don't like > them, but they are a part of the language. And, I think they're a good > part. They make the code much more readable, and I have yet to make a > mistake using them (which makes me think that they can be used > responsibly). Making code harder to read is its own source of error, and is > not something that I take lightly! If you use R's formula notation properly, you'll get cleaner code than either of your suggestions. > Thanks for the replies. And, thank you Rolf for the detailed analysis. Do > you think that your or I should submit a bug report to the package > maintainer? I'm not sure how that works. Very few of my challenges turn > out to be actual bugs, but I think this one is. I would consider this a bug - but in the sense that Simon didn't foresee the strange formulas that users of his software might concoct. By the way, I think you perhaps meant Berwin (re the detailed analysis)? HTH G > On Fri, May 6, 2011 at 4:53 AM, Berwin A Turlach > <berwin.turl...@gmail.com>wrote: > > > G'day Rolf, > > > > On Fri, 06 May 2011 09:58:50 +1200 > > Rolf Turner <rolf.tur...@xtra.co.nz> wrote: > > > > > but it's strange that the dodgey code throws an error with gam(dat1$y > > > ~ s(dat1$x)) but not with gam(dat2$cf ~ s(dat2$s)) > > > > > Something a bit subtle is going on; it would be nice to be able to > > > understand it. > > > > Well, > > > > R> traceback() > > 3: eval(expr, envir, enclos) > > 2: eval(inp, data, parent.frame()) > > 1: gam(dat$y ~ s(dat$x)) > > > > So the lines leading up to the problem seem to be the following from > > the gam() function: > > > > vars <- all.vars(gp$fake.formula[-2]) > > inp <- parse(text = paste("list(", paste(vars, collapse = ","), > > ")")) > > if (!is.list(data) && !is.data.frame(data)) > > data <- as.data.frame(data) > > > > > > > > Setting > > > > R> options(error=recover) > > > > running the code until the error occurs, and then examining the frame > > number for the gam() call shows that "inp" is > > "expression(list( dat1,x ))" in your first example and > > "expression(list( dat2,s ))" in your second example. In both > > examples, "data" is "list()" (not unsurprisingly). When, > > > > dl <- eval(inp, data, parent.frame()) > > > > is executed, it tries to eval "inp", in both cases "dat1" and "dat2" > > are found, obviously, in the parent frame. In your first example "x" is > > (typically) not found and an error is thrown, in your second example an > > object with name "s" is found in "package:mgcv" and the call to eval > > succeeds. "dl" becomes a list with two components, the first being, > > respectively, "dat1" or "dat2", and the second the body of the function > > "s". (To verify that, you should probably issue the command > > "debug(gam)" and step through those first few lines of the function > > until you reach the above command.) > > > > The corollary is that you can use the name of any object that R will > > find in the parent frame, if it is another data set, then that data > > set will become the second component of "inp". E.g.: > > > > R> dat=data.frame(min=1:100,cf=sin(1:100/50)+rnorm(100,0,.05)) > > R> gam(dat$cf ~ s(dat$min)) > > > > Family: gaussian > > Link function: identity > > > > Formula: > > dat$cf ~ s(dat$min) > > > > Estimated degrees of freedom: > > 3.8925 total = 4.892488 > > > > GCV score: 0.002704789 > > > > Or > > > > R> dat=data.frame(BOD=1:100,cf=sin(1:100/50)+rnorm(100,0,.05)) > > R> gam(dat$cf ~ s(dat$BOD)) > > > > Family: gaussian > > Link function: identity > > > > Formula: > > dat$cf ~ s(dat$BOD) > > > > Estimated degrees of freedom: > > 3.9393 total = 4.939297 > > > > GCV score: 0.002666985 > > > > > Just out of pure academic interest. :-) > > > > Hope your academic curiosity is now satisfied. :) > > > > HTH. > > > > Cheers, > > > > Berwin > > > > ========================== Full address ============================ > > A/Prof Berwin A Turlach Tel.: +61 (8) 6488 3338 (secr) > > School of Maths and Stats (M019) +61 (8) 6488 3383 (self) > > The University of Western Australia FAX : +61 (8) 6488 1028 > > 35 Stirling Highway > > Crawley WA 6009 e-mail: berwin.turl...@gmail.com > > Australia http://www.maths.uwa.edu.au/~berwin > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.