Dear R Help folks --

I have been trying to put together a list of the steps or stages of R function evaluation, with particular focus on those that have "standard" or "nonstandard" forms. This is both for my own edification and because I am thinking of joining the world of R bloggers and have been trying to put together some draft postings that might be useful. I seem to have an affirmative genius for finding incorrect interpretations of R's evaluation rules; I'm trying to make that an asset.
I am hoping that you can tell me:

1. Is this list complete, or are there additional stages I am missing?
2. Have I inserted one or more imaginary stages?
3. Are the terms I use below to name each stage appropriate, or are there other terms more widely used or recognizable?
4. Is the order correct?

I begin each name with “Standard” to express my belief that each of these things has a usual or default form, but also that (unless I am mistaken) almost none of them exists in only a single form true of all R functions. (I have marked with an asterisk a few evaluation steps that I think may always be followed.)

It is my ultimate goal (which I do not feel at all close to accomplishing) to determine a way to mechanically test for “standardness” along each of these dimensions, so that each function could be assigned a logical vector showing the ways that it is and is not standard. One thing I find conceptually or procedurally difficult about this project is that I think “standardness” should be determined by what a function does, rather than by how it does it, so that a primitive function that takes unevaluated arguments positionally could still have standard matching, scoping, etc., by internal emulation.

A related goal is to identify which evaluation steps most often use an alternative form, and perhaps to determine whether there is more than one such alternative. Finally, an easier short-term goal is simply to find instances of one or more functions with standard and non-standard evaluation for each evaluation step.

For the most part below I am treating the evaluation of closures as the standard from which “nonstandard” is defined. However, I do not assume that other kinds of functions are automatically nonstandard on any particular dimension below. Most of this comes from the R Language Definition, but there are numerous places where I am by no means certain that my interpretation is correct. I have highlighted some of these below with a “??”.
I look forward to learning from you.

Warmest regards,
J. Andrew Hoerner

* *Standard function recognition:* recognizing some or all of a string of code as a function. (Part of code line parsing.)

*Standard environment construction:* construction of the execution environment, and of pointers to the calling and enclosing environments.

*Standard function identification:* Get the name of the function, if any.

* *Standard function scoping:* Search the current environment and then up the chain of enclosing environments until you find the first binding environment, an environment where the name of the function is bound, i.e. linked to a pointer to the function definition. The binding environment is usually (but not always) the same as the defining environment (i.e. the enclosing environment when the function is defined). Note that function lookup, unlike function argument lookup, necessarily starts from the calling environment, because a function does not know what it is -- its formals, body, and environments -- until it is found. Named functions are always found by scoping. R never learns "where" they are -- they have to be looked up each time. For this reason, anonymous functions must be used in place, and called by a function that takes a function as an argument, or by (function(formals){body})(actual args).

*Standard function retrieval:* load (??) the function, i.e. transfer the list (??) of formals and defaults and the list (??) of expressions that constitute the function body into the execution environment. Note that the function body is parsed at the time the function is created (true?? Or is it parsed every time the function is called?).

*Standard argument matching:* assignment of expressions and default arguments to formals via the usually-stated matching rules. If matched positionally at call time, the name is scoped like an actual argument.
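To make the function scoping and argument matching steps above concrete, here is a minimal sketch in R. The function names (lookup_demo, match_demo) are hypothetical and exist only for illustration. It is meant as a probe of the behaviour described, not as a definitive account of how R implements these steps.

```r
# Function lookup: when a name appears in call position, R looks for a
# *function* of that name and skips non-function bindings on the way.
lookup_demo <- function() {
  c <- 10      # a non-function object named `c` in the execution environment
  c(c, 1)      # the call still finds base::c; the argument `c` is the number 10
}
lookup_demo()

# An anonymous function has no name to look up, so it is used in place.
(function(x) x + 1)(2)

# Argument matching (hypothetical function): exact tags first, then
# partial tags, then position for whatever is left.
match_demo <- function(alpha, beta, gamma = 3) {
  c(alpha = alpha, beta = beta, gamma = gamma)
}
res <- match_demo(2, al = 1)   # `al` partially matches `alpha`; 2 goes to `beta`
res

# match.call() reports how a given call would be matched, with full names.
mc <- match.call(match_demo, quote(match_demo(2, al = 1)))
mc
```

Running match.call() on a handful of calls is one cheap way to check a guess about the matching rules before relying on it.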
Note that giving an argument the same name as a formal when calling the function will match it to that formal only if it is matched positionally or by tag, not merely because the names coincide.

*Standard argument parsing:* Converts argument expressions into abstract syntax trees. “Standard” argument parsing and promise construction take place before the arguments are passed into the body. <Or do matching and parsing happen in reverse order?>

*Standard promise construction:* Assigning each name (including function names) in an AST to its binding in the calling environment (if ordinary) or the execution environment (if default). (Am I right that the action here for calls and for names is essentially the same, and happens at the same time using the same lookup procedure?) Note that scoping, in the form of search, applies only when the function is called. Formals are matched, but they are never scoped, except that their default values are assigned into the function body when the function is called and then scoped from there if they are not assigned a value before they are used. Actual arguments on function call that are not found in the calling environment are scoped ?? up the call tree ?? until they reach the top level, and then up the search path.

“R Language Definition 4.3.3 Argument evaluation: One of the most important things to know about the evaluation of arguments to a function is that supplied arguments and default arguments are treated differently. The supplied arguments to a function are evaluated in the evaluation frame of the calling function. The default arguments to a function are evaluated in the evaluation frame of the function.”

(Note 1: In some places closures are described as capturing their defining environment, as if they made and stored a copy. For instance, from the R Language Definition 2.1.5: “Any symbols bound in that environment are *captured* and available to the function.
This combination of the code of the function and the bindings in its environment is called a ‘function closure’, a term from functional programming theory. In this document we generally use the term ‘function’, but use ‘closure’ to emphasize the importance of the attached environment.” But I think what gets passed in promises are pointers to objects in the environment, not the environment in its entirety, nor even the objects the pointers point to. These are sought only when the promise is kept, and actually copied into the execution environment only if they are subsequently altered. If they are only used as function arguments and not altered themselves they are, I believe, used in place, without copying.)

(Note 2: Assigning an argument to a formal via = creates a default argument only in a function definition. When such an assignment is made during a function call, the RHS scopes to the calling environment and up, not to the function body.)

*Standard body construction:* Replace each occurrence of a formal within the function with the value of that formal if a constant, and otherwise with the AST identified with that formal and any promises it contains (sometimes collectively called the actual arguments, as distinct from the formal arguments).

*Standard body execution:* It is strange to me that everything I know about function evaluation, standard or non-standard, seems to be about getting arguments into or out of the body of the function. If there is anything strictly internal to body execution that can be called standard, I don't know what it is. Here are a couple of candidates, but this list seems very incomplete to me:

* *Standard expression sequencing:* Expressions in the body are executed sequentially except as that sequence is altered by flow control functions (if/else, while, switch, etc.) and block grouping functions ({}).
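The difference between supplied and default arguments quoted from the Language Definition, and the point of Note 2, can be checked with a small experiment. This is only a sketch: the names h, caller and the variable x are hypothetical, chosen so that the same name is bound to a different string in each environment.

```r
x <- "global"

# Hypothetical function: `b` has a default that refers to `x`.
h <- function(a, b = x) {
  x <- "local"          # bound in the execution environment before `b` is used
  c(a = a, b = b)
}

caller <- function() {
  x <- "caller"
  h(a = x)              # supplied `a`: evaluated in caller()'s frame
}

from_caller <- caller()
from_caller             # `a` sees caller()'s x; default `b` sees h()'s own x

# Writing b = x in the *call* supplies an argument (it does not set a
# default), so x is looked up from where the call is made: the top level here.
from_top <- h("one", b = x)
from_top
```

The default for b picks up "local" because it is evaluated inside h's own frame, and only at the point where b is first used.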
*Standard promise triggering:* Recognition that an action on the promise that constitutes "use" has taken place and that the promise now needs to be fulfilled. (I have never seen a clear statement of exactly what uses do and do not trigger fulfillment of a promise.)

*Standard promise fulfillment/Argument scoping:* For names passed via formals, scope up the environment hierarchy from the calling environment and then up the call chain ??. For default arguments, scope starting in the function’s execution environment (the parent of which is the defining environment ??), to the first environment where the name exists, and fetch the value. For functions, recurse. Inside the body, formals are always referred to by the name of the formal, not the name of the variables assigned to the formal. When the function is called, instances of the formal will be replaced by the code for their actual arguments, but should still be referred to by the name of the formal -- attempts to refer to arguments by their actual names after substitution will not be recognized.

Note:
1) for user-defined functions created in the global environment, the defining and calling environment will often be the same;
2) a promise is fulfilled once -- subsequent use of the same variable will return the same value, even if it has changed in the lookup scope (as just defined) in the interim. This also prevents arguments assigned to complex expressions from being recalculated if they are used multiple times in the same call;
3) although the immediately enclosing and defining environments are set when the function is defined, the chain of environments that enclose that environment is not determined until promise fulfillment.

*Standard informal scoping:* Names and functions in the execution environment not defined in formals are scoped like default arguments. There are probably more here that I am missing.
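The claims above about promise triggering, single fulfillment and the scoping of names that are not formals can each be probed with a few lines of R. The sketch below uses hypothetical helper names (bump, twice, ignore, peek, show_y, wrapper) and a global counter to make evaluation visible. It shows what happens in these particular cases and is not a complete catalogue of what does or does not force a promise.

```r
counter <- 0
bump <- function() {          # hypothetical helper: side effect marks each evaluation
  counter <<- counter + 1
  counter
}

twice <- function(v) v + v    # uses the formal twice
doubled <- twice(bump())      # the argument expression is evaluated only once
after_twice <- counter

ignore <- function(v) "never touched"
untouched <- ignore(stop("this argument is never evaluated"))  # no error

peek <- function(v) deparse(substitute(v))  # inspects the expression, does not force it
peeked <- peek(bump())
after_peek <- counter         # unchanged by peek()

# A name that is neither a formal nor a local is looked up from the
# function's defining environment, not from the caller's.
y <- "defining environment"
show_y <- function() y
wrapper <- function() {
  y <- "calling environment"
  show_y()
}
seen <- wrapper()
seen
```

Here substitute() reads the argument's expression without evaluating it, which is why the counter does not move when peek() is called.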
Of course, there is the distinction between closures, primitives (builtin and special (are these categories exhaustive?)) and internals (builtin and special). So maybe closures do standard body evaluation? And I am not sure where functions that use the .C, .Fortran, .Call or .External interfaces fit in, except that I think typeof returns closure for all of them.

*Standard return:* Return the value of the last expression to the calling environment, close and clean up. I would also call return on return() or stop() standard.

--
J. Andrew Hoerner
Director, Sustainable Economics Program
Redefining Progress
(510) 507-4820