    Date:        Fri, 20 Sep 2024 18:53:11 +0200
    From:        <tlaro...@kergis.com>
    Message-ID:  <zu2od2phtsn4h...@kergis.com>

  | For some reason[*], I looked at sh(1) "Command Search and Execution"
  | in POSIX (issue 7 2018 and then issue 8 2024).

Over the past many years this has been one of the most debated
parts of the specification.  It is constantly being reworded.

  | From the specification above, I'm puzzled about two things: regular
  | built-ins and PATH search:

Yes, that aspect in particular, there is an attitude amongst some of
the people who work on the standard that users must be able to replace
regular built in utilities by their own replacements, simply by placing
their own in a directory that is in PATH before the directory where the
standard version of the utility exists.

Almost all shell developers consider this to be nonsense, and refuse
to have anything to do with it (some version of ksh93 is reputed to
have rules something like that implemented though.)

In issue 7 and before (not sure now for how long before, but that no
longer matters) all regular built-in utilities were required to have
a file system implementation (so that, for example, xargs could run it,
without a shell being involved) - even those which make absolutely no
sense outside the shell, like "wait" and "fg" (some others which are
mostly useless outside the shell, like "cd" could at least be argued to
be able to attempt the operation, and issue an error on failure, even
if the effects would be lost).

Some systems install such things by making links to a script like

    #! /bin/sh
    ${0##*/} "$@"

with the names of the relevant built-in utilities, solely to meet that
requirement.  NetBSD always refused to indulge in such stupidity.

  | In issue 7, built-ins are segregated in two groups: "special
  | built-ins" and "regular built-ins", the latter being the complement of
  | the former (a built-in that is not "special").

That's always been done - there are differences in how they're
required to operate other than in this area - such as what happens
when one fails, and the effects of variable assignments as part of
the same command.  The special built-ins are mostly things that most
people almost consider to be syntax (like "break" "continue" "return"
"." ...)

  | But in the spec, a regular built-in can only be invoked in e),

Not quite, the utilities listed in (d) are all regular built-in
utilities, and those simply get executed.  This is the (useful) big
change in Issue 8 - the utilities in that list are now known as
"intrinsic" utilities, which have two properties of note - first,
those ones aren't required to exist in the filesystem any more, and
second, they're exempt from the path search nonsense.

Fortunately, implementations are also allowed to designate any other
built-in utility as being intrinsic (though it is recommended that they
don't).  In our shell, every built-in is intrinsic.  (I believe bash is
the same).

  | that is the corresponding name file has to be accessible via the PATH.
  | If it is not, one can not invoke a regular built-in?

That is the intent, yes.

  | This may have sense for an utility required by POSIX

No, it makes sense for nothing.

  | but there may be a regular built-in that POSIX doesn't speak about...

That one is actually not a problem - both because such a utility could
also be implemented as a file system command, and so meet the
requirement, but more because as soon as an application attempts to
invoke any non standard utility, all bets are off, that's outside what
the standard specifies, and so the standard specifies nothing about
what should happen.  And yes, that means that if you write your own
command (or add one from pkgsrc, that is not a standard utility) then
the standard doesn't require that things like redirection (or anything
else really) will work.

Of course, no real implementation would ever break things that way;
what is a standard utility, and what is not, is not distinguished
anywhere (except that to conform with the standard, all the standard
utilities, except the ones that are part of options that are not
included, like for example uucp and its friends, must be implemented,
and available in some defined PATH setting - which isn't necessarily
the one that any normal user ever uses.)

  | And what does "a successful search" mean? From the referenced
  | paragraph "XBD Environment Variables":
  |
  | ---8<---issue 7 2018
  | The list shall be searched from beginning to end, applying the
  | filename to each prefix, until an executable file with the specified
  | name and appropriate execution permissions is found.
  | --->8---issue 7 2018
  |
  | But this contradicts the use of the shell in the paragraph I'm talking
  | about, since if the permissions can be stat'ed, the "executable" nature
  | of the file can not be ascertained without exec'ing

I think that's just a wording bug, and should be fixed (and would be
if someone pointed it out) - all they really mean is a file with 'x'
permission in PATH.  However, you're right, the term "executable file"
is defined to mean something that "exec*(2)" can execute, and that
isn't what they really mean there - no-one expects an attempt to be
made to actually execute the file located, just that the shell would
try that if there was no built-in to execute instead.

  | ---8<---issue 7 2018
  | The term "built-in" implies that the shell can execute the utility
  | directly and does not need to search for it.
  | --->8---issue 7 2018
  |
  | The proposition is for all built-ins. And this contradicts the
  | paragraph where the built-in has to be searched for previously...

No, it doesn't - the version searched for (and found) (if you believe
anything should actually operate like that) isn't the built-in, that's
the file system equivalent (like we have /bin/echo and the built-in
echo in sh(1), which are actually entirely different commands).

The intent is that the shell locates the file system version of the
executable, then, if there is a built-in with the same name, and that
built-in claims to be the equivalent of the version in the directory
in which the shell found the file system version, then the built-in is
executed instead of the file system version.

  | "The special built-in utilities in this section need not be provided
  | in a manner accessible via the exec family of functions defined in
  | the System Interfaces volume of POSIX.1-2017."

Yes, not even the most insane of the posix committee ever believed
that "break" or "return" would be useful in any way as a file system
command.

  | i.e. not special built-ins have to be provided in a manner accessible
  | via the exec family of functions.)

Yes, but (as above) only the standard ones - anything non standard
(anywhere, including an option to a standard utility that isn't defined
for that utility) places things outside the standard, and none of the
rules apply.

  | What does:
  |
  | "the built-in or function is associated with the
  | directory that was most recently tested during the successful
  | PATH search"
  |
  | mean? How is a directory "associated" to a built-in or a function,

No-one actually knows, that isn't specified anywhere, it is up to the
implementation to make that work, but I believe that the intent is
that each (non-intrinsic) regular built-in is associated with a path
somewhere or other (compiled into the shell, in a file that the shell
reads at startup, perhaps via a sysctl like interface - whatever the
implementation prefers).

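One way to picture such an association is the rough shell sketch below. It is only an illustration of the intent as described here, not how any real shell does it: the `builtin_dir` table and the `resolve` function are invented names, and the directories in the table are assumptions.

```shell
#!/bin/sh
# Hypothetical association table: the directory each regular built-in
# claims to be the equivalent of.  Invented for illustration only.
builtin_dir() {
    case $1 in
        echo|test) printf '%s\n' /bin ;;
        printf)    printf '%s\n' /usr/bin ;;
        *)         return 1 ;;
    esac
}

# resolve NAME SEARCHPATH
# Reports what the standard's wording would have the shell run.
resolve() {
    name=$1
    path=$2
    old_ifs=$IFS
    IFS=:
    set -f                  # no globbing while splitting the path
    set -- $path            # one positional parameter per directory
    set +f
    IFS=$old_ifs
    for dir in "$@"; do
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            # The search succeeded; use the built-in only if it is
            # associated with the directory where the file was found.
            if [ "$(builtin_dir "$name")" = "$dir" ]; then
                echo "builtin $name"
            else
                echo "file $dir/$name"
            fi
            return 0
        fi
    done
    # Search failed: an error, even if a built-in of that name exists.
    echo "$name: not found"
    return 127
}
```

Calling `resolve echo "$HOME/bin:/bin"` reports the file in `$HOME/bin` if one exists there, and the built-in only when the first match is in the associated directory.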
That is, for us we have "echo" "test" and "printf" (and more) built in,
so something somewhere would have

    echo    /bin
    test    /bin
    printf  /usr/bin

(and many more) defined - then if the user types "echo hello" the
system searches PATH, finds "echo" in some directory, then checks this
list - if the directory found by the search matches the one in the
list, then the built-in gets executed.  If the directory is different,
then the command from the file system gets executed, and as you
surmised, if the command isn't found by the search, then a "command
not found" error results (even though the built-in is there.)

So if you had

    PATH=~/bin:/usr/pkg/bin:/bin

and you had a "test" in ~/bin, "echo" in /usr/pkg/bin and no printf in
any of those three directories, then the built-in versions would never
be executed.

  | Note: in the NetBSD implementation---I didn't look in the CSRG
  | archives to see if these are in fact here from long ago---there are
  | prefixes in the path: "%builtins" and "%func";
  | perhaps are these an attempt to this association?

They are from long long long ago, yes.  %func is something entirely
different, and unrelated, and not entirely useless.

"%builtins" was an attempt to comply with the language in some much
older version of the standard (when all this was much less precisely
specified than it is now).  That's a joke, and most versions of ash
(the parent of our shell, FreeBSD's, dash, perhaps others) have long
since deleted it.  We haven't, but probably should, it is undocumented,
and no-one uses it.

  | These are builtins or funcs if the prefix is specified as the
  | preceding "dir" in PATH?

I don't really want to document %builtins, so everyone forget you ever
read this, but the idea is that if that is specified as a suffix of an
entry in PATH, and the PATH search reaches that entry, then a built-in
command will be found and executed (if there is no %builtins entry in
PATH, then one is assumed right at the start, which means built-ins are
always executed if named ...
that's what almost everyone simply assumes will happen).  By explicitly
sticking %builtins elsewhere, it is possible for a user to override a
builtin with a file-system command located earlier in the PATH.

That's a dumb way to do it though, much better is simply to supply a
function like:

    echo() {
        /path/to/the/echo/I/like "$@"
    }

instead - and the usefulness of that is one of the reasons that the
NetBSD shell always reads the $ENV file (even in non-interactive
shells).  This way you can selectively override built-ins (except the
special ones that you really don't want to override) with whatever
versions you prefer.

The %func thing is entirely different - if a search for a command
reaches that directory (the one with %func as a suffix - just in PATH,
not in the directory name) without having yet located the command (or
we would not have gotten that far) and there is a file in the directory
with the same name as the command being sought (I think this one needs
'r' permission, and not necessarily 'x', but I haven't checked, so
might be wrong - 'r' is needed for sure though) then the shell will
read that file, as if with the '.' command.

If after that has happened, there is now a function with the name of
the command to be executed defined (clearly there wasn't before, or
the function would have already been executed, without any PATH search)
then the shell will execute the function (and search PATH no more).
The newly defined function (and anything else that running the script
that was found happens to accomplish - normally just defining other
functions as well) remains in the shell to be used again later if
needed, with no PATH search involved.

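In a shell without %func, something loosely similar can be approximated with stub functions that load the real definition on first use. The sketch below is a different technique from %func (no PATH search is involved, and each name has to be registered by hand); `autoload` and `FUNCDIR` are names made up for this example.

```shell
#!/bin/sh
# Directory holding one file per function (hypothetical location).
FUNCDIR=${FUNCDIR:-$HOME/myfuncs}

# autoload NAME...
# Define a stub for each NAME.  On first call the stub removes itself,
# reads $FUNCDIR/NAME with '.', then calls whatever that file defined
# under the same name.
autoload() {
    for f in "$@"; do
        eval "$f() { unset -f $f; . \"\$FUNCDIR/$f\" && $f \"\$@\"; }"
    done
}
```

After `autoload greet`, the first call to `greet` reads `$FUNCDIR/greet`, and later calls in the same shell use the loaded function directly. If the file does not define a function of that name, the call fails with a "not found" error rather than looping.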
The idea is that you make a file containing functions you sometimes
use, place that file in some directory, say ~/myfuncs and link it to
the name of every function it defines (the directory can have other
groups of unrelated functions) and then you put

    ~/myfuncs%func

as an entry in PATH (usually it would go fairly early in PATH - but
that depends upon what you're attempting to achieve - perhaps last if
the intent is to provide fallback versions of commands in case the
system that you're using happens not to have them installed) - then
when you happen to need to use one of those functions (you're doing
something which needs one) then that function gets defined "by magic"
along with any other related functions you're likely to use if you're
using any of them.  On the other hand, if you never need these
functions in a shell, then they never get loaded, and so save a little
memory in the shell, and a tiny bit of command search time.

  | Could somebody explain this in an "international" english, that is
  | something a not english native speaker with an average english
  | vocabulary could parse?

I don't know, does the above count?

  | [*]: The reason why I looked at the spec is that, under Plan9, there is
  | a feature that I find quite neat and consistent: utilities can be
  | organized in subfolders and one can invoke from the shell (rc(1)) an
  | utility like this: "ip/ipconfig ...". This organized the utilities in
  | groups, instead of putting everything flat in a directory.

Yes, some people like that, and there are one or two shells which
allow it I think (not typical POSIX type shells) - you can accomplish
that, more or less, by just adding all those directories to PATH, and
then you get to avoid typing the "ip/" part of the name.

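A minimal sketch of that workaround, assuming the grouped utilities live in the immediate subdirectories of a single base directory (`add_subdirs` is a name invented for this example):

```shell
#!/bin/sh
# add_subdirs BASE
# Append every immediate subdirectory of BASE to PATH, so that a
# command stored as BASE/ip/ipconfig can be run as plain "ipconfig".
add_subdirs() {
    for d in "$1"/*/; do
        [ -d "$d" ] || continue     # skips the unmatched-pattern case
        PATH=$PATH:${d%/}
    done
}
```

The cost, compared with the Plan 9 style, is that all the groups share one flat namespace: two subdirectories containing a command of the same name will shadow each other according to their order in PATH.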
  | I thus wanted to see how I could add this (it is not POSIX compliant)

No, it isn't, POSIX requires that any command with a '/' in its name
be simply executed (from the filesystem) using the name given, without
any other processing (of the command name).

  | by setting an option, without disturbing much the POSIX behavior
  | or introducing security problems that the POSIX spec had tried to
  | address...

It isn't really a security issue I think - it just isn't the way that
shells have ever worked (way back to the Thompson shell) - there's a
path search (in that shell the directories to examine, and their order,
were built into the shell, no way for users to alter it) for simple
one-segment names (no '/'), and others are simply exec'd.  That's very
hard to change now (in general, an option could allow it though) as it
is so ingrained in how people work.

kre