    Date:        Sat, 10 Apr 2021 08:38:58 +0200
    From:        tlaro...@polynum.com
    Message-ID:  <20210410063858.ga...@polynum.com>

I am going to (try to) answer all of the current (small) set of messages
on this in this one reply, not just this one msg (and yes, that will make
something of a mess of threading in the mail archives.)

  | Wouldn't be possible to flag "var" as being a "fd_var" (or whatever) and

That I do now, it is what makes "unset var" (etc) also do (effectively)
close($var).  But it is mostly invisible.

  | disallow a direct or indirect redirection to a fd that is superior or
  | equal to some upper limit (see below)

Yes, divide the fd number space - that's what ksh does, but we cannot
really do that, it would not be backward compatible.  We allow scripts
now to do
	999>/tmp/file
if they like, we cannot break that.  This is the problem that caused me
to delay this implementation for so long.

  | that is the minimum value of the range reserved for this kind
  | of redirection, unless it is given as "$var"?

Anything given as $var is no longer present in that form by the time
anything which could enforce such a rule gets a chance to do that.
Variable expansion happens (relatively) early.  All the redirect code
sees is the number that came from the expansion.

Apart from that, we cannot really invent (many) new rules, and any that
are invented need to be reasonably compatible with the current
definitions of this facility (that's why "new" in quotes in the email
Subject - new to the NetBSD sh, but the idea has been around, in use,
for decades).

  | Why not simply use fd the same way as memory is used, i.e. starting by both
  | ends, to have variable stack and heap concurrently using the mem.

Because of the way the system calls work: everything allocating fds (in
the kernel) allocates upwards.

You might think that the shell must know what fds it has open, and can
simply pick one (at whatever location) and assign that.  It cannot,
because it does not.  Aside from the fd's the shell itself is using, the
implementation keeps no track at all of what fds it has open.

A user does
	exec 3>/dev/null
and the shell opens fd 3, for writing, to /dev/null, and promptly
forgets that ever happened - that fd belongs to the script, not the
shell.  The only state we maintain is the maximum fd we ever saw that
way (and we init that at 9, we simply assume that any script might use
any fd <= 9).

You might think that fdflags demonstrates that the shell can find out
what is open, and you're right, it can, but the way it does that is with
code like (this is pseudo-code, if you want to see the real thing, look
in src/bin/sh/redir.c - look for fdflagscmd).

	for (fd = 0; fd < something_big; fd++) {
		if (fcntl(fd, F_GETFD, 0) < 0)
			continue;	/* fd is not open */
		/* fd is open, get the rest of the data and print */
	}

That's acceptable for fdflags, which is not executed much, it is not
acceptable for every redirect that exists.

Of course, the shell could keep track of all the fd's it has open, and
when they get closed, etc - but that would be a whole bunch of extra
code, and sh is complex enough without adding yet more global state it
has to try and maintain.

  | starting from the end and
  | decreasing will be the range for {var}>/some/file.

We could (sort of) do something kind of like that - what fd's are
allocated in this case is entirely up to the implementation.  The
problem is that there is really no reasonable "the end" to use.  The
kernel processes I/O requests much more effectively if the file
descriptor numbers are all relatively small.  I used to use fd's around
900 (or something) for sh internal fds ... but changed that to MUCH
smaller values (long ago).

Further, it doesn't really help the underlying issue, though perhaps
makes problems less likely, as:

  | The variable lower bound of
  | the latter will be the limit to test against in the above security
  | concern you have mentioned.

We could have a lower bound on that, effectively limiting the number of
{var}> type redirects open simultaneously (it would be a big limit, no
normal script would encounter it), but that doesn't solve the problem,
as we cannot limit the fd numbers used in N> type redirects - it just
makes issues less likely.

Next message:

tlaro...@polynum.com said:
  | 1) A var such as: exec {var}>/some/file is called a "named fd" since it
  | shall be referred to by the variable and not by it's implementation defined
  | value;

That we could do, and that's not a bad name, but it isn't the name I was
looking for, which is what to call this kind of redirection, rather than
what to call the variable used in it (though a name for the latter might
also help).

tlaro...@polynum.com said:
  | 2) A "named fd" var once set is neither writable nor directly readable
  | by the script; it is read-only except it is only "readable" by the
  | implementation;

I won't quote Mouse's reply, or your subsequent amendment of this, but
no, we cannot do any of that.  We need to be at least reasonably
compatible with (at least sane) scripts or functions that have been
written for the shells that have had this facility for a long time.

It is certainly possible for a script to make the "named fd" read only
after it has been created (to avoid it being accidentally altered), but
the shell cannot force that.

tlaro...@polynum.com said:
  | 3) Redirections are only allowed such as ">$var" via the name and not
  | directly by the value (since "echo $var" gives "\n"), the following
  | construction being the only one allowed for "variables" named fds:
  |	if some_condition; then
  |		this_fd=var1
  |	else
  |		this_fd=var2
  |	fi
  |	echo "something" > eval \$$this_fd

For the trivia first, the last line would need to be:
	echo "something" >& $(eval printf %d \$$this_fd)
or something like that (%s works as well as %d).  But getting that
syntax right, for present purposes, isn't important.

That's because, no, we cannot require that, we need to allow

	if some_condition; then
		this_fd=$var1
	else
		this_fd=$var2
	fi
	echo "something" >& $this_fd

as that's what other shells allow, and being able to do this kind of
thing is too valuable to make unduly difficult.  Aside from how var1 and
var2 get assigned (which isn't shown), this code, as it is, already
works just fine in the NetBSD (and every other) shell.  It must not
break.

tlaro...@polynum.com said:
  | 4) The current lower limit of named fds allocated can be retrieved by the
  | variable NAMEDFD_WATERMARK (or whatever). New named fds are only allocated
  | if there is room between the current upper limit of users allocated fds
  | and the NAMEDFD_WATERMARK.

We could certainly add a variable giving the current low limit used for
this, but I'm not sure that it would have any use.  While it might be
useful interactively, I don't really see this kind of redirection being
used there; {var}>/... is too much to type when you can just do 8>/...
instead.  This is for scripts, and even more so for functions -
particularly functions designed to be used by many different scripts.
Scripts that want to use numeric-fd redirects ("8>") cannot really (not
rationally anyway) depend upon the value of some variable, the 8 there
needs to be literal.

  | The named fds range grows downward.

As above, that's not likely to happen, but purely for pragmatic reasons.
It really provides little benefit anyway: if there's to be a limit, it
needs to be set once, early, and never changed again, so which way we
allocate the fds above the limit doesn't matter much.

But, independent of allocation direction, perhaps you envisage
NAMEDFD_WATERMARK being writeable by the script, to set the limit?  That
would make
	NAMEDFD_WATERMARK=30
essentially equivalent to what I suggested, using
	: 30>&-
(with a possible +/- 1 on the 30, depending on the exact semantics of
NAMEDFD_WATERMARK, and which instance of "30" is to be adjusted).

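A minimal sketch of the two points above about which side of a redirect
may come from a variable: the target of >& can be chosen at run time,
while the fd number in front of the operator has to be literal.  It uses
only plain POSIX sh (no {var}> redirects).  The fd numbers 3 and 4, the
file names, and the USE_FIRST variable are arbitrary choices made for
this illustration; none of them come from the proposal.

```shell
#!/bin/sh
# Sketch only: fd numbers, file names and USE_FIRST are assumptions.
tmpdir=$(mktemp -d) || exit 1

# Open two script-owned descriptors and remember their numbers.
exec 3>"$tmpdir/first.log" 4>"$tmpdir/second.log"
var1=3
var2=4

# Choose the target at run time.  Only the expanded number reaches the
# redirection code, so this is the same as writing >&3 or >&4.
if [ -n "${USE_FIRST-}" ]; then
	this_fd=$var1
else
	this_fd=$var2
fi
echo "something" >&$this_fd

# The number on the left of the operator must be literal.  Here $var2
# is expanded as an ordinary argument to echo, so "oops 4" goes to the
# file through fd 1; fd 4 is not involved.
echo oops $var2>"$tmpdir/literal.log"

# Close the descriptors, collect the results, and clean up.
exec 3>&- 4>&-
first=$(cat "$tmpdir/first.log")
second=$(cat "$tmpdir/second.log")
literal=$(cat "$tmpdir/literal.log")
rm -rf "$tmpdir"
```

With USE_FIRST unset, "something" ends up in second.log and first.log
stays empty, while literal.log contains "oops 4".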
The advantage of the version using the redirect operators is that it
(partially) automatically applies to current scripts: any script using
(say) 28>/tmp/file currently sets the "watermark" at 28 (or 29,
depending how the watermark is interpreted) and, provided this is seen
(not necessarily executed) before a {var}> type redirect is executed,
simply works.  To add a new variable for this purpose would require
adding it to every script which uses both numeric and variable type
redirections (even if one of those is buried in an included file, like a
function library).

If we don't need the script to be able to read the limit, the form using
a variable (which would be yet another magic variable name) doesn't seem
to offer any big advantage.

So, thanks for the suggestions, but I don't think any of them really
assist.

Unless someone sees a problem with the implementation as (partly)
outlined in my previous message, I am happy enough with that as
currently designed.  I would not be offering it if I weren't.

What I need to know is whether people think adding this to /bin/sh (most
probably including SMALL shells, as this is the kind of thing useful to
use in places like installation scripts) is worth the cost.

(The shell does get slightly larger - though not a huge amount.  I am
not sure how much bigger right now, as all my current binaries with this
in them also include all the shell's DEBUG trace code, which makes a
MUCH bigger difference to the size (and speed) of the shell.)

Aside from the costs associated with a slightly bigger binary, this
should make no noticeable difference at all to the execution speed of
the shell for scripts that don't use the facility.  For those that do,
the execution time goes up infinitely (from 0, as such scripts don't
currently work at all, to however long the script executes).

One of these redirects takes slightly longer than an ordinary one, as in
addition to the redirect, the shell needs to format the fd into a
decimal char string, and assign that to a variable.  Uses don't vary
much, that's just >&$var type things, which can be done now - but which
do involve an extra variable lookup and expansion over using >&8 type
things.  Nothing you're likely to ever notice though.

But the only way to know for sure is for people to try it, and for that
there needs to be enough interest for the code to be committed.

And second, we need a name for the generic facility, so we have some way
to refer to it (and also to use as a sub-section heading for the man
page, and a test case name for (yet to be written) ATF tests of this).

kre