Re: Possible "new" redirect style for /bin/sh (needs a name)

Robert Elz Sat, 10 Apr 2021 11:38:23 -0700

    Date:        Sat, 10 Apr 2021 08:38:58 +0200
    From:        tlaro...@polynum.com
    Message-ID:  <20210410063858.ga...@polynum.com>


I am going to (try to) answer all of the current (small) set of
messages on this in this one reply, not just this one msg (and yes, that
will make something of a mess of threading in the mail archives.)

  | Wouldn't be possible to flag "var" as being a "fd_var" (or whatever) and

That I do now; it is what makes "unset var" (etc) also do (effectively)
close($var).   But it is mostly invisible.

  | disallow a direct or indirect redirection to a fd that is superior or
  | equal to some upper limit (see below)

Yes, divide the fd number space - that's what ksh does, but we cannot
really do that, it would not be backward compatible.  We allow scripts
now to do 999>/tmp/file if they like, we cannot break that.

This is the problem that caused me to delay this implementation for
so long.

  | that is the minimum value of the range reserved for this kind
  | of redirection, unless it is given as "$var"?

Anything given as $var is no longer present in that form by the time
anything which could enforce such a rule gets a chance to do that.
Variable expansion happens (relatively) early.  All the redirect code
sees is the number that came from the expansion.
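To illustrate (a sketch only; the scratch file and the names out, fd and
result are made up for the example): by the time the redirect is performed,
a literal 3 and an expanded $fd are indistinguishable, so both lines below
end up in the same file.

```shell
# Hypothetical scratch file; any writable path would do.
out=$(mktemp)

exec 3>"$out"          # the script picks fd 3 itself
fd=3                   # ...and happens to keep the number in a variable

echo "literal" >&3     # the redirect code is handed the number 3
echo "expanded" >&$fd  # after expansion, it is again handed the number 3

exec 3>&-              # close the fd again
result=$(cat "$out")   # both lines are now in the file
rm -f "$out"
```

Nothing at the point of the redirect records that the second 3 came from
a variable, which is why a rule keyed on "$var" has nothing to attach to.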

Apart from that, we cannot really invent (many) new rules, and any that
are invented need to be reasonably compatible with the current definitions
of this facility (that's why "new" in quotes in the email Subject - new to the
NetBSD sh, but the idea has been around, in use, for decades).

  | Why not simply use fd the same way as memory is used, i.e. starting by both
  | ends, to have variable stack and heap concurrently using the mem. 

Because of the way the system calls work: everything allocating fds (in the
kernel) allocates upwards.

You might think that the shell must know what fds it has open, and can
simply pick one (at whatever location) and assign that.   It cannot, because
it does not.  Aside from the fd's the shell itself is using, the implementation
keeps no track at all of what fds it has open.

A user does
                exec 3>/dev/null
and the shell opens fd 3, for writing, to /dev/null, and promptly
forgets that ever happened - that fd belongs to the script, not the
shell.   The only state we maintain is the maximum fd we ever saw
that way (and we init that at 9, we simply assume that any
script might use any fd <= 9).

You might think that fdflags demonstrates that the shell can find out
what is open, and you're right, it can, but the way it does that is
with code like this (it is pseudo-code; if you want to see the real thing,
look in src/bin/sh/redir.c - look for fdflagscmd):

        for (fd = 0; fd < something_big; fd++) {
                if (fcntl(fd, F_GETFD, 0) < 0)
                        continue;               /* fd is not open */
                /* fd is open, get the rest of the data and print */
        }

That's acceptable for fdflags, which is not executed much; it is not
acceptable for every redirect that exists.
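The same brute-force probing can be sketched in sh itself.  This is only
an illustration: fd_is_open and list_open_fds are hypothetical helpers,
not anything the shell provides, and the loop stops at 9 simply to stay
within the single-digit fds that every shell handles.

```shell
# fd_is_open N: succeed if file descriptor N is open in this shell.
# (Hypothetical helper, for illustration only.)
fd_is_open() {
    # Attempt to duplicate fd N onto the stdout of a no-op.  Done in a
    # subshell so a failed redirect cannot abort the calling script,
    # with the "bad file descriptor" message discarded.
    # Caveat: because of the 2>/dev/null, fd 2 always appears open.
    ( : >&"$1" ) 2>/dev/null
}

# list_open_fds: print each open fd in the range 0..9, one per line.
list_open_fds() {
    fd=0
    while [ "$fd" -le 9 ]; do
        if fd_is_open "$fd"; then
            echo "$fd"
        fi
        fd=$((fd + 1))
    done
}
```

After "exec 7>/dev/null" a call of "fd_is_open 7" succeeds, and after
"exec 7>&-" it fails.  Every probe costs a subshell here, so this is
even less something one would want to run for each redirect.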

Of course, the shell could keep track of all the fd's it has open, and
when they get closed, etc - but that would be a whole bunch of extra
code, and sh is complex enough without adding yet more global state
it has to try and maintain.

  | starting from the end and
  | decreasing will be the range for {var}>/some/file.

We could (sort of) do something kind of like that - what fd's are
allocated in this case is entirely up to the implementation.  The
problem is that there is really no reasonable "the end" to use.
The kernel processes I/O requests much more effectively if the
file descriptor numbers are all relatively small.   I used to use
fd's around 900 (or something) for sh internal fds ... but changed
that to MUCH smaller values (long ago).

Further, it doesn't really help the underlying issue, though perhaps makes
problems less likely, as:

  | The variable lower bound of
  | the latter will be the limit to test against in the above security
  | concern you have mentioned.

We could have a lower bound on that, effectively limiting the
number of {var}> type redirects open simultaneously (it would be a
big limit, no normal script would encounter it), but that doesn't
solve the problem, as we cannot limit the fd numbers used in N>
type redirects - it just makes issues less likely.

Next message:

tlaro...@polynum.com said:
  | 1) A var such as: exec {var}>/some/file is called  a "named fd" since it
  | shall be referred to by the variable and not by it's implementation defined
  | value;

That we could do, and that's not a bad name, but it isn't the name I was
looking for, which is what to call this kind of redirection, rather than
what to call the variable used in it (though a name for the latter might
also help).

tlaro...@polynum.com said:
  | 2) A "named fd" var once set is neither writable nor directly readable
  | by the script; it is read-only except it is only "readable" by the
  | implementation; 

I won't quote Mouse's reply, or your subsequent amendment of this, but no,
we cannot do any of that.   We need to be at least reasonably compatible
with (at least sane) scripts or functions that have been written for the
shells that have had this facility for a long time.

It is certainly possible for a script to make the "named fd" read only,
after it has been created (to avoid it being accidentally altered) but
the shell cannot force that.
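For example (a sketch only: the variable is set by hand here as a
stand-in for the {var}> form, so that it runs in a shell without that
form, and logfd, fd 7 and protected are arbitrary choices), a script
that wants that protection can simply apply readonly itself:

```shell
# Stand-in for:  exec {logfd}>/dev/null
# (done by hand so the sketch runs in a shell without that form;
# fd 7 and the name logfd are arbitrary).
exec 7>/dev/null
logfd=7

# The script, not the shell, chooses to protect the variable.
readonly logfd

# Later uses go through the variable as usual.
echo "a log line" >&$logfd

# An accidental assignment now fails; it is tried in a subshell, since
# in some shells such a failure terminates the shell that attempts it.
if ( logfd=8 ) 2>/dev/null; then
    protected=no
else
    protected=yes
fi
```

A script that never runs readonly keeps an ordinary, writable variable,
which is what existing scripts written for other shells expect.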


tlaro...@polynum.com said:
  | 3) Redirections are only allowed such as ">$var" via the name and not
  | directly by the value (since "echo $var" gives "\n"), the following
  | construction being the only one allowed for "variables" named fds:

  | if some_condition; then
  |     this_fd=var1
  | else
  |     this_fd=var2
  | fi

  | echo "something" > eval \$$this_fd 

For the trivia first, the last line would need to be:

        echo "something" >& $(eval printf %d \$$this_fd)

or something like that (%s works as well as %d).   But getting that
syntax right, for present purposes, isn't important.

That's because, no, we cannot require that, we need to allow

        if some_condition; then
                this_fd=$var1
        else
                this_fd=$var2
        fi

        echo "something" >& $this_fd

as that's what other shells allow, and being able to do this kind of
thing is too valuable to make unduly difficult.   Aside from how var1
and var2 get assigned (which isn't shown) this code, as it is,
already works just fine in the NetBSD (and every other) shell.
It must not break.


tlaro...@polynum.com said:
  | 4) The current lower limit of named fds allocated can be retrieved by the
  | variable NAMEDFD_WATERMARK (or whatever). New named fds are only allocated
  | if there is room between the current upper limit of users allocated fds
  | and the NAMEDFD_WATERMARK.

We could certainly add a variable giving the current low limit used for
this, but I'm not sure that it would have any use.   While it might be
useful interactively, I don't really see this kind of redirection being
used there: {var}>/... is too much to type when you can just do 8>/...
instead.   This is for scripts, and even more so for functions - particularly
functions designed to be used by many different scripts.  Scripts that
want to use numeric-fd redirects ("8>") cannot really (not rationally anyway)
depend upon the value of some variable, the 8 there needs to be literal.

  | The named fds range grows downward.

As above, that's not likely to happen, but purely for pragmatic reasons.
But it really provides little benefit anyway, if there's to be a limit
it needs to be set once, early, and never changed again, so which way
we allocate the fds above the limit doesn't matter much.

But, independent of allocation direction, perhaps you envisage
NAMEDFD_WATERMARK being writable by the script, to set the limit?
That would make
        NAMEDFD_WATERMARK=30
essentially equivalent to what I suggested using
        : 30>&-
(with a possible +/- 1 on the 30, depending on the exact semantics of
NAMEDFD_WATERMARK, and which instance of "30" is to be adjusted).

The advantage of the version using the redirect operators is that it
(partially) automatically applies to current scripts: any script using
(say)
        28>/tmp/file
currently sets the "watermark" at 28 (or 29, depending on how the watermark is
interpreted) and, provided this is seen (not necessarily executed) before a
{var}> type redirect is executed, simply works.   To add a new variable for
this purpose would require adding it to every script which uses both numeric
and variable type redirections (even if one of those is buried in an included
file, like a function library).

If we don't need the script to be able to read the limit, the form using
a variable (which would be yet another magic variable name) doesn't seem
to offer any big advantage.

So, thanks for the suggestions, but I don't think any of them really
assist.   Unless someone sees a problem with the implementation as
(partly) outlined in my previous message, I am happy enough with that
as currently designed.   I would not be offering it if I wasn't.

What I need to know is whether people think adding this to /bin/sh
(most probably including SMALL shells, as this is the kind of thing
useful to use in places like installation scripts) is worth the cost.
(The shell does get slightly larger - though not a huge amount.  I am
not sure how much bigger right now, as all my current binaries with this
in them also include all the shell's DEBUG trace code, which makes a MUCH
bigger difference to the size (and speed) of the shell.)

Aside from the costs associated with a slightly bigger binary, this should
make no noticeable difference at all to the execution speed of the shell for
scripts that don't use the facility.  For those that do, the execution
time goes up infinitely (from 0, as such scripts don't currently work
at all, to however long the script executes).  One of these redirects
takes slightly longer than an ordinary one, as in addition to the
redirect, the shell needs to format the fd into a decimal char string,
and assign that to a variable.   Uses don't vary much, that's just
>&$var type things, which can be done now - but which do involve an
extra variable lookup and expansion over using >&8 type things.
Nothing you're likely to ever notice though.   But the only way to
know for sure is for people to try it, and for that there needs to
be enough interest for the code to be committed.

And second, we need a name for the generic facility, so we have some way to
refer to it (and also to use as a sub-section heading for the man page,
and a test case name for the (yet to be written) ATF tests of this).

kre

Re: Possible "new" redirect style for /bin/sh (needs a name)
