Date: Sat, 13 Dec 2025 01:51:15 -0500
From: Grisha Levit <[email protected]>
Message-ID: <CAMu=brqbjguewyw8marjcnhvwkte_r3_vpamdxweau9queb...@mail.gmail.com>
| What gawk had was essentially a call like:
| fcntl(2, F_SETFL, fcntl(2, F_GETFL) | O_APPEND);
OK. That will (and certainly should) behave as you describe.
But I can't see how that would result in the behaviour described,
unless the cat bug you mentioned is truly bizarre. Where output
to stderr should go should have no impact on how cat decides whether
its stdin and stdout are the same file (to avoid causing infinite
file expansion, I presume).
| ...and, it seems, any changes to the file offset or file flags associated
| with inherited file descriptors.
There are 3 levels of access to files (in the kernel). The inode, or the
actual description of the file, contains most of the information that stat()
returns, and all of the data - that is shared amongst everything which
references the file (kind of obviously).
The file table has entries created by open() calls (and similar), each of
which contains a reference to the inode - multiple file table entries can
refer to the same inode (file). An entry contains the open mode, and
most of the file access flags (including O_APPEND) and the file position.
One file table entry is shared between users that inherit the results
of a single open() call - whether by dup(), fork(), or something
similar, which allocates more accesses to the same open file.
Each different open() call gets a distinct file table entry; those
share only the inode: access permissions (what chmod affects), mod times,
etc, and the data in the file. The open modes, flags, and
file offsets are all distinct.
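To make the dup() versus second open() distinction concrete, here is a minimal sketch, in Python only because its os module maps directly onto the system calls. The function name and the temporary file are made up for illustration. It reads 3 bytes through one descriptor, then asks the other two where they are.

```python
import os
import tempfile


def offsets_after_read():
    """Return (offset seen via dup(), offset seen via a second open())."""
    fd, path = tempfile.mkstemp()          # one open(): one file table entry
    try:
        os.write(fd, b"abcdef")
        os.lseek(fd, 0, os.SEEK_SET)
        dup_fd = os.dup(fd)                # same file table entry as fd
        other_fd = os.open(path, os.O_RDONLY)  # new entry, same inode
        os.read(fd, 3)                     # moves the offset in fd's entry
        dup_off = os.lseek(dup_fd, 0, os.SEEK_CUR)
        other_off = os.lseek(other_fd, 0, os.SEEK_CUR)
        for d in (fd, dup_fd, other_fd):
            os.close(d)
        return dup_off, other_off
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(offsets_after_read())
```

The dup()ed descriptor should report offset 3, since it shares the entry, while the separately opened one should still be at 0.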
And finally, the file descriptor table. There is (nominally anyway)
one of these for each process; originally it was never shared with
anything, though linux's clone() can allow it to be shared. I suspect
(assume) that's primarily designed to allow clone() to create what
are effectively threads - share the address space, share the file
descriptor table, ... and it all looks to be the same process, but for
most things in the kernel the threads look like they're different
processes - scheduled separately, can be running at the same time on
different cpus, etc.
The file descriptor table contains the mapping between file descriptor numbers
(0, 1, 2, ...) and the file table entry for the open which created them,
and the close on exec and close on fork flags (only those, currently).
(Then libc maps FILE objects like stdin, stdout, stderr, to file descriptor
numbers, for current purposes, those can be considered to be the
same thing.)
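A small sketch of which flags live where, again in Python and with a made-up function name: FD_CLOEXEC is set on one descriptor and cleared on its dup() to show it is per descriptor, while O_APPEND is set through one descriptor and read back through the other to show it belongs to the shared file table entry.

```python
import fcntl
import os
import tempfile


def flags_after_dup():
    """Report per-descriptor and per-file-table-entry flags after dup()."""
    fd, path = tempfile.mkstemp()
    try:
        dup_fd = os.dup(fd)
        # Descriptor flags (F_SETFD) live in the file descriptor table:
        # set close-on-exec on the original, clear it on the duplicate.
        fcntl.fcntl(fd, fcntl.F_SETFD, fcntl.FD_CLOEXEC)
        fcntl.fcntl(dup_fd, fcntl.F_SETFD, 0)
        # Status flags (F_SETFL) live in the shared file table entry:
        # set O_APPEND through the original descriptor only.
        fl = fcntl.fcntl(fd, fcntl.F_GETFL)
        fcntl.fcntl(fd, fcntl.F_SETFL, fl | os.O_APPEND)
        result = {
            "fd_cloexec": bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC),
            "dup_cloexec": bool(fcntl.fcntl(dup_fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC),
            "dup_append": bool(fcntl.fcntl(dup_fd, fcntl.F_GETFL) & os.O_APPEND),
        }
        os.close(fd)
        os.close(dup_fd)
        return result
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(flags_after_dup())
```

The two close on exec values differ, but the duplicate sees the O_APPEND it was never given directly.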
So, after a fork() the file descriptor table is copied from parent to child
(with any close on fork files closed in the child, though I don't think
linux supports close on fork yet - it is in last year's POSIX standard).
As set up originally (ignoring close on fork) it will start out identical
to the parent's - but can then be changed as desired ... eg: to implement
a command substitution a shell would typically fork(), and then change
stdout (fd 1) of the child process to be a pipe (created before
the fork, and available in both) back to the parent (to write the results
so the shell can interpolate them in the command line to replace the cmdsub),
and close the end of the pipe pair which could be used to read from the pipe
(which the child does not need). The parent leaves the read side of the pipe open
(and reads from it) and closes the write side. By this time the file
descriptor tables of the two processes are different, but nothing has changed
the file table entries (shared between the two) for the files (pipes, etc)
which they both still have open. Nothing either process can do will affect
the file descriptor table of the other - but either can alter the file
flags, lseek(), etc - which affect the shared file table entry. That only matters
to accesses which originated in the same open() call (or pipe(), socket(),
and similar).
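The effect of a child on the parent's inherited descriptor can be sketched like this (Python, hypothetical function name, a temporary file standing in for whatever the descriptor refers to). The child moves the offset and sets O_APPEND, then exits; the parent touches nothing and just looks afterwards.

```python
import fcntl
import os
import tempfile


def child_changes_seen_by_parent():
    """Return (offset, O_APPEND set?) as seen by the parent after the child ran."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"abcdef")
        os.lseek(fd, 0, os.SEEK_SET)
        pid = os.fork()
        if pid == 0:
            # Child: its descriptor table is a copy, but fd still refers
            # to the same file table entry as in the parent.
            try:
                os.lseek(fd, 4, os.SEEK_SET)
                fl = fcntl.fcntl(fd, fcntl.F_GETFL)
                fcntl.fcntl(fd, fcntl.F_SETFL, fl | os.O_APPEND)
                os.close(fd)   # only affects the child's descriptor table
            finally:
                os._exit(0)
        os.waitpid(pid, 0)
        offset = os.lseek(fd, 0, os.SEEK_CUR)
        append = bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_APPEND)
        os.close(fd)
        return offset, append
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(child_changes_seen_by_parent())
```

The parent should see offset 4 and O_APPEND set, even though the child closed its own copy of the descriptor and has already exited.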
| So executing gawk, even in a subshell, would add O_APPEND to the
| file status flags of FD 2 of the parent.
Yes. That's not normally likely to matter much, typically everyone wants to
write to the end of stderr anyway.
| Compare normal behavior:
|
| $ echo abc >foo; (echo def) 1<>foo 2>&1
| $ cat foo
| def
|
| and same steps with a dummy gawk invocation added:
|
| $ echo abc >foo; ((gawk ''); echo def) 1<>foo 2>&1
| $ cat foo
| abc
| def
Yes, if fd 1 and fd 2 are dups, which 2>&1 does, then they share
the same file table entry, and the O_APPEND which goes with it.
That's something which gawk probably should not be doing,
and can be called a bug - but I still can't see how that would cause
the symptoms described in the initial report, which, as best I recall,
did not do anything quite like the above - no duping stderr and
stdout for example.
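The two transcripts quoted above can be imitated without gawk or a shell. This is only a sketch under assumptions: the function name is invented, ordinary descriptors stand in for fd 1 and fd 2, and the F_SETFL call stands in for what gawk was described as doing.

```python
import fcntl
import os
import tempfile


def write_def(set_append_on_dup):
    """Write "def" via one descriptor after optionally setting O_APPEND via its dup."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"abc\n")                  # echo abc >foo
        out_fd = os.open(path, os.O_RDWR)      # like 1<>foo: no truncation, offset 0
        err_fd = os.dup(out_fd)                # like 2>&1
        if set_append_on_dup:
            # Stand-in for the gawk call, applied to the "stderr" duplicate.
            fl = fcntl.fcntl(err_fd, fcntl.F_GETFL)
            fcntl.fcntl(err_fd, fcntl.F_SETFL, fl | os.O_APPEND)
        os.write(out_fd, b"def\n")             # echo def
        os.close(out_fd)
        os.close(err_fd)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(write_def(False), write_def(True))
```

Without the flag the write lands at offset 0 and overwrites "abc"; with it, the write through the other descriptor is appended, matching the two results shown in the quoted text.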
| All other shells I tried on Linux and macOS behave the same: an F_SETFL
| in a subshell affects the flags of the FD in the parent.
That certainly suggests it is nothing any particular shell is doing, but
must be in the kernel (or just possibly, libc, though I'm not sure what
it could do to cause the symptom described). Ie: that absolves bash.
But that is just for the issue you're describing, and I can't see how that
is related to the original problem report.
| The removed gawk code was prefaced with the following explanation, which
| I don't really understand, but it does suggest some difference in behavior
| between systems in this area. Which behavior, if any, can be considered a
| bug I do not know.
|
| // 1/2018: This is needed on modern BSD systems so that the
| // inplace tests pass. I think it's a bug in those kernels
| // but let's just work around it anyway.
That's not enough for me to understand either, nor for me to have any idea
what problem they think exists on "modern BSD systems" (one of which is what
I use) which needs something like that. I'd suspect it is more likely that
the test in question is assuming some non-standard behaviour which happens
to work on linux - but without knowing what the problem is (which wouldn't be
an issue for bug-bash), that's just a guess.
kre