On 05/02/2023 18:27, Stephane Chazelas wrote:
> "wc -c" without filename arguments is meant to read stdin until EOF
> and report the number of bytes it has read.
>
> When stdin is on a regular file, GNU wc has that optimisation whereby
> it skips the reading, does a pos = lseek(0,0,SEEK_CUR) to find out its
> current position within the file, fstat(0) and reports st_size - pos
> (assuming st_size > pos).
>
> However, it does not move the position to the end of the file. That
> means for instance that:
>
> $ echo test > file
> $ { wc -c; wc -c; } < file
> 5
> 5
>
> Instead of 5, then 0:
>
> $ { wc -c; cat; } < file
> 5
> test
>
> So the optimisation is incomplete.
>
> It also reports the size of the file even if it could not possibly
> read it because it's not open in read mode:
>
> $ { wc -c; } 0>> file
> 5
>
> IMO, it should only do the optimisation if
>
> - fcntl(F_GETFL) to check that the file is opened in O_RDONLY or O_RDWR
> - current checks for /proc /sys-like filesystems
> - pos < st_size
> - lseek(0,st_size,SEEK_SET) is successful.
>
> (that leaves a race window above where it could move the cursor
> backward, but I would think that can be ignored as if something else
> reads at the same time, there's not much we can expect anyway).

Yes I agree.
Adjusting would also avoid the following inconsistencies:

$ { wc -c; wc -c; } < file
5
5

$ { wc -l; wc -l; } < file
1
0

$ truncate -s $(getconf PAGESIZE) file
$ { wc -c; wc -c; } < file
4096
0

Hopefully the attached addresses this.

Note it doesn't add the constraint on the input being readable,
which I'll think a bit more about.

cheers,
Pádraig
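
For illustration only, and not part of the attached patch: the
readability constraint mentioned above, together with the other
conditions from the quoted report, might look roughly like the sketch
below. It takes the conditions to be: descriptor opened for reading,
regular file, pos < st_size, and a successful lseek to st_size with
SEEK_SET. The function name count_by_seeking is hypothetical, and the
existing /proc and /sys style checks are assumed to happen elsewhere.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not the coreutils code.  Try to count the bytes
   left on FD without reading them.  On success, store the count in
   *BYTES, leave the offset at end of file and return true.  Return
   false when the caller should fall back to read().  */
bool
count_by_seeking (int fd, off_t *bytes)
{
  assert (bytes != NULL);

  /* The descriptor must be open for reading.  */
  int flags = fcntl (fd, F_GETFL);
  if (flags < 0)
    return false;
  int mode = flags & O_ACCMODE;
  if (mode != O_RDONLY && mode != O_RDWR)
    return false;

  /* Only regular files have a meaningful st_size for this purpose.  */
  struct stat st;
  if (fstat (fd, &st) != 0 || !S_ISREG (st.st_mode))
    return false;

  /* Require pos < st_size; otherwise let read() report 0 bytes.  */
  off_t pos = lseek (fd, 0, SEEK_CUR);
  if (pos < 0 || st.st_size <= pos)
    return false;

  /* Move the offset to the end so a later reader sees EOF.  */
  if (lseek (fd, st.st_size, SEEK_SET) < 0)
    return false;

  *bytes = st.st_size - pos;
  return true;
}
```

With a descriptor opened as in "0>> file" the access mode is O_WRONLY,
so this sketch returns false and the caller would attempt the read,
which then fails instead of reporting a size.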

From 42f72ec424e7eecd6b56c5b6fca5f377ff73795b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <p...@draigbrady.com>
Date: Sun, 5 Feb 2023 19:52:31 +0000
Subject: [PATCH] wc: ensure we update file offset

* src/wc.c (wc): Update the offset when not reading,
and do read if we can't update the offset.
* tests/misc/wc-proc.sh: Add a test case.
* NEWS: Mention the bug fix.
Fixes https://bugs.gnu.org/61300
---
 NEWS                  |  4 ++++
 src/wc.c              |  5 ++++-
 tests/misc/wc-proc.sh | 12 ++++++++++++
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index b3cde4a01..1cea8cc32 100644
--- a/NEWS
+++ b/NEWS
@@ -57,6 +57,10 @@ GNU coreutils NEWS -*- outline -*-
   sized files larger than SIZE_MAX.
   [bug introduced in coreutils-8.24]
 
+  `wc -c` will again correctly update the read offset of inputs.
+  Previously it deduced the size of inputs while leaving the offset unchanged.
+  [bug introduced in coreutils-8.27]
+
 ** Changes in behavior
 
   Programs now support the new Ronna (R), and Quetta (Q) SI prefixes,
diff --git a/src/wc.c b/src/wc.c
index 5f3ef6eee..de04612e9 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -446,7 +446,10 @@ wc (int fd, char const *file_x, struct fstatus *fstatus, off_t current_pos)
                  beyond the end of the file.  As in the example above.  */
 
               bytes = end_pos < current_pos ? 0 : end_pos - current_pos;
-              skip_read = true;
+              if (bytes && 0 <= lseek (fd, bytes, SEEK_CUR))
+                skip_read = true;
+              else
+                bytes = 0;
             }
           else
             {
diff --git a/tests/misc/wc-proc.sh b/tests/misc/wc-proc.sh
index 5eb43b982..2b5026405 100755
--- a/tests/misc/wc-proc.sh
+++ b/tests/misc/wc-proc.sh
@@ -42,6 +42,18 @@ cat <<\EOF > exp
 EOF
 compare exp out || fail=1
 
+# Ensure we update the offset even when not reading,
+# which wasn't the case from coreutils-8.27 to coreutils-9.1
+{ wc -c; wc -c; } < no_read > out || fail=1
+{ wc -c; wc -c; } < do_read >> out || fail=1
+cat <<\EOF > exp
+2
+0
+1048576
+0
+EOF
+compare exp out || fail=1
+
 # Ensure we don't read too much when reading,
 # as was the case on 32 bit systems
 # from coreutils-8.24 to coreutils-9.1
-- 
2.26.2
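
To try the idea behind the src/wc.c hunk outside of coreutils, here is
a minimal standalone sketch. It is not the real wc() function: the
names count_bytes and count_twice are hypothetical, the additional
conditions the real code applies before trusting st_size are left out,
and interrupted reads are not retried. It only shows the "seek forward,
or else fall back to reading" shape of the change, and mimics
"{ wc -c; wc -c; } < file" by counting twice on one descriptor.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the "wc -c" path of wc().  Return the
   number of bytes from the current offset of FD to EOF, or -1 on a
   read error.  The offset is always left at the end of the input.  */
off_t
count_bytes (int fd)
{
  assert (0 <= fd);

  off_t bytes = 0;
  struct stat st;
  off_t current_pos = lseek (fd, 0, SEEK_CUR);

  if (0 <= current_pos && fstat (fd, &st) == 0 && S_ISREG (st.st_mode))
    {
      off_t end_pos = st.st_size;
      bytes = end_pos < current_pos ? 0 : end_pos - current_pos;

      /* As in the patch: skip reading only if the offset really moved.  */
      if (bytes && 0 <= lseek (fd, bytes, SEEK_CUR))
        return bytes;
      bytes = 0;
    }

  /* Fall back to reading until EOF, which also advances the offset.  */
  char buf[4096];
  ssize_t n;
  while (0 < (n = read (fd, buf, sizeof buf)))
    bytes += n;
  return n < 0 ? -1 : bytes;
}

/* Mimic "{ wc -c; wc -c; } < PATH": two counts on one shared descriptor.
   Return 0 on success, -1 if PATH cannot be opened.  */
int
count_twice (char const *path, off_t counts[2])
{
  int fd = open (path, O_RDONLY);
  if (fd < 0)
    return -1;
  counts[0] = count_bytes (fd);
  counts[1] = count_bytes (fd);
  close (fd);
  return 0;
}
```

For a file created with "echo test > file", count_twice should store 5
and then 0, matching the behaviour the new test case checks for.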