On 05/02/2023 18:27, Stephane Chazelas wrote:
"wc -c" without filename arguments is meant to read stdin til
EOF and report the number of bytes it has read.

When stdin is on a regular file, GNU wc has that optimisation
whereby it skips the reading, does a pos = lseek(0,0,SEEK_CUR)
to find out its current position within the file, fstat(0) and
reports st_size - pos (assuming st_size > pos).

However, it does not move the position to the end of the file.
That means for instance that:

$ echo test > file
$ { wc -c; wc -c; } < file
5
5

Instead of 5, then 0:

$ { wc -c; cat; } < file
5
test

So the optimisation is incomplete.

It also reports the size of the file even if it could not possibly read it
because it's not open in read mode:

{ wc -c; } 0>> file
5

IMO, it should only do the optimisation if
- fcntl(F_GETFL) to check that the file is opened in O_RDONLY or O_RDWR
- current checks for /proc /sys-like filesystems
- pos > st_size
- lseek(0,st_size,SEEK_POS) is successful.

(that leaves a race window above where it could move the cursor
backward, but I would think that can be ignored as if something
else reads at the same time, there's not much we can expect
anyway).

Yes I agree.

Adjusting would also avoid the following inconsistencies:

$ { wc -c; wc -c; } < file
5
5

$ { wc -l; wc -l; } < file
1
0

$ truncate -s $(getconf PAGESIZE) file
$ { wc -c; wc -c; } < file
4096
0

Hopefully the attached addresses this.
Note it doesn't add the constraint on the input being readable,
which I'll think a bit more about.

cheers,
Pádraig
From 42f72ec424e7eecd6b56c5b6fca5f377ff73795b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <p...@draigbrady.com>
Date: Sun, 5 Feb 2023 19:52:31 +0000
Subject: [PATCH] wc: ensure we update file offset

* src/wc.c (wc): Update the offset when not reading,
and do read if we can't update the offset.
* tests/misc/wc-proc.sh: Add a test case.
* NEWS: Mention the bug fix.
Fixes https://bugs.gnu.org/61300
---
 NEWS                  |  4 ++++
 src/wc.c              |  5 ++++-
 tests/misc/wc-proc.sh | 12 ++++++++++++
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index b3cde4a01..1cea8cc32 100644
--- a/NEWS
+++ b/NEWS
@@ -57,6 +57,10 @@ GNU coreutils NEWS                                    -*- outline -*-
   sized files larger than SIZE_MAX.
   [bug introduced in coreutils-8.24]
 
+  `wc -c` will again correctly update the read offset of inputs.
+  Previously it deduced the size of inputs while leaving the offset unchanged.
+  [bug introduced in coreutils-8.27]
+
 ** Changes in behavior
 
   Programs now support the new Ronna (R), and Quetta (Q) SI prefixes,
diff --git a/src/wc.c b/src/wc.c
index 5f3ef6eee..de04612e9 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -446,7 +446,10 @@ wc (int fd, char const *file_x, struct fstatus *fstatus, off_t current_pos)
                  beyond the end of the file.  As in the example above.  */
 
               bytes = end_pos < current_pos ? 0 : end_pos - current_pos;
-              skip_read = true;
+              if (bytes && 0 <= lseek (fd, bytes, SEEK_CUR))
+                skip_read = true;
+              else
+                bytes = 0;
             }
           else
             {
diff --git a/tests/misc/wc-proc.sh b/tests/misc/wc-proc.sh
index 5eb43b982..2b5026405 100755
--- a/tests/misc/wc-proc.sh
+++ b/tests/misc/wc-proc.sh
@@ -42,6 +42,18 @@ cat <<\EOF > exp
 EOF
 compare exp out || fail=1
 
+# Ensure we update the offset even when not reading,
+# which wasn't the case from coreutils-8.27 to coreutils-9.1
+{ wc -c; wc -c; } < no_read >  out || fail=1
+{ wc -c; wc -c; } < do_read >> out || fail=1
+cat <<\EOF > exp
+2
+0
+1048576
+0
+EOF
+compare exp out || fail=1
+
 # Ensure we don't read too much when reading,
 # as was the case on 32 bit systems
 # from coreutils-8.24 to coreutils-9.1
-- 
2.26.2

Reply via email to