On 2025-01-13 Paul Eggert wrote:
> On 2025-01-13 00:05, Lasse Collin wrote:
>
> > I pondered it before sending the patch. POSIX.1-2024 readdir() [1]:
> >
> >     [EOVERFLOW]
> >     One of the values in the structure to be returned cannot be
> >     represented correctly.
>
> Yes, but the list of error numbers in the readdir ERRORS section is
> not exhaustive. For example, readdir can plausibly fail with EIO even
> though EIO is not in the list. See
> <https://pubs.opengroup.org/onlinepubs/9799919799/functions/V2_chap02.html#tag_16_03>,
>
> which says "Implementations may support additional errors not
> included in this list, may generate errors included in this list
> under circumstances other than those described here, or may contain
> extensions or limitations that prevent some errors from occurring."

Thanks! I'm aware that additional error numbers are common. However,
portable programs might not know what those numbers mean on a specific
operating system, so programs need to interpret the numbers
conservatively. Most of the time this works well: an error message is
displayed and the operation doesn't continue.

If a program wants to continue calling readdir() after an error, it has
to know the errno values with which continuation works. POSIX documents
EBADF and ENOENT for readdir(), and their descriptions don't give the
impression that further readdir() calls might succeed. Attempting to
continue after EBADF or ENOENT (or EIO) could result in an infinite
loop.

The description of EOVERFLOW is less scary. (It would still be useful
if it explicitly said that one can continue after EOVERFLOW, especially
because the example code doesn't do so.) My peek at GNU ls affirmed my
thought that EOVERFLOW is special in a portability context. Many apps
don't continue after EOVERFLOW but I guess even fewer do so after
EILSEQ.

There is also:

    Implementations shall not generate a different error number from
    one required by this volume of POSIX.1-2024 for an error condition
    described in this volume of POSIX.1-2024, but may generate
    additional errors unless explicitly disallowed for a particular
    function.

A pedantic interpretation might be that EILSEQ isn't allowed in this
situation because EOVERFLOW already describes the error condition. In
practice EILSEQ is fine if the diagnostics are delayed until the end of
the directory. Applications that know the implementation-specific
detail can then even count the number of conversion errors.

> EILSEQ is more-appropriate than EOVERFLOW here, so I'd use it.
[...]
> PS. As you mention, it's fine (and indeed a good idea) to delay the
> EILSEQ error to the end, as too much code mistakenly treats any null
> return from readdir as EOF.

Thanks! I agree now. That is, behave like v2-counting.patch but with
EILSEQ instead of EOVERFLOW.
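
To illustrate the conservative interpretation from an application's
point of view, here is a minimal sketch, assuming a POSIX-like
<dirent.h>. The function count_entries() and its parameters are
hypothetical; this is not the code from ls.c or from the patch. It
treats EOVERFLOW as the only errno value after which readdir() is
called again, and it copies errno right after readdir() returns so
that later calls cannot change it. Whether continuing after EOVERFLOW
is safe on a given implementation is an assumption here, not something
POSIX states explicitly.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the entries of the directory `path`.
 * Returns 0 on success and -1 on an error after which reading stopped.
 * *skipped receives the number of EOVERFLOW failures that were
 * reported and then skipped. */
static int count_entries(const char *path, long *entries, long *skipped)
{
    assert(path != NULL && entries != NULL && skipped != NULL);
    *entries = 0;
    *skipped = 0;

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    for (;;) {
        /* readdir() returns NULL both at the end of the directory and
         * on error, so errno must be cleared before each call. */
        errno = 0;
        struct dirent *ent = readdir(dir);
        int saved_errno = errno; /* copy before any other call runs */

        if (ent != NULL) {
            ++*entries;
            continue;
        }

        if (saved_errno == 0)
            break; /* end of directory */

        if (saved_errno == EOVERFLOW) {
            /* Assumption: the implementation has moved past the
             * problematic entry, so calling readdir() again cannot
             * loop forever. */
            fprintf(stderr, "%s: %s\n", path, strerror(saved_errno));
            ++*skipped;
            continue;
        }

        /* EBADF, ENOENT, EIO, EILSEQ, or an unknown value: stop,
         * because a portable program cannot know that continuing
         * would work. */
        closedir(dir);
        errno = saved_errno;
        return -1;
    }

    return closedir(dir);
}
```

A caller would check the return value first and then look at *skipped
to decide whether the listing was complete. With an implementation
that delays EILSEQ until the end of the directory, this loop would
still see every convertible entry before it stops with -1.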

> > If readdir() returns the more logical sounding EILSEQ, it means that
> > GNU ls won't attempt to list the remaining directory entries.
>
> Oh, that's OK, we'd just change GNU ls. This would make for a better
> 'ls' for the user, as the diagnostics would be more informative.

That helps 'ls'. Other apps on Windows may trip on EILSEQ (and probably
on EOVERFLOW too) if accessing a directory with even one problematic
name. In some cases it might be a better user experience to list a few
non-existing lookalike names in a GUI. Perhaps this was the
compatibility concern in the other branch of this thread. The behavior
needs to be configurable.

> > In GNU coreutils, src/ls.c, print_dir() has a loop that calls
> > readdir().[2] It handles two errno values specially:
> >
> >   - ENOENT is treated the same as successfully reaching the end of
> >     the directory.
> >
> >   - EOVERFLOW results in an error message but directory reading is
> >     continued still.
>
> The EOVERFLOW treatment is buggy because errno might have changed
> since readdir was called

Those are so easy to miss. :-/ I had fixed a similar error only a few
days ago but I missed it in ls.c even though I was specifically looking
at its errno handling.

--
Lasse Collin