On 2025-01-12 Paul Eggert wrote:
> On 2025-01-12 12:26, Lasse Collin wrote:
> > The patch makes readdir() detect that lossless conversion isn't
> > possible and inform the application with EOVERFLOW  
> 
> Wouldn't EILSEQ be more appropriate? That's what "open" is supposed
> to do with names that the file system doesn't support.

I pondered it before sending the patch. POSIX.1-2024 readdir() [1]:

    [EOVERFLOW]
        One of the values in the structure to be returned cannot be
        represented correctly.

Since the UTF-16 encoded string cannot be correctly converted to a more
limited character set, it is clearly a value that "cannot be
represented correctly". The error message from strerror(EOVERFLOW) is
confusing in this case, though. strerror(EILSEQ) feels more logical,
but EILSEQ isn't explicitly listed for readdir() in POSIX.

> Also, the POSIX spec suggests that readdir should return a null
> pointer right away with errno set, rather than wait for the end of
> the directory. A subsequent readdir resumes traversal of the
> directory, even after such an error. Doing it this nicer way should
> avoid the need for the new label and goto, and it would also let the
> caller count how many bad entries the directory has.

Returning an error immediately makes the code slightly simpler. I
wonder how many apps continue after any error, though. The example in
the POSIX readdir() page terminates the search after any error, but
it's a simplified example.

In GNU coreutils, src/ls.c, print_dir() has a loop that calls
readdir().[2] It handles two errno values specially:

  - ENOENT is treated the same as successfully reaching the end of
    the directory.

  - EOVERFLOW results in an error message, but reading the directory
    still continues.

All other errors make print_dir() stop reading the directory. The
behavior of ls seems reasonable in the context of the errno values
that POSIX lists for readdir().
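
To illustrate that pattern, here is a minimal sketch of such a loop. It
is not the actual ls.c code: list_dir() and its parameters are
hypothetical names, and it assumes that readdir() advances past the
problematic entry after failing with EOVERFLOW, so that the next call
continues with the following entry.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from coreutils): print the names in a
   directory. Returns the number of names printed, or -1 if the
   directory could not be opened or reading stopped on a fatal error.
   *skipped receives the number of EOVERFLOW failures. */
long list_dir(const char *path, long *skipped)
{
    long listed = 0;
    int fatal = 0;
    DIR *dir;

    *skipped = 0;
    dir = opendir(path);
    if (dir == NULL)
        return -1;

    for (;;) {
        struct dirent *ent;

        /* readdir() returns NULL both at the end of the directory
           and on error, so errno must be cleared before each call. */
        errno = 0;
        ent = readdir(dir);
        if (ent != NULL) {
            puts(ent->d_name);
            listed++;
            continue;
        }

        if (errno == 0 || errno == ENOENT)
            break; /* Treated as the end of the directory. */

        if (errno == EOVERFLOW) {
            /* Assumption: the stream has moved past the bad entry,
               so it is safe to report the problem and keep going. */
            fprintf(stderr, "%s: %s\n", path, strerror(errno));
            (*skipped)++;
            continue;
        }

        /* Any other error stops the traversal. */
        fprintf(stderr, "%s: %s\n", path, strerror(errno));
        fatal = 1;
        break;
    }

    closedir(dir);
    return fatal ? -1 : listed;
}
```

A caller could use list_dir(".", &skipped) and then check skipped to
see how many entries had names that could not be represented. With an
implementation that keeps failing on the same entry, a loop like this
would never terminate, so the assumption above matters.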

If readdir() returned the more logical-sounding EILSEQ, GNU ls wouldn't
attempt to list the remaining directory entries. Thus, I think using
EOVERFLOW is better than EILSEQ when character set conversion cannot
be done correctly.

I will change the code to return EOVERFLOW immediately instead of
delaying it.

[1] https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/functions/readdir_r.html

[2] https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/ls.c#n3086

-- 
Lasse Collin
