On Tuesday 25 February 2025 20:11:12 Lasse Collin wrote:
> While not about dirent, an old issue in stat() and _stat() is kind of
> related:
> 
> MSVCRT's _stat() fails with "directory/", but in POSIX a trailing slash
> should work for directories (and only for directories).
> mingw-w64-crt/stdio/_stat.c provides stat() that, with some exceptions,
> removes *one* trailing / or \ and then calls _stat(). This hack is often
> enough, but as a side effect it then makes "file/" succeed while it
> shouldn't. And "directory//" still fails while it shouldn't.
> 
> On UCRT, stat() simply calls _stat(). UCRT's _stat() has eerily similar
> behavior with trailing slashes as mingw-w64's replacement stat() has in
> MSVCRT builds. It's as if UCRT was made bug-for-bug compatible with
> mingw-w64 instead of thinking what is the correct but more complex
> thing to do.
> 
> MSVCRT's _stat() doesn't work on \\.\ or \\?\ paths. UCRT's does, but
> the buggy trailing slash removal means that only the last line below
> succeeds with UCRT's _stat():
> 
>     \\?\GLOBALROOT\Device\Harddisk0\Partition3
>     \\?\GLOBALROOT\Device\Harddisk0\Partition3\
>     \\?\GLOBALROOT\Device\Harddisk0\Partition3\\
> 
> MSVCRT's _stat() doesn't follow reparse points. UCRT's _stat() does
> (on symlink or junction loops it fails with ENOENT, not ELOOP).

MS _stat() function can do whatever it wants. It is not part of C or
POSIX. But mingw-w64's stat() function should be consistent and returns
correct information. What you have figured out is another mess. It is
not nice.

So as some future followup step, it would be nice to fix the mingw-w64
stat() function. We can put it into infinite TODO list :-)

> On 2025-02-23 Lasse Collin wrote:
> > On 2025-02-23 Pali Rohár wrote:
> > > On 2025-02-23 Lasse Collin wrote:
> > > > The fact that something else does it wrong doesn't mean that it's
> > > > necessarily fine to do so. Mixing these path types would be very
> > > > bad in stat() or such function. To me it doesn't feel too bad when
> > > > one is specifically trying to open a directory to list its
> > > > contents (like opendir or Python's os.listdir).
> > > > 
> > > > Having said that, I will try the GetFileAttributesW solution
> > > > still. Maybe it has some other minor benefits in error detection
> > > > (perhaps it allows shortening the checked error list of
> > > > FindFirstFileW).    
> > > 
> > > This could improve error handling.  
> > 
> > It should catch currently unhandled error codes as ENOENT. Then the
> > error cases of FindFirstFileW don't need to handle quite so many
> > errors like those from \\.\nul (ERROR_INVALID_FUNCTION) and \\.\con
> > (ERROR_NOT_FOUND). I suppose four error codes resulting in ENOENT can
> > be omitted too since those shouldn't occur if GetFileAttributesW
> > succeeds (unless the target changes due to a race condition).
> > 
> > GetFileAttributesW docs has the following in "Remarks":
> > 
> >     If you call GetFileAttributes for a network share, the function
> >     fails, and GetLastError returns ERROR_BAD_NETPATH. You must
> >     specify a path to a subfolder on that share.
> > 
> > But it works still. Maybe it doesn't in some old Windows version?
> > 
> > Calling GetFileAttributesW early in _wopendir (before
> > GetFullPathNameW) made simple recursive directory scanning 20 %
> > slower. The program only goes through the tree counting the entries,
> > not even printing them. In real world apps the effect would be
> > smaller. It's still dramatically faster than with the old dirent when
> > taking advatange of d_type.
> 
> While \\?\...\Partition3 without trailing \ isn't a directory, I more
> and more think that it doesn't matter too much if opendir works on it
> still. It's so tiny bug that *it alone* isn't worth the the extra code
> or the runtime overhead of GetFileAttributes. (With stat(), this kind of
> cheating wouldn't be acceptable.)

I'm really not sure. What I think is that readdir() and stat() should be
in-sync and consistent. And ideally follows the Windows meaning.

> Other considerations about using or not using GetFileAttributes:
> 
> (1) If opendir is called with a weird device path, can something bad
>     happen if FindFirstFileW is called on that path with \* at the end?
>     If yes, can GetFileAttributes call without the \* at the end have
>     similar bad effects? If yes, then worrying about \* seems pointless.
>     I hope this is far in the over-thinking territory.
> 
> (2) Error detection differences:
> 
>     - To map "nul" and "con" to ENOTDIR, current FindFirstFileW-only
>       method handles ERROR_INVALID_FUNCTION and ERROR_NOT_FOUND. With
>       GetFileAttributes these two wouldn't be needed.
> 
>     - "symlink-loop/foo" can only be detected with FindFirstFileW:
>           FindFirstFileW     =>  ERROR_CANT_RESOLVE_FILENAME
>           GetFileAttributes  =>  ERROR_FILE_NOT_FOUND
> 
>     - Since GetFileAttributes doesn't follow a reparse point when it's
>       the last pathname component, "foo/symlink-loop" needs to be
>       handled in FindFirstFileW error codes still. So does EACCES for
>       the last pathname component.
> 
>     - The old dirent fails with ENOENT whenever GetFileAttributes
>       fails. More precise errno values would require calling
>       GetLastError and deciding what errno value to use for unknown
>       errors. This is not different from the FindFirstFileW-only
>       method. On the other hand, if ENOENT is OK as the default error,
>       then the FindFirstFileW method could be changed to that.
> 
> I think (2) doesn't favor the use of GetFileAttributes. I really hope
> (1) doesn't either.
> 
> > > > > > An alternative would be to call GetLastError() ==
> > > > > > ERROR_INVALID_FLAGS if WideCharToMultiByte fails and retry
> > > > > > with flags = 0.      
> > > > > 
> > > > > That is too complicated.    
> > > > 
> > > > It's not much more code. I have attached a patch to do that. On
> > > > Win95, GetLastError returns ERROR_INVALID_FLAGS (1004), so I
> > > > assume that this method works on NT4 too.    
> > > 
> > > Uff, that is questionable. Win9x was always different and not
> > > followed the true NT based WinAPI.  
> > 
> > ERROR_INVALID_FLAGS is mentioned in WideCharToMultiByte docs,
> > including the old ones. That error code is used on WinXP when trying
> > CP_UTF8 with WC_ERR_INVALID_CHARS. To be very sure, someone would
> > need to test it on NT4.
> 
> WideCharToMultiByte indeed fails with ERROR_INVALID_FLAGS (1004) on NT4.
> The GetLastError method works in dirent: best-fit mapping is used on
> NT4.

Ok.

> About errno with unknown errors:
> 
>   - EIO sounds good if FindNextFileW fails. Even if it isn't a hardware
>     error, that function should rarely fail.
> 
>   - I have started to think that EIO might be a bit harsh in _wopendir.
>     I wonder if EINVAL would be better. MS CRTs seem to use EINVAL as a
>     fallback value (or at least EINVAL is very common if one tries to do
>     something weird). Another alternative is defaulting to ENOENT like
>     the old dirent does on GetFileAttributes failure.
> 
> The attached patches fix another silly copypaste error and simplify
> the second FindFirstFileW error check to use ENOENT even for unknown
> errors (that code is used only after rewinddir or seekdir). _wopendir's
> errno values weren't changed.
> 
> -- 
> Lasse Collin

ENOENT should be returned only in the case when the path really does not
exist. Not for generic / unknown readdir errors. If readdir returns
ENOENT then caller should use mkdir to create the non-existent path.


_______________________________________________
Mingw-w64-public mailing list
Mingw-w64-public@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public

Reply via email to