Re: [Mingw-w64-public] dirent changes

Pali Rohár Tue, 18 Feb 2025 11:52:39 -0800

On Tuesday 18 February 2025 18:48:40 Lasse Collin wrote:
> On 2025-02-17 Pali Rohár wrote:
> > On Monday 17 February 2025 17:37:17 Lasse Collin wrote:
> > > I suppose lstat() shouldn't follow any reparse points.  
> > 
> > That is a question for which I do not have answer.
> 
> One would need to figure out what is the least surprising behavior.
> 
> > Why should follow:
> > - ensure that lstat() and stat() returns same data for file type which
> >   is not symbolic link (this is somehow what POSIX applications may
> >   except)
> > - ensure that lstat() for mount points returns inode where mount point
> >   points and not the inode of the underlying mount point itself (this
> >   is again POSIX behavior)
> 
> These are very good reasons.
> 
> > Why should NOT follow:
> > - some reparse point, specially those of name surrogate type,
> >   represents different file (possible remote) and lstat could return
> >   the real local information
> > - reparse points for which driver is not installed cannot be opened in
> >   "follow mode"
> 
> If lstat() did follow and a driver is not installed, lstat() could fail
> with a custom errno value, that is, something else than ENOENT. Perhaps
> ENXIO could be OK. Maybe stat() could do the same.
> 
> > > Maybe there could
> > > be non-POSIX macros like S_IFRPP and S_ISRPP(m) to indicate any
> > > other reparse point than symlink or socket. Some OSes have extra
> > > macros but I don't know if the idea makes sense here.  
> > 
> > Unless there is a real needs from applications, I would rather not add
> > any new non-POSIX/non-GNU macros.
> 
> Right, better to not add anything special until there's a clear need
> for it.
> 
> If something new was introduced, one variation is to make lstat()
> follow reparse points except symlinks and sockets, and add _rppstat()
> that returns S_IFRPP for other reparse points. The idea would be that
> POSIXish apps that care about reparse points could be adapted with
> smaller changes. I definitely am not suggesting to do this, I'm just
> thinking out aloud. :-)
> 
> > > If UCRT added S_IFLNK and S_IFSOCK but used different constants than
> > > mingw-w64, that would be a mess. I'm not proposing any S_ macro
> > > additions in this email, I'm just thinking out aloud.  
> > 
> > IIRC UCRT has not added support yet. But sure it is not a good idea to
> > add something incompatible.
> 
> Microsoft already uses the same values for S_IFxxx as everyone else. It
> would be a big surprise if they did differently, for example, with
> S_IFSOCK.
> 
> If they added S_IFBLK, there's a small chance that they could look at
> mingw-w64, and then adopt the weird 0x3000. I wonder how much breakage
> it would cause if S_IFBLK was changed in mingw-w64, and if that risk is
> worth it to get rid of this small portability trap. I *guess* that both
> negative and postive effects cannot affect many packages.
> 
> I quickly tried to find examples, and didn't find many. The tcl package
> has ucrt64/include/tcl8.6/tcl-private/win/tclWinPort.h which defines
> S_IFBLK to 0x6000 if S_IFBLK isn't already defined. It was added in
> 2016. The comment refers to "Linux and MingW" but the author might have
> missed the odd value. Thus MSVC and mingw-w64 built tcl use different
> S_IFBLK value.
> 
>     https://packages.msys2.org/packages/mingw-w64-ucrt-x86_64-tcl
> 
>     
> https://github.com/tcltk/tcl/commit/48e3873f199817b52d54a4d802966b508917d200
> 
> Python's stat.S_IFBLK and stat.S_ISBLK(mode) use the values from
> <sys/stat.h>. Thus, MSYS2-Python uses 0x6000 and UCRT64-Python
> uses 0x3000. I suppose MSVC-Python uses 0x6000 because it's the
> fallback value when S_IFBLK isn't already defined. The comment about
> the constants suggests that no one noticed that mingw-w64 has an
> unusual value:
> 
>     
> https://github.com/python/cpython/blob/b93b7e566e5a4efe7f077af2083140e50bd2b08f/Modules/_stat.c#L55
> 
> 0x6000 is used in stat.py too, but due to the import at the bottom,
> the values from _stat.c are used instead.
> 
>     
> https://github.com/python/cpython/blob/b93b7e566e5a4efe7f077af2083140e50bd2b08f/Lib/stat.py#L33
> 
> You can try the following Python code:
> 
>     import os
>     import stat
> 
>     mode = os.stat(".").st_mode
>     print("0x%x" % mode)
>     print(stat.S_ISDIR(mode))
>     print(stat.S_ISBLK(mode))
> 
>     mode = stat.S_IFBLK
>     print("0x%x" % mode)
>     print(stat.S_ISBLK(mode))
> 
> I'm not brave enough to actually propose that mingw-w64 should change
> S_IFBLK to 0x6000, but I still wonder if keeping the current value does
> more bad than good.
> 
> > I would propose: Do not add DT_BLK for now. And once we figure out
> > that it is really useful, we can add it. So we do not need to think
> > which value DT_BLK should have (unless there is no doubt that it
> > should have numeric value XXXX).
> 
> I agree. It shouldn't be a disaster if S_IFBLK and DT_BLK values aren't
> in sync, but since DT_BLK isn't actually required, it's better to not
> define it for now.
> 
> > > Are you certain that IO_REPARSE_TAG_AF_UNIX is the right tag for
> > > Win32 AF_UNIX?  
> > 
> > Yes, I'm sure.
> 
> Thanks! :-)
> 
> > >   - If adding DT_SOCK, is this OK:
> > > 
> > >     static unsigned char
> > >     get_d_type (DWORD attrs, DWORD reparse_tag)
> > >     {
> > >       if (attrs & FILE_ATTRIBUTE_REPARSE_POINT)
> > >         {
> > >           switch (reparse_tag)
> > >             {
> > >               case IO_REPARSE_TAG_SYMLINK:
> > >                 return DT_LNK;
> > > 
> > >               case IO_REPARSE_TAG_AF_UNIX:
> > >                 return DT_SOCK;
> > > 
> > >               default:
> > >                 return DT_UNKNOWN;
> > >             }
> > >         }
> > > 
> > >       return (attrs & FILE_ATTRIBUTE_DIRECTORY) ? DT_DIR : DT_REG;
> > >     }  
> > 
> > For me this looks good.
> 
> Thanks! I attached another fixup patch. I shortened seekdir doc in
> dirent.h a bit too. It doesn't need to be quite so specific about the
> implementation details.
> 
> > > I think NAME_MAX should be increased at least if modern MSVC doesn't
> > > define it. Making NAME_MAX visible with _POSIX_C_SOURCE etc. should
> > > be simple, one just needs to be careful that all relevant macros
> > > are listed correctly.  
> > 
> > I do not know about this one point.
> 
> I have changed my mind. Let's not touch NAME_MAX. It might even be good
> to keep it undefined as it is now (I suppose very few apps define the
> non-standard _POSIX_ macro).
> 
> (1) Apps can misuse NAME_MAX. For example, in an earlier email I linked
>     to binutils bug about NAME_MAX and mingw-w64. Seems that binutils
>     uses NAME_MAX in exactly one place, and there it assumes that any
>     filename up to NAME_MAX ASCII chars can be *created*. If other apps
>     have similar assumptions, increasing NAME_MAX can create problems.
> 
>     
> https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=binutils/dlltool.c;h=ef21423b06d6b30956dd44adee9df8325bb48aa0;hb=HEAD#l4002
> 
> (2) Even if NAME_MAX was defined on a system, there might not be any
>     specific limit in reality. It's one reason why readdir_r() is
>     deprecated. For example, NAME_MAX is 255 on Linux, but on FAT32
>     mounted with iocharset=utf8, one can have 765 bytes long filenames
>     because the limit is 255 UTF-16 code units. Most apps don't care
>     about NAME_MAX, thus such names work normally.
> 
>     Linux actually claims 2 * 765 for NAME_MAX on FAT32:
> 
>         $ getconf NAME_MAX /mnt
>         1530
> 
>     I don't think it's possible on FAT32. It's just a big enough value
>     calculated as FAT_LFN_LEN * NLS_MAX_CHARSET_SIZE.
> 
> (3) In POSIX.1-2024, the second paragraph under "Path Variable Names"
>     says that macros in <limits.h> shall be omitted in some cases. That
>     doesn't really apply to the case in mingw-w64, and there's no
>     pathconf() in mingw-w64, but the point is that apps shouldn't
>     assume that NAME_MAX exists.
> 
>     
> https://pubs.opengroup.org/onlinepubs/009695399/basedefs/limits.h.html#tag_13_24_03_02
> 
> > Some more information. Visual C++ CRT file stdio.h defines following
> > constant:
> > 
> > #define FILENAME_MAX 260
> > 
> > And it defines it since the first Visual C++ 1.0 version for 32-bit NT
> > up to the last version with UCRT support.
> > 
> > So I guess that value 260 comes from this FILENAME_MAX macro.
> > 
> > But what is point of FILENAME_MAX I have no idea, it is not referenced
> > or used in CRT header files.
> 
> FILENAME_MAX is at least in C99 and newer standards. Its definition
> is kind of relaxed. See C99 7.19.1 or C11/C17/C23 7.21.1. Defining
> FILENAME_MAX to the same value as MAX_PATH sounds sensible because
> MAX_PATH or _MAX_PATH is a limit in a few CRT APIs.
> 
> Size of d_name is derived from WIN32_FIND_DATAW.cFileName[MAX_PATH] in
> the new dirent code (_finddata_t.filename[_MAX_PATH] in the old code).
> I suppose it's more logical to refer to MAX_PATH in the dirent.h
> comments.
> 
> -- 
> Lasse Collin


I do not have any other comments for this. Looks good for me.

I was just surprised that FILENAME_MAX is in C99, I did not know about
it. I thought that it is MS invention, together with that value 260. Now
I checked and this macro is available since C89. So it is really old.

Just one test case, can you check that your new readdir() function is
working correctly on these two paths?

\\?\GLOBALROOT\Device\Harddisk0\Partition1\
\\?\GLOBALROOT\Device\HardiskVolume1\

If you are not familiar with NT paths encoded in WinAPI paths, both
should refer to WinAPI C:\ (or whatever is the first partition of the
first disk in the system).


_______________________________________________
Mingw-w64-public mailing list
Mingw-w64-public@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public

Re: [Mingw-w64-public] dirent changes

Reply via email to