On Sunday 16 February 2025 19:32:24 Lasse Collin wrote:
> On 2025-02-16 Pali Rohár wrote:
> > On Sunday 16 February 2025 12:23:56 Lasse Collin wrote:
> > > On 2025-02-15 Pali Rohár wrote:  
> > > > IMHO, we should not use FILE_ATTRIBUTE_* for determining d_type
> > > > (except FILE_ATTRIBUTE_DIRECTORY, which defines a directory).
> > > > Those are basically attributes which do not define the file type
> > > > in the POSIX sense at all. There are still a few attributes not
> > > > defined yet, which can be added by MS in the future. Also there
> > > > are some attributes which are already used by ReFS (e.g. the SMR
> > > > one) but are not checked in your code.
> > > 
> > > Adding d_type makes sense only if its value can be determined
> > > cheaply. WIN32_FIND_DATAW provides attributes and the reparse point
> > > tag, so using those is practically free. I suppose any other method
> > > would have much more overhead and thus not worth using for d_type.
> > > 
> > > It's good enough if d_type can be known most of the time. That is,
> > > using DT_REG and DT_DIR for boring regular local files and
> > > directories, and leaving special cases as DT_UNKNOWN (symlinks can
> > > be DT_LNK). Is this doable with the data available in
> > > WIN32_FIND_DATAW?  
> > 
> > FindFirstFileW() and FindNextFileW() should return consistent
> > information for both local and remote files. WIN32_FIND_DATAW should
> > always be enough for determining the file type used by WinAPI.
> 
> OK, good.
> 
> > > > Regarding reparse points: from the application's point of view,
> > > > those are just a replacement of the file/dir content during I/O
> > > > operations. So I think that the existence of a reparse point
> > > > should not determine d_type (except IO_REPARSE_TAG_SYMLINK,
> > > > which defines a symlink).
> > > 
> > > get_d_type returns DT_UNKNOWN for all other reparse points than
> > > IO_REPARSE_TAG_SYMLINK. It might be more conservative than needed,
> > > but at least it shouldn't provide wrong information with reparse
> > > points. It also keeps the code simple.  
> > 
> > Ok. And how then does stat() determine the type of inodes for which
> > readdir() returned DT_UNKNOWN? I think that it does not use
> > determination based on attributes or reparse point tags (except
> > the directory flag).
> 
> I don't know what MS CRTs do.

Source code is available, so you can study it (if you want).

> In mingw-w64, stat() is a thin wrapper on
> top of _stat(). It treats junctions transparently as if they were
> directories, which sounds good.

All WinAPI functions handle reparse points transparently. That is
because they are transparent in the NT kernel, hence at the syscall
layer.

> (There's no lstat() to detect symlinks or readlink() to read them.)

I know. Maybe in the future it would be nice to have an lstat() call.
The implementation can be straightforward: open the path as a reparse
point, check whether it is a reparse point and retrieve the reparse
tag, and then either call fstat() with a custom type (if it is a
reparse point) or call stat() (if it is not).
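
A rough, untested sketch of just the decision step in that lstat()
idea. It is written without <windows.h> so it builds anywhere. The
names sketch_lstat_choose and enum sketch_lstat_path are made up for
illustration. In real code the attributes would presumably come from a
handle opened with CreateFileW() using FILE_FLAG_OPEN_REPARSE_POINT and
queried with GetFileInformationByHandleEx() and FileAttributeTagInfo.

```c
#include <stdint.h>

/* Value as in the Windows SDK headers, repeated here so that the
   sketch compiles without <windows.h>. */
#define SKETCH_FILE_ATTRIBUTE_REPARSE_POINT 0x00000400u

/* Hypothetical result type: which path the lstat() sketch would take. */
enum sketch_lstat_path {
    SKETCH_CALL_STAT,         /* not a reparse point: plain stat() */
    SKETCH_CALL_FSTAT_CUSTOM  /* reparse point: fstat() + custom type */
};

/* 'attributes' must describe the reparse point itself, not its target,
   i.e. the path was opened without following the reparse point. */
static enum sketch_lstat_path sketch_lstat_choose(uint32_t attributes)
{
    if (attributes & SKETCH_FILE_ATTRIBUTE_REPARSE_POINT)
        return SKETCH_CALL_FSTAT_CUSTOM;
    return SKETCH_CALL_STAT;
}
```

The custom type would then be derived from the reparse tag; only the
branch between the two calls is shown here.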

> Gnulib's stat() has two methods. Both methods check for two attributes
> (FILE_ATTRIBUTE_DIRECTORY and FILE_ATTRIBUTE_READONLY) when determining
> st_mode. There is a comment wondering what to do about reparse points.
> So for Gnulib's stat(), reparse points are transparent. The code is in
> these files:
> 
>     https://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/stat.c
>     https://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/stat-w32.c

That comment just shows that the function is incomplete, so it is not a
good idea to copy its logic 1:1.

> By the way, in mingw-w64-crt/stdio/_stat.c, _mingw_no_trailing_slash()
> calls malloc() but doesn't check if malloc() succeeded. It could be
> good to go through the codebase and look if there are more such malloc()
> usages. Earlier we spotted one in crtexe.c.

Yes, this is something which needs to be checked. Another point for the
future :-) Too many things need to be fixed.

> > So I think that mingw-w64 does not need such complicated logic and
> > that is why I think that checking long list of attributes is not
> > necessary and type can be returned for all files (instead of
> > DT_UNKNOWN).
> 
> OK, that would be good. I don't have any examples where your code
> wouldn't do the right thing. :-)
> 
> My point with the long list of attributes in get_d_type was to return
> DT_UNKNOWN if Microsoft added a new not-regular-file attribute some day,
> or if some application wants to handle reparse points specially (apps
> might be ported from POSIX with some extra code added on top to support
> Windows, so the end result can be a mix of both worlds). But I might
> have been over-thinking (wouldn't be the first time) or over-cautious.

I highly doubt that some new attribute in the future would change a
regular file into something totally different. That would break a lot
of things.

New file types could be added via reparse points, as this is the
existing mechanism and can do basically anything.

What could probably make sense is to return DT_UNKNOWN for files and
dirs which have the reparse point attribute and whose reparse tag is
not handled in the function. This addresses the idea about
applications which want to handle reparse points specially, and also
covers AF_UNIX sockets (mentioned below).
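
Roughly what I mean, as an untested sketch and not the code from the
patch. The function name sketch_d_type is made up, the constants are
copied from the Windows SDK headers so that it builds without
<windows.h>, and the DT_* numbers are the glibc/BSD ones, which is an
assumption about what dirent.h would use. The two inputs correspond to
dwFileAttributes and dwReserved0 of WIN32_FIND_DATAW.

```c
#include <stdint.h>

/* Values as in the Windows SDK headers. */
#define SKETCH_FILE_ATTRIBUTE_DIRECTORY     0x00000010u
#define SKETCH_FILE_ATTRIBUTE_REPARSE_POINT 0x00000400u
#define SKETCH_IO_REPARSE_TAG_SYMLINK       0xA000000Cu

/* d_type values as used by glibc and the BSDs (assumed here). */
#define SKETCH_DT_UNKNOWN 0
#define SKETCH_DT_DIR     4
#define SKETCH_DT_REG     8
#define SKETCH_DT_LNK     10

static unsigned char sketch_d_type(uint32_t attributes, uint32_t reparse_tag)
{
    if (attributes & SKETCH_FILE_ATTRIBUTE_REPARSE_POINT) {
        /* The only tag handled here is the symlink one. */
        if (reparse_tag == SKETCH_IO_REPARSE_TAG_SYMLINK)
            return SKETCH_DT_LNK;
        /* Any other tag (junction, AF_UNIX, ...) is left to the caller. */
        return SKETCH_DT_UNKNOWN;
    }
    /* No reparse point: only the directory flag decides the type. */
    if (attributes & SKETCH_FILE_ATTRIBUTE_DIRECTORY)
        return SKETCH_DT_DIR;
    return SKETCH_DT_REG;
}
```

No other attribute bits are looked at, so a new attribute added later
would not change the result.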

It is important to know that if you do not have the NT kernel driver
for a particular reparse point tag installed, then it is not possible
to open a file or dir to which a reparse point with that tag is
attached. Hence, without the installed driver, a file or dir with that
reparse point is not a regular file or dir, but rather something
unknown to the system.

> Let's use your method. I attached a patch.
> 
> About DT_ macros that cannot appear in a directory listing: I didn't
> define DT_BLK in dirent.h because S_IFBLK seems to be a MinGW invention
> (to make it easier to port apps). Its value doesn't match glibc or
> *BSDs, so DT_BLK == S_IFBLK >> 12 wouldn't match glibc or *BSDs.

I think that a "block device" is available in neither msvcrt/ucrt nor
WinAPI. That is why there is no DT_BLK / S_IFBLK macro in the ucrt
header files. I guess in mingw it exists just so that POSIX
applications compile.
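
To illustrate the mismatch with numbers, here is a small sketch. The
glibc values are the octal ones from its headers. The MinGW value
0x3000 for S_IFBLK is what I remember from sys/stat.h, so treat it as
an assumption and check your toolchain. The macro name SKETCH_IFTODT is
made up; it does the same shift by 12 as mentioned above.

```c
/* File type bits as defined by glibc (octal). */
#define SKETCH_GLIBC_S_IFDIR 0040000u
#define SKETCH_GLIBC_S_IFBLK 0060000u
#define SKETCH_GLIBC_S_IFREG 0100000u

/* Assumed MinGW value of S_IFBLK; verify against sys/stat.h. */
#define SKETCH_MINGW_S_IFBLK 0x3000u

/* Mask out the file type bits and shift them down to a d_type value. */
#define SKETCH_IFTODT(mode) (((mode) & 0170000u) >> 12)
```

With these values the glibc S_IFBLK maps to 6 (its DT_BLK), while the
assumed MinGW S_IFBLK maps to 3, which is not a d_type value used by
glibc or the BSDs.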

> There is no DT_SOCK either (mingw-w64 doesn't have S_IFSOCK).

Oh, I forgot about this. Native AF_UNIX support is now available in
WinAPI. It was added only recently and is probably not supported in
UCRT at all, so AF_UNIX socket files are detected as regular files.

But in the future it would be nice to extend the mingw stat and readdir
code to detect AF_UNIX socket files and report them as DT_SOCK /
S_IFSOCK.

WinAPI's AF_UNIX socket is stored as an empty regular file with an
attached reparse point that has the tag IO_REPARSE_TAG_AF_UNIX and an
empty reparse point buffer.
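
A possible check for this, as an untested sketch. The function name
sketch_is_af_unix_socket is made up and the constants are copied from
the Windows SDK headers so that it builds without <windows.h>. It looks
only at the attributes and the tag, which are the two things available
from WIN32_FIND_DATAW; it does not verify that the file or the reparse
point buffer is empty.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as in the Windows SDK headers. */
#define SKETCH_FILE_ATTRIBUTE_DIRECTORY     0x00000010u
#define SKETCH_FILE_ATTRIBUTE_REPARSE_POINT 0x00000400u
#define SKETCH_IO_REPARSE_TAG_AF_UNIX       0x80000023u

static bool sketch_is_af_unix_socket(uint32_t attributes, uint32_t reparse_tag)
{
    /* The reparse tag is meaningful only with the reparse point attribute. */
    if (!(attributes & SKETCH_FILE_ATTRIBUTE_REPARSE_POINT))
        return false;
    /* The socket is stored as a file, so a directory cannot be one. */
    if (attributes & SKETCH_FILE_ATTRIBUTE_DIRECTORY)
        return false;
    return reparse_tag == SKETCH_IO_REPARSE_TAG_AF_UNIX;
}
```

An entry for which this returns true could then be reported as DT_SOCK
once such a macro exists.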

> > > There seems to be a bug when opening junctions with both old and new
> > > dirent. The old code fails at readdir with EINVAL. The new code
> > > fails at opendir with EACCES. I don't know how to fix it or how big
> > > an issue it is in practice. Both old and new code work fine with
> > > directory symlinks.
> > 
> > Can you provide more details when this happens? I just have not caught
> > what is the reproducer of this case or when exactly it happens.
> 
> I was wrong. If I create a dir and point a new junction to it, it works.
> 
> The weird thing is that it's not enough that the destination of the
> junction is accessible. For example, "C:\Documents and Settings" points
> to "C:\Users". The dirent implementations work on C:\Users but not on
> the junction. The same is true when using "dir" in Command Prompt.
> 
> One can access "C:\Documents and Settings\SomeUserName" just fine if
> one has permission to access C:\Users\SomeUserName. It's just the root
> of the junction that doesn't allow its contents listed. So it is a
> permission issue as the error message says.
> 
> -- 
> Lasse Collin

Ok, so it is just a normal EACCES scenario.


_______________________________________________
Mingw-w64-public mailing list
Mingw-w64-public@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
