On Sunday 16 February 2025 12:23:56 Lasse Collin wrote:
> On 2025-02-15 Pali Rohár wrote:
> > IMHO, we should not use FILE_ATTRIBUTE_* for determining d_type
> > (except the FILE_ATTRIBUTE_DIRECTORY which defines directory). Those
> > are basically attributes which do not define file type in POSIX sense
> > at all. There are still a few attributes not defined yet, which can
> > be added by MS in the future. Also there are some attributes which
> > are already used by ReFS (e.g. the SMR one) but not checked in your code.
> 
> Adding d_type makes sense only if its value can be determined cheaply.
> WIN32_FIND_DATAW provides attributes and the reparse point tag, so
> using those is practically free. I suppose any other method would have
> much more overhead and thus not worth using for d_type.
> 
> It's good enough if d_type can be known most of the time. That is, using
> DT_REG and DT_DIR for boring regular local files and directories, and
> leaving special cases as DT_UNKNOWN (symlinks can be DT_LNK). Is this
> doable with the data available in WIN32_FIND_DATAW?

FindFirstFileW() and FindNextFileW() should return consistent
information for both local and remote files. WIN32_FIND_DATAW should
always be enough to determine the file type as seen by WinAPI.

> > Regarding reparse points: from the application's point of view, those
> > are just a replacement of file/dir content during I/O operations. So I
> > think that the existence of a reparse point should not determine d_type
> > (except for IO_REPARSE_TAG_SYMLINK, which defines a symlink).
> 
> get_d_type returns DT_UNKNOWN for all other reparse points than
> IO_REPARSE_TAG_SYMLINK. It might be more conservative than needed, but
> at least it shouldn't provide wrong information with reparse points.
> It also keeps the code simple.

OK. Then how does stat() determine the type of inodes for which
readdir() returned DT_UNKNOWN? I think that it does not decide based on
attributes or reparse point tags (except for the directory flag).

> For comparison, Gnulib uses DT_LNK for all reparse points that don't
> have FILE_ATTRIBUTE_DIRECTORY. Cygwin does something more complicated
> with reparse points, starting here:
> 
> https://www.cygwin.com/cgit/newlib-cygwin/tree/winsup/cygwin/fhandler/disk_file.cc?id=d52d983e5b69457c706c34097a328a7678107b1a#n2370

Cygwin supports more Unix file types and stores them in its own format.

Similarly, the MS POSIX subsystem and later the MS Interix subsystem
used their own format (nowadays called the SFU format) for storing
special files, misusing some attributes plus the file content. But for
WinAPI applications, those files were just ordinary "text files"; they
were recognized only by Interix applications or by the Linux SMB client.

And similarly, MS NFS server 2012+ also uses its own format (now with
everything in a reparse point) for storing special files.

None of those formats is supported by msvcrt/ucrt or by WinAPI
functions.

So I think that mingw-w64 does not need such complicated logic. That is
why I think that checking a long list of attributes is not necessary
and that a type can be returned for all files (instead of DT_UNKNOWN).

Most of the file attributes are settable by the user and exist purely
for user or application purposes. So IMHO it is wrong to use attributes
(except FILE_ATTRIBUTE_REPARSE_POINT and FILE_ATTRIBUTE_DIRECTORY)
for determining the type of a file as used by WinAPI.

I think that this simple logic should be enough for determining type:

static unsigned char
get_d_type (DWORD attrs, DWORD reparse_tag)
{
    if ((attrs & FILE_ATTRIBUTE_REPARSE_POINT)
        && reparse_tag == IO_REPARSE_TAG_SYMLINK)
        return DT_LNK;
    else if (attrs & FILE_ATTRIBUTE_DIRECTORY)
        return DT_DIR;
    else
        return DT_REG;
}

Do you have an example where the above code returns wrong DT_*
information?

Because AFAIK, WinAPI does not provide any other file type that can
appear in a directory listing.
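
To make the cases concrete, the same logic can be run on a few
representative attribute/tag combinations. This is only a sketch under
stated assumptions: the name sketch_d_type is hypothetical, the
non-Windows stand-in definitions exist only so the code compiles off
Windows, and the DT_* fallback numbers assume the common BSD/glibc
numbering. The Windows-only wrapper relies on WIN32_FIND_DATAW storing
the reparse tag in dwReserved0 when FILE_ATTRIBUTE_REPARSE_POINT is set.

```c
#include <stdint.h>

#ifdef _WIN32
# include <windows.h>
#else
/* Stand-ins so the sketch compiles off Windows; the values are the
   ones from the Windows SDK headers. */
typedef uint32_t DWORD;
# define FILE_ATTRIBUTE_DIRECTORY     0x00000010u
# define FILE_ATTRIBUTE_REPARSE_POINT 0x00000400u
# define IO_REPARSE_TAG_MOUNT_POINT   0xA0000003u
# define IO_REPARSE_TAG_SYMLINK       0xA000000Cu
#endif

/* Assumption: common BSD/glibc numbering, used only when the
   platform headers included above do not define DT_*. */
#ifndef DT_UNKNOWN
# define DT_UNKNOWN 0
# define DT_DIR     4
# define DT_REG     8
# define DT_LNK     10
#endif

/* Same decision order as the proposal: symlink first, then the
   directory flag, everything else is a regular file. */
static unsigned char
sketch_d_type (DWORD attrs, DWORD reparse_tag)
{
    if ((attrs & FILE_ATTRIBUTE_REPARSE_POINT)
        && reparse_tag == IO_REPARSE_TAG_SYMLINK)
        return DT_LNK;
    else if (attrs & FILE_ATTRIBUTE_DIRECTORY)
        return DT_DIR;
    else
        return DT_REG;
}

#ifdef _WIN32
/* Windows only: feed the function from a directory listing entry.
   dwReserved0 is meaningful only when the reparse attribute is set,
   which sketch_d_type checks before looking at the tag. */
static unsigned char
sketch_d_type_from_find_data (const WIN32_FIND_DATAW *fd)
{
    return sketch_d_type (fd->dwFileAttributes, fd->dwReserved0);
}
#endif
```

With these definitions a plain directory and a junction (directory
flag plus IO_REPARSE_TAG_MOUNT_POINT) both give DT_DIR, a directory
symlink gives DT_LNK, and a file carrying only user-settable attributes
gives DT_REG.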

Other possible file types in WinAPI are FIFOs and character/tty/...
devices, but those represent device files and are not stored on disk at
all. They exist only in memory, are returned as handles by other WinAPI
functions, and cannot be returned by a listing of a physical on-disk
directory (readdir).

> On 2025-02-16 Pali Rohár wrote:
> > FYI, in case you are interested in list of windows attributes, you can
> > look at my (updated) email which I sent to linux-fsdevel:
> > https://lore.kernel.org/linux-fsdevel/20250215233946.cxznczjjiu7vqazf@pali/
> 
> Thanks! Based on this, I removed FILE_ATTRIBUTE_EA from supported_attrs.
> I also omitted _PINNED and _UNPINNED. Now there are no HSM attributes in
> the list.
> 
> This change and three other small patches are attached.
> 
> By the way, mingw-w64 headers don't define FILE_ATTRIBUTE_VOLUME but
> FILE_ATTRIBUTE_STRICTLY_SEQUENTIAL is there.
> 
> There seems to be a bug when opening junctions with both old and new
> dirent. The old code fails at readdir with EINVAL. The new code fails
> at opendir with EACCES. I don't know how to fix it or how big an issue
> it is in practice. Both old and new code work fine with directory symlinks.

Can you provide more details about when this happens? I have not caught
what the reproducer for this case is or when exactly it happens.

A valid reparse point with the tag IO_REPARSE_TAG_MOUNT_POINT (mount
point or junction) should be fully transparent and mostly invisible to
WinAPI functions. The same applies not only to
IO_REPARSE_TAG_MOUNT_POINT but to any valid and supported reparse point
whose tag has the Name Surrogate bit set.
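
The Name Surrogate bit is bit 29 (0x20000000) of the reparse tag;
winnt.h exposes the same test as the IsReparseTagNameSurrogate() macro.
A minimal portable sketch of that check, where the names
SKETCH_NAME_SURROGATE_BIT and sketch_is_name_surrogate are hypothetical:

```c
#include <stdint.h>

/* Bit 29 of a reparse tag: the tag represents another named entity
   in the system (e.g. a junction or a symlink target). */
#define SKETCH_NAME_SURROGATE_BIT 0x20000000u

/* Hypothetical helper: returns 1 when the tag has the Name Surrogate
   bit set, 0 otherwise. */
static int
sketch_is_name_surrogate (uint32_t reparse_tag)
{
    return (reparse_tag & SKETCH_NAME_SURROGATE_BIT) != 0;
}
```

For example, IO_REPARSE_TAG_MOUNT_POINT (0xA0000003) and
IO_REPARSE_TAG_SYMLINK (0xA000000C) both have the bit set, while
IO_REPARSE_TAG_NFS (0x80000014) does not.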

So I think that such an error should be easy to fix, but I just do not
see in which function or code pattern the error happens.

Unless you somehow triggered a bug in WinAPI / kernel32.dll, in which
case it would be unfixable.


