On 2025-02-16 Pali Rohár wrote: > On Sunday 16 February 2025 12:23:56 Lasse Collin wrote: > > On 2025-02-15 Pali Rohár wrote: > > > IMHO, we should not use FILE_ATTRIBUTE_* for determining d_type > > > (except the FILE_ATTRIBUTE_DIRECTORY which defines directory). > > > Those are basically attributes which do not define file type in > > > POSIX sense at all. There are still few attributes not defined > > > yet, which can be added by MS in future. Also there are some > > > attributes which are already used by ReFS (e.g. SMR one) but not > > > checked in your code. > > > > Adding d_type makes sense only if its value can be determined > > cheaply. WIN32_FIND_DATAW provides attributes and the reparse point > > tag, so using those is practically free. I suppose any other method > > would have much more overhead and thus not worth using for d_type. > > > > It's good enough if d_type can be known most of the time. That is, > > using DT_REG and DT_DIR for boring regular local files and > > directories, and leaving special cases as DT_UNKNOWN (symlinks can > > be DT_LNK). Is this doable with the data available in > > WIN32_FIND_DATAW? > > FindFirstFileW() and FindNextFileW() should return consistent > information for both local and remote files. WIN32_FIND_DATAW should > be always enough for determining filetype used by WinAPI.
OK, good. > > > Ad reparse points, from application point of view, those are just > > > replacement of file/dir content during I/O operation. So I think > > > that existence of reparse point should not determinate d_type > > > (except the IO_REPARSE_TAG_SYMLINK which defines symlink). > > > > get_d_type returns DT_UNKNOWN for all other reparse points than > > IO_REPARSE_TAG_SYMLINK. It might be more conservative than needed, > > but at least it shouldn't provide wrong information with reparse > > points. It also keeps the code simple. > > Ok. And how then stat() determinates the type inodes for which > readdir() returned DT_UNKNOWN? I think that it does not use > determination based on attributes or reparse point tags (except > directory flag). I don't know what MS CRTs do. In mingw-w64, stat() is a thin wrapper on top of _stat(). It treats junctions transparently as if they were directories, which sounds good. (There's no lstat() to detect symlinks or readlink() to read them.) Gnulib's stat() has two methods. Both methods check for two attributes (FILE_ATTRIBUTE_DIRECTORY and FILE_ATTRIBUTE_READONLY) when determining st_mode. There is a comment wondering what to do about reparse points. So for Gnulib's stat(), reparse points are transparent. The code is in these files: https://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/stat.c https://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/stat-w32.c By the way, in mingw-w64-crt/stdio/_stat.c, _mingw_no_trailing_slash() calls malloc() but doesn't check if malloc() succeeded. It could be good to go through the codebase and look if there are more such malloc() usages. Earlier we spotted one in crtexe.c. > So I think that mingw-w64 does not need such complicated logic and > that is why I think that checking long list of attributes is not > necessary and type can be returned for all files (instead of > DT_UNKNOWN). OK, that would be good. I don't have any examples where your code wouldn't do the right thing. :-) My point with the long list of attributes in get_d_type was to return DT_UNKNOWN if Microsoft added a new not-regular-file attribute some day, or if some application wants to handle reparse points specially (apps might be ported from POSIX with some extra code added on top to support Windows, so the end result can be a mix of both worlds). But I might have been over-thinking (wouldn't be the first time) or over-cautious. Let's use your method. I attached a patch. About DT_ macros that cannot appear in a directory listing: I didn't define DT_BLK in dirent.h because S_IFBLK seems to be a MinGW invention (to make it easier to port apps). Its value doesn't match glibc or *BSDs, so DT_BLK == S_IFBLK >> 12 wouldn't match glibc or *BSDs. There is no DT_SOCK either (mingw-w64 doesn't have S_IFSOCK). > > There seems to be a bug when opening junctions with both old and new > > dirent. The old code fails at readdir with EINVAL. The new code > > fails at opendir with EACCES. I don't know how to fix it or how big > > issue it is in practice. Both old and new code work fine with > > directory symlinks. > > Can you provide more details when this happens? I just have not caught > what is the reproducer of this case or when exactly it happens. I was wrong. If I create a dir and point a new junction to it, it works. The weird thing is that it's not enough that the destination of the junction is accessible. For example, "C:\Documents and Settings" points to "C:\Users". The dirent implementations work on C:\Users but not on the junction. The same is true when using "dir" in Command Prompt. One can access "C:\Documents and Settings\SomeUserName" just fine if one has permission to access C:\Users\SomeUserName. It's just the root of the junction that doesn't allow its contents listed. So it is a permission issue as the error message says. -- Lasse Collin
From 284e5cfb3726e27c9282d28d4ada74e8eef2138e Mon Sep 17 00:00:00 2001 From: Lasse Collin <lasse.col...@tukaani.org> Date: Sun, 16 Feb 2025 18:51:52 +0200 Subject: [PATCH 8/8] ... Simplify get_d_type() --- mingw-w64-crt/misc/dirent.c | 41 +++------------------------------- mingw-w64-headers/crt/dirent.h | 23 +++++++++---------- 2 files changed, 13 insertions(+), 51 deletions(-) diff --git a/mingw-w64-crt/misc/dirent.c b/mingw-w64-crt/misc/dirent.c index 6a60559c3..f11e2db83 100644 --- a/mingw-w64-crt/misc/dirent.c +++ b/mingw-w64-crt/misc/dirent.c @@ -382,48 +382,13 @@ opendir (const char *path) static unsigned char get_d_type (DWORD attrs, DWORD reparse_tag) { - /* - * If we are unsure about the type, we should return DT_UNKNOWN. - * The d_type member is useful only if its value can be trusted - * to be correct when it's not DT_UNKNOWN. - * - * The following were omitted from supported_attrs: - * FILE_ATTRIBUTE_DEVICE - * FILE_ATTRIBUTE_OFFLINE - * FILE_ATTRIBUTE_VIRTUAL - * FILE_ATTRIBUTE_EA (overlaps FILE_ATTRIBUTE_RECALL_ON_OPEN) - * FILE_ATTRIBUTE_PINNED - * FILE_ATTRIBUTE_UNPINNED - * FILE_ATTRIBUTE_RECALL_ON_OPEN - * FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS - * FILE_ATTRIBUTE_STRICTLY_SEQUENTIAL - */ - unsigned int supported_attrs = FILE_ATTRIBUTE_READONLY | - FILE_ATTRIBUTE_HIDDEN | - FILE_ATTRIBUTE_SYSTEM | - FILE_ATTRIBUTE_DIRECTORY | - FILE_ATTRIBUTE_ARCHIVE | - FILE_ATTRIBUTE_NORMAL | - FILE_ATTRIBUTE_TEMPORARY | - FILE_ATTRIBUTE_SPARSE_FILE | - FILE_ATTRIBUTE_REPARSE_POINT | - FILE_ATTRIBUTE_COMPRESSED | - FILE_ATTRIBUTE_NOT_CONTENT_INDEXED | - FILE_ATTRIBUTE_ENCRYPTED | - FILE_ATTRIBUTE_INTEGRITY_STREAM | - FILE_ATTRIBUTE_NO_SCRUB_DATA; - - if (attrs & ~supported_attrs) - { - return DT_UNKNOWN; - } - /* Check for a reparse point before checking if it is a directory. * A reparse point can also have the directory attribute. This * includes "directory symlinks" too. */ - if (attrs & FILE_ATTRIBUTE_REPARSE_POINT) + if ((attrs & FILE_ATTRIBUTE_REPARSE_POINT) && + reparse_tag == IO_REPARSE_TAG_SYMLINK) { - return (reparse_tag == IO_REPARSE_TAG_SYMLINK) ? DT_LNK : DT_UNKNOWN; + return DT_LNK; } if (attrs & FILE_ATTRIBUTE_DIRECTORY) diff --git a/mingw-w64-headers/crt/dirent.h b/mingw-w64-headers/crt/dirent.h index 241c530b9..06d95650a 100644 --- a/mingw-w64-headers/crt/dirent.h +++ b/mingw-w64-headers/crt/dirent.h @@ -32,23 +32,21 @@ extern "C" { /* Values for d_type: * - * - DT_UNKNOWN is possible even for regular files and directories - * in some cases. - * - * - DT_DIR and DT_REG are used when the entry definitely is - * a regular file or directory. + * - DT_DIR and DT_REG are used when the entry is a directory or + * a regular file. * * - DT_LNK is only used for reparse points that have the tag - * IO_REPARSE_TAG_SYMLINK. Other reparse points are DT_UNKNOWN. + * IO_REPARSE_TAG_SYMLINK. Other reparse points are DT_DIR or DT_REG. * - * - Other values aren't used, but DT_CHR and DT_FIFO are defined still. + * - Other values aren't possible in a directory listing. They are only + * defined for compatibility. * * The constants match glibc and *BSDs: * - * - DT_DIR == (S_IFDIR >> 12) - * DT_REG == (S_IFREG >> 12) - * DT_FIFO == (S_IFIFO >> 12) + * - DT_FIFO == (S_IFIFO >> 12) * DT_CHR == (S_IFCHR >> 12) + * DT_DIR == (S_IFDIR >> 12) + * DT_REG == (S_IFREG >> 12) * * - There is no S_IFLNK on Windows. However, on other systems, * S_IFLNK == (S_IFREG | S_IFCHR), and S_IFCHR has the same @@ -56,12 +54,11 @@ extern "C" { * constant is used here as on the other systems. */ #define DT_UNKNOWN 0 +#define DT_FIFO 1 +#define DT_CHR 2 #define DT_DIR 4 #define DT_REG 8 #define DT_LNK 10 -/* These following aren't used. They are only defined for compatibility. */ -#define DT_FIFO 1 -#define DT_CHR 2 struct dirent { -- 2.48.1
_______________________________________________ Mingw-w64-public mailing list Mingw-w64-public@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/mingw-w64-public