On 2025-02-18 Pali Rohár wrote:
> On Tuesday 18 February 2025 23:32:54 Lasse Collin wrote:
> > On 2025-02-18 Pali Rohár wrote:  
> > > Just one test case, can you check that your new readdir()
> > > function is working correctly on these two paths?
> > > 
> > > \\?\GLOBALROOT\Device\Harddisk0\Partition1\
> > > \\?\GLOBALROOT\Device\HardiskVolume1\
> > 
> > These paths don't work with the old dirent. opendir fails with
> > ENOENT.  
> 
> Perfect, this is then nice improvement, that in new version it is
> working.

I had made a mistake. I had tested the new code with and without \ at
the end, but I had tested the old code only without. The old code does
work when there is \ at the end. I hope it is OK that the new code
works without \ too, even though I guess it's not strictly correct.
Otherwise the GetFileAttributes call from the old code needs to be
restored to the new version.
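
One way this behavior can arise, assuming the search pattern for
FindFirstFileW is built by appending \* to the path: the pattern is then
the same with and without the trailing \. The following is only a sketch
of that idea, not the code from the patch. make_search_pattern is a
made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper, not taken from dirent.c: append "\*" to the path,
 * or only "*" if the path already ends with a slash or backslash.
 * Returns 0 on success, -1 if the path is empty or buf is too small. */
int
make_search_pattern (const wchar_t *path, wchar_t *buf, size_t buf_len)
{
  assert (path != NULL && buf != NULL);

  size_t len = wcslen (path);
  if (len == 0)
    return -1;

  int has_sep = path[len - 1] == L'\\' || path[len - 1] == L'/';

  /* Path + optional separator + '*' + terminating '\0'. */
  size_t needed = len + (has_sep ? 1 : 2) + 1;
  if (needed > buf_len)
    return -1;

  wmemcpy (buf, path, len);
  if (!has_sep)
    buf[len++] = L'\\';
  buf[len++] = L'*';
  buf[len] = L'\0';
  return 0;
}
```

With such a helper, \\?\GLOBALROOT\Device\HarddiskVolume1 and
\\?\GLOBALROOT\Device\HarddiskVolume1\ both give the pattern
\\?\GLOBALROOT\Device\HarddiskVolume1\*, so a separate
GetFileAttributes check on the unmodified path would be the place where
the two forms could still be told apart.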

A few other tiny things:

(1)
I tested on a directory that has an unsupported reparse tag.
FindFirstFileW fails with ERROR_CANT_ACCESS_FILE (1920) which currently
becomes EIO. The old dirent code fails with EINVAL at readdir (not at
opendir).

I guess EIO isn't the best. Directory symlinks and junctions whose
targets don't exist make opendir fail with ENOENT, so I guess it's
appropriate here too.

Non-directories with an unsupported reparse tag, and AF_UNIX sockets,
already resulted in ENOTDIR.
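
To make the mapping concrete, here is a minimal sketch of the kind of
switch the patch extends. It is not the real function from dirent.c:
win32_error_to_errno and the SK_ constants are made up so that the
example compiles without <windows.h>, and the set of cases is trimmed
down to a few well-known codes.

```c
#include <errno.h>

/* Win32 error codes, duplicated here only so that the sketch builds
 * without <windows.h>. 1920 and 1921 are the values quoted in this
 * mail; the others are the usual SDK values. */
enum {
  SK_ERROR_FILE_NOT_FOUND = 2,
  SK_ERROR_PATH_NOT_FOUND = 3,
  SK_ERROR_ACCESS_DENIED = 5,
  SK_ERROR_CANT_ACCESS_FILE = 1920,
  SK_ERROR_CANT_RESOLVE_FILENAME = 1921
};

/* Hypothetical helper: translate a GetLastError() value to errno. */
int
win32_error_to_errno (unsigned long code)
{
  switch (code)
    {
    case SK_ERROR_FILE_NOT_FOUND:
    case SK_ERROR_PATH_NOT_FOUND:
    case SK_ERROR_CANT_ACCESS_FILE:
      /* A directory with an unhandled reparse tag is treated like a
       * directory symlink or junction whose target doesn't exist. */
      return ENOENT;

    case SK_ERROR_ACCESS_DENIED:
      return EACCES;

    case SK_ERROR_CANT_RESOLVE_FILENAME:
      return ELOOP;

    default:
      /* Unknown error, possibly an I/O error. */
      return EIO;
    }
}
```

Before the patch, 1920 would have fallen through to the default case,
which is where the EIO came from.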

(2)
ERROR_CANT_RESOLVE_FILENAME (1921) is currently mapped to ELOOP. The
error 1921 is possible in situations other than symlink loops too, for
example, a junction with weirdly broken substitute path.

stat() uses ENOENT in these situations. open() uses EINVAL (if the
reparse point isn't a directory). I suppose EINVAL is a generic
fallback value in MS CRTs, because EINVAL seems to occur with so many
types of errors.

MSVCRT's strerror(ELOOP) returns "Unknown error". UCRT has a proper
message for ELOOP.

I'm unsure which is better, ELOOP or ENOENT. Probably it doesn't matter
much in practice.
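
If ELOOP stays, an application that must print a useful message also
with MSVCRT could special-case it before calling strerror. This is only
a sketch of an application-side workaround. dir_error_message is a
made-up name and the message text is my own wording.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical application-side helper: MSVCRT's strerror (ELOOP)
 * returns "Unknown error", so provide a fixed message for ELOOP and
 * defer to strerror for everything else. */
const char *
dir_error_message (int err)
{
  if (err == ELOOP)
    return "Cannot resolve file name (symlink loop or broken reparse point)";

  return strerror (err);
}
```

The wording deliberately covers both symlink loops and broken
junctions because error 1921 doesn't tell them apart.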

(3)
I found old Microsoft docs on the web which, if they can be trusted, say
that WC_NO_BEST_FIT_CHARS isn't available on Win95 and NT4. So in the
current form, the new dirent code requires Windows 2000 or later. From
earlier discussions I got an impression that as long as it works on
WinXP it's good enough, so I only updated the comments.

    https://www.tenouk.com/ModuleG.html

(4)
WideCharToMultiByte docs say that with CP_UTF8 the only supported flag
is WC_ERR_INVALID_CHARS and the last argument must be NULL. That's true
on Win7, but on recent Win10 WC_NO_BEST_FIT_CHARS and a non-NULL last
argument seem to work with CP_UTF8 too. It's logical because that
combination works with CP_ACP when ACP is UTF-8. This feature seems to
be undocumented, so it's still best to not take advantage of it.
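
Sticking to the documented behavior means choosing the
WideCharToMultiByte arguments by code page. A minimal sketch of that
choice follows. struct wc_options and choose_wc_options are made-up
names, and the three macros are defined only if <windows.h> hasn't
already provided them.

```c
/* Fallback definitions so that the sketch compiles without <windows.h>.
 * The values are the ones from the Windows SDK headers. */
#ifndef CP_UTF8
#define CP_UTF8 65001
#endif
#ifndef WC_NO_BEST_FIT_CHARS
#define WC_NO_BEST_FIT_CHARS 0x00000400
#endif
#ifndef WC_ERR_INVALID_CHARS
#define WC_ERR_INVALID_CHARS 0x00000080
#endif

/* Hypothetical description of how to call WideCharToMultiByte. */
struct wc_options {
  unsigned long flags;         /* dwFlags argument */
  int pass_used_default_char;  /* 1: pass a BOOL* as the last argument,
                                  0: pass NULL */
};

struct wc_options
choose_wc_options (unsigned int code_page)
{
  struct wc_options opt;

  if (code_page == CP_UTF8)
    {
      /* Documented: only WC_ERR_INVALID_CHARS, last argument NULL.
       * Invalid UTF-16 is detected from a return value of 0. */
      opt.flags = WC_ERR_INVALID_CHARS;
      opt.pass_used_default_char = 0;
    }
  else
    {
      /* CP_ACP, CP_OEMCP, etc.: detect lossy conversion via BOOL*. */
      opt.flags = WC_NO_BEST_FIT_CHARS;
      opt.pass_used_default_char = 1;
    }

  return opt;
}
```

The caller would then pass opt.flags as dwFlags and either a pointer to
a BOOL or NULL as lpUsedDefaultChar.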

(5)
In setlocale docs section "UTF-8 support"[1], the last paragraph says
that UTF-8 locales are possible on Windows versions older than 10 with
app-local deployment or static linking of UCRT. I hope this is
irrelevant in mingw-w64 context. In [2] section "Central deployment",
the UCRT versions listed for pre-Win10 are too old to support UTF-8
locales. The last UCRT redistributable for WinXP is version 10.0.10586.15.

WinXP doesn't support WC_ERR_INVALID_CHARS (Vista does). If someone
managed to use a new enough UCRT on WinXP *and* a UTF-8 locale, then
the new dirent code wouldn't work.
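
The version requirements from (3) and (5) can be summarized in a few
lines. This is only an illustration of the constraints, not something
the patch does at run time. conversion_flag_available is a made-up
name, and the Win2000 requirement relies on the old docs mentioned in
(3).

```c
/* Hypothetical summary of which WideCharToMultiByte flag the dirent
 * code depends on and the first Windows major version that has it:
 *
 *   - UTF-8 locale: WC_ERR_INVALID_CHARS, Windows Vista (6.0) and later
 *   - other locales: WC_NO_BEST_FIT_CHARS, Windows 2000 (5.0) and later
 *
 * Returns 1 if the required flag is available, 0 otherwise. */
int
conversion_flag_available (unsigned int os_major, int utf8_locale)
{
  if (utf8_locale)
    return os_major >= 6;

  return os_major >= 5;
}
```

WinXP is version 5.1, so conversion_flag_available (5, 1) is 0. That
is the unsupported combination described above.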

[1] 
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170#utf-8-support

[2] 
https://learn.microsoft.com/en-us/cpp/windows/universal-crt-deployment?view=msvc-170#central-deployment

I attached a patch that adds ERROR_CANT_ACCESS_FILE (1920) and tweaks a
few comments. I didn't change ELOOP. If nothing above made you think
that something else should be changed, then this should finally be the
final version. :-) Thanks!

-- 
Lasse Collin
From dddeeb3d77884970a037856934554097290711cf Mon Sep 17 00:00:00 2001
From: Lasse Collin <lasse.col...@tukaani.org>
Date: Sat, 22 Feb 2025 15:00:55 +0200
Subject: [PATCH] ... Handle ERROR_CANT_ACCESS_FILE

---
 mingw-w64-crt/misc/dirent.c    | 15 +++++++++++----
 mingw-w64-headers/crt/dirent.h |  3 ++-
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/mingw-w64-crt/misc/dirent.c b/mingw-w64-crt/misc/dirent.c
index 3faca481b..c9fbcef3e 100644
--- a/mingw-w64-crt/misc/dirent.c
+++ b/mingw-w64-crt/misc/dirent.c
@@ -23,7 +23,7 @@
  *   - added d_type to struct dirent and struct _wdirent
  *   - improved error handling
  *   - added API docs into dirent.h
- *   - Windows 95/98/ME is no longer supported
+ *   - Windows 95/98/ME and NT4 are no longer supported
  */
 
 #ifndef WIN32_LEAN_AND_MEAN
@@ -239,6 +239,7 @@ _wopendir (const wchar_t *path)
          case ERROR_BAD_PATHNAME:
          case ERROR_BAD_NETPATH:
          case ERROR_BAD_NET_NAME:
+         case ERROR_CANT_ACCESS_FILE:
            /* In addition to the obvious reason, ERROR_PATH_NOT_FOUND
             * may occur also if the search pattern is too long:
             * 32767 wide chars including the \0 for a long path aware app,
@@ -255,7 +256,11 @@ _wopendir (const wchar_t *path)
             * or if the server doesn't support file sharing.
             *
             * ERROR_BAD_NET_NAME occurs if the server can be contacted but
-            * the share doesn't exist. */
+            * the share doesn't exist.
+            *
+            * ERROR_CANT_ACCESS_FILE occurs with directories that have
+            * an unhandled reparse point tag. Treat them the same way as
+            * directory symlinks and junctions whose targets don't exist. */
            err = ENOENT;
            break;
 
@@ -464,6 +469,7 @@ prepare_next_entry (DIR *dirp)
              case ERROR_BAD_PATHNAME:
              case ERROR_BAD_NETPATH:
              case ERROR_BAD_NET_NAME:
+             case ERROR_CANT_ACCESS_FILE:
              case ERROR_DIRECTORY:
              case ERROR_INVALID_FUNCTION:
              case ERROR_NOT_FOUND:
@@ -591,13 +597,14 @@ readdir_impl (DIR *dirp, BOOL fallback8dot3)
    *
    *   - CP_ACP and CP_OEMCP support WC_NO_BEST_FIT_CHARS even when those
    *     code pages are set to UTF-8. Lossy conversion is detected via the
-   *     last argument (BOOL*).
+   *     last argument (BOOL*). This works on Windows 2000 and later. On
+   *     Windows 10, this may work with CP_UTF8 too, but it's undocumented.
    *
    *   - CP_UTF8 requires WC_ERR_INVALID_CHARS, and the last argument must be
    *     NULL. If the filename contains unpaired surrogates (invalid UTF-16),
    *     the return value will be 0. WC_ERR_INVALID_CHARS only works on
    *     Windows Vista and later, but CP_UTF8 is only used with UTF-8 locales
-   *     which are only supported on Windows 10 and later.
+   *     which are only supported with new enough UCRT.
    *
    * d_name is big enough that conversion cannot run out of buffer space
    * with double-byte character sets or UTF-8.
diff --git a/mingw-w64-headers/crt/dirent.h b/mingw-w64-headers/crt/dirent.h
index 7a27f725c..a6f9aeee3 100644
--- a/mingw-w64-headers/crt/dirent.h
+++ b/mingw-w64-headers/crt/dirent.h
@@ -85,7 +85,8 @@ typedef struct __dirent_DIR DIR;
  *               Windows reports as ERROR_CANT_RESOLVE_FILENAME.
  *     EACCES    Access denied.
  *     EIO       Unknown error, possibly an I/O error.
- *     ENOSYS    This dirent implementation doesn't work on Windows 95/98/ME.
+ *     ENOSYS    This dirent implementation works on Windows 2000 and later.
+ *               Windows 95/98/ME and NT4 are not supported.
  */
 DIR* __cdecl __MINGW_NOTHROW opendir (const char*);
 
-- 
2.48.1
