On Monday 27 January 2025 16:49:26 Lasse Collin wrote:
> Another behavior difference happens with invalid multibyte strings.
> I tested with UTF-8 set as the active code page in the application
> manifest. A file named L"_\uFFFD_" exists.
> 
> The UCRT functions fail if given invalid UTF-8:
> 
>     fopen("_\x80_", "r");
>     _open("_\x80_", O_RDONLY);
>     // _findfirst fails too
> 
> GetLastError() returns ERROR_NO_UNICODE_TRANSLATION.
> 
> Win32 API functions convert the invalid bytes to U+FFFD and then access
> the resulting filename, so these succeed:
> 
>     GetFileAttributesA("_\x80_");
> 
>     WIN32_FIND_DATAA wfd;
>     FindFirstFileA("_\x80_", &wfd);
>     // wfd.cFileName contains "_\uFFFD_" in UTF-8.
> 
> Listing files in a directory works too, that is,
> FindFirstFileA("_\x80_directory\\*", &wfd) lists files in
> "_\uFFFD_directory".
> 
> I suppose dirent should follow the UCRT behavior.

I agree with you. Silently converting the invalid byte 0x80 to U+FFFD is
a bad idea: the program ends up accessing a file other than the one whose
name it passed.

> This means using MB_ERR_INVALID_CHARS with MultiByteToWideChar().
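
MultiByteToWideChar() only exists on Windows, so here is a portable
sketch of the two behaviors instead of a call to it. CONV_STRICT fails on
the first invalid sequence, roughly what MB_ERR_INVALID_CHARS gives (and
what the UCRT functions were observed to do), while CONV_REPLACE
substitutes U+FFFD like the ...A() functions above. The names
utf8_to_utf16, decode_utf8 and conv_mode are made up for this sketch, and
emitting one U+FFFD per invalid byte is a simplification; Windows may
group invalid bytes differently.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical modes: STRICT fails like MB_ERR_INVALID_CHARS,
   REPLACE substitutes U+FFFD like the observed ...A() behavior. */
enum conv_mode { CONV_STRICT, CONV_REPLACE };

/* Decode one UTF-8 sequence from s (len bytes available). Returns the
   number of bytes consumed and stores the code point in *cp, or returns 0
   for an invalid lead byte, a truncated sequence, an overlong form,
   a surrogate, or a value above U+10FFFF. */
static size_t decode_utf8(const unsigned char *s, size_t len, uint32_t *cp)
{
    unsigned char b = s[0];
    size_t n;
    uint32_t c, min;

    if (b < 0x80) { *cp = b; return 1; }
    else if ((b & 0xE0) == 0xC0) { n = 2; c = b & 0x1F; min = 0x80; }
    else if ((b & 0xF0) == 0xE0) { n = 3; c = b & 0x0F; min = 0x800; }
    else if ((b & 0xF8) == 0xF0) { n = 4; c = b & 0x07; min = 0x10000; }
    else return 0; /* stray continuation byte or 0xF8..0xFF */

    if (n > len) return 0;
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0;
    *cp = c;
    return n;
}

/* Convert NUL-terminated UTF-8 to zero-terminated UTF-16 in dst
   (dst_cap units including the terminator). Returns the number of units
   written, or -1 on invalid input in CONV_STRICT mode (the analogue of
   ERROR_NO_UNICODE_TRANSLATION) or if dst is too small. */
static long utf8_to_utf16(const char *src, uint16_t *dst, size_t dst_cap,
                          enum conv_mode mode)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t len = strlen(src);
    size_t out = 0;

    while (len > 0) {
        uint32_t cp;
        size_t n = decode_utf8(s, len, &cp);
        if (n == 0) {
            if (mode == CONV_STRICT)
                return -1;
            cp = 0xFFFD; /* replace a single invalid byte */
            n = 1;
        }
        size_t units = cp >= 0x10000 ? 2 : 1;
        if (out + units + 1 > dst_cap)
            return -1;
        if (units == 2) { /* encode as a surrogate pair */
            cp -= 0x10000;
            dst[out++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[out++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            dst[out++] = (uint16_t)cp;
        }
        s += n;
        len -= n;
    }
    if (out + 1 > dst_cap)
        return -1;
    dst[out] = 0;
    return (long)out;
}
```

With this, utf8_to_utf16("_\x80_", buf, 16, CONV_STRICT) returns -1,
whereas CONV_REPLACE yields the three units '_', 0xFFFD, '_'. A dirent
following the UCRT would presumably take the strict path and report the
error instead of opening a different directory.
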
> 
> * * *
> 
> It was pointed out that FindFirstFileExW() can be faster if one tells it
> not to list 8.3 names. I didn't see a difference on an SSD (or rather,
> with the data already cached in RAM). But 8.3 names would be needed if
> there were a _readdir_8dot3() that fell back to the 8.3 name when
> conversion of the long name fails. I suppose that would be a more
> sensible fallback for some apps than imaginary names from best-fit
> mapping.
> 
> -- 
> Lasse Collin

I think that to exclude 8.3 names you mean using the FindExInfoBasic
info level instead of FindExInfoStandard when calling FindFirstFileExW().

FindExInfoBasic is supported only since Windows 7, and I think that
readdir() could still be useful on Windows XP too.
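
If something like _readdir_8dot3() were added, FindExInfoStandard would be
needed anyway, because with FindExInfoBasic the cAlternateFileName member
is always empty; FindExInfoStandard also works on XP. Below is a rough,
portable sketch of only the name selection, assuming a UTF-8 active code
page. In that case the only long names that cannot be converted strictly
are those containing unpaired surrogates, which NTFS allows. Both function
names are hypothetical, and the FindFirstFileExW()/FindNextFileW() loop is
left out so the sketch builds anywhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Strictly convert zero-terminated UTF-16 to UTF-8, failing on unpaired
   surrogates. Returns the byte length written, or -1 on failure or if dst
   (cap bytes including the terminator) is too small. */
static long utf16_to_utf8_strict(const uint16_t *src, char *dst, size_t cap)
{
    size_t out = 0;
    for (size_t i = 0; src[i] != 0; i++) {
        uint32_t c = src[i];
        if (c >= 0xD800 && c <= 0xDBFF) {
            uint16_t lo = src[i + 1]; /* reads the terminator at worst */
            if (lo < 0xDC00 || lo > 0xDFFF)
                return -1; /* unpaired high surrogate */
            c = 0x10000 + ((c - 0xD800) << 10) + (uint32_t)(lo - 0xDC00);
            i++;
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            return -1; /* unpaired low surrogate */
        }
        unsigned char b[4];
        size_t n;
        if (c < 0x80) {
            b[0] = (unsigned char)c; n = 1;
        } else if (c < 0x800) {
            b[0] = (unsigned char)(0xC0 | (c >> 6));
            b[1] = (unsigned char)(0x80 | (c & 0x3F)); n = 2;
        } else if (c < 0x10000) {
            b[0] = (unsigned char)(0xE0 | (c >> 12));
            b[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            b[2] = (unsigned char)(0x80 | (c & 0x3F)); n = 3;
        } else {
            b[0] = (unsigned char)(0xF0 | (c >> 18));
            b[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            b[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            b[3] = (unsigned char)(0x80 | (c & 0x3F)); n = 4;
        }
        if (out + n + 1 > cap)
            return -1;
        for (size_t k = 0; k < n; k++)
            dst[out++] = (char)b[k];
    }
    if (out + 1 > cap)
        return -1;
    dst[out] = '\0';
    return (long)out;
}

/* Hypothetical _readdir_8dot3() name choice: use the long name
   (cFileName) if it converts cleanly, else the 8.3 name
   (cAlternateFileName). The fallback is also taken if dst is too small
   for the long name. Returns -1 if neither name is usable. */
static long pick_name_8dot3(const uint16_t *long_name,
                            const uint16_t *short_name,
                            char *dst, size_t cap)
{
    long r = utf16_to_utf8_strict(long_name, dst, cap);
    if (r >= 0)
        return r;
    if (short_name[0] != 0)
        return utf16_to_utf8_strict(short_name, dst, cap);
    return -1;
}
```

When the volume has no 8.3 name for the file, cAlternateFileName is
empty and the sketch gives up with -1, so plain readdir() error reporting
would still be needed.
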


_______________________________________________
Mingw-w64-public mailing list
Mingw-w64-public@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
