https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111244

            Bug ID: 111244
           Summary: std::filesystem::path encoding mismatches locale on
                    Windows
           Product: gcc
           Version: 13.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: thiago at kde dot org
  Target Milestone: ---

Test:
$ cat fstest.cpp 
#include <filesystem>
#include <stdio.h>

int main(int argc, char **argv)
{
    for (int i = 1; i < argc; ++i) {
        std::filesystem::path p(argv[i]);
        if (std::filesystem::exists(p)) {
            printf("%s %llu\n", argv[1], (unsigned long long)std::filesystem::file_size(p));
        } else {
            printf("%s does not exist\n", argv[1]);
        }
    }
}
$ touch filæ
$ g++ fstest.cpp
$ ./a.out fstest.cpp filæ

On Linux (and any other Unix):
fstest.cpp 377
fstest.cpp 0

On Windows with libc++ or MS STL:
fstest.cpp 377
fstest.cpp 0

On Windows with libstdc++:
fstest.cpp 377
terminate called after throwing an instance of 'std::filesystem::__cxx11::filesystem_error'
  what():  filesystem error: Cannot convert character sequence: Illegal byte sequence

This is caused by std::filesystem::path interpreting the narrow-character input
as UTF-8. On Windows, it is not UTF-8; it must be decoded using the locale codec.

Strictly speaking, the same should apply to the conversion to Unicode on Unix
systems too, but a) they're almost all UTF-8 these days, so the corner cases
may be ignored by a policy decision and b) a mismatched input encoding there
does not make it impossible to refer to files by fs::path alone.
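
For illustration only, here is a minimal sketch of what "decoded using the
locale codec" could look like in user code. It converts the narrow string to
wide characters with std::mbsrtowcs and constructs the path from the wide
string. The helper names decode_with_locale and path_from_locale are
hypothetical, not part of any library. The sketch assumes the program has
selected the user's locale (for example with setlocale(LC_ALL, "")) so that
mbsrtowcs decodes with that locale's encoding. It has not been tested against
the Windows configuration reported above.

```cpp
#include <cstddef>
#include <cwchar>
#include <filesystem>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a narrow string using the current C locale's
// multibyte encoding (whatever setlocale() has selected).
std::wstring decode_with_locale(const char *s)
{
    std::mbstate_t state{};
    const char *src = s;

    // First pass: a null destination only counts the wide characters needed.
    std::size_t len = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1))
        throw std::runtime_error("input is not valid in the current locale");

    // Second pass: perform the conversion into a buffer of the right size.
    std::wstring out(len, L'\0');
    state = std::mbstate_t{};
    src = s;
    std::mbsrtowcs(out.data(), &src, len, &state);
    return out;
}

// Hypothetical helper: build the path from the wide string, so the narrow
// bytes are never handed to the path constructor directly.
std::filesystem::path path_from_locale(const char *s)
{
    return std::filesystem::path(decode_with_locale(s));
}
```

In the test program, this would mean calling setlocale(LC_ALL, "") at the top
of main and replacing p(argv[i]) with p = path_from_locale(argv[i]). Whether
that avoids the exception in the reported setup is an assumption.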
