On 29/11/19 19:48 +0000, Jonathan Wakely wrote:
On 29/11/19 17:45 +0000, Jonathan Wakely wrote:
On 15/09/19 15:39 -0400, Tom Honermann wrote:
This series of patches provides an implementation of the changes
for C++ proposal P1423R3 [1].
These changes do not impact default libstdc++ behavior for C++17
and earlier; they are only active for C++2a or when the -fchar8_t
option is specified.
Tested x86_64-linux.
Patch 1: Decouple constraints for u8path from path constructors.
Patch 2: Update __cpp_lib_char8_t feature test macro value, add
deleted operators, update u8path.
Patch 3: Updates to existing tests.
Patch 4: New tests.
Tom.
[1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1423r3.html
It took a while, but I've committed these four patches, with just some
minor whitespace changes and changelog tweaks.
Running the new tests revealed a latent bug on Windows, where
experimental::filesystem::u8path(const Source&) assumed the input
was an iterator over a NTCTS. That worked for a const char* but not a
std::string or experimental::string_view.
The attached patch fixes that (and simplifies the #if and if-constexpr
conditions for Windows) but there's a remaining bug. Constructing a
experimental::filesystem::path from a char8_t string doesn't do the
right thing on Windows, so these cases fails:
fs::path p(u8"\xf0\x9d\x84\x9e");
VERIFY( p.u8string() == u8"\U0001D11E" );
p = fs::u8path(u8"\xf0\x9d\x84\x9e");
VERIFY( p.u8string() == u8"\U0001D11E" );
It works correctly for std::filesystem::path, just not the TS version.
I think this is the fix needed for the TS code:
--- a/libstdc++-v3/include/experimental/bits/fs_path.h
+++ b/libstdc++-v3/include/experimental/bits/fs_path.h
@@ -765,7 +765,14 @@ namespace __detail
{
#ifdef _GLIBCXX_USE_CHAR8_T
if constexpr (is_same<_CharT, char8_t>::value)
- return _S_wconvert((const char*)__f, (const char*)__l, true_type());
+ {
+ const char* __f2 = (const char*)__f;
+ const char* __l2 = (const char*)__l;
+ std::wstring __wstr;
+ std::codecvt_utf8_utf16<wchar_t> __wcvt;
+ if (__str_codecvt_in_all(__f2, __l2, __wstr, __wcvt))
+ return __wstr;
+ }
else
#endif
{
The current code uses std::codecvt<wchar_t, char, mbstate_t> but when
we know the input is UTF-8 encoded we should use codecvt_utf8_utf16
(which is what the C++17 code already does for char8_t input).
I'll add that the patch I'm testing.