It took a while, but I was finally happy with this v4 patch, so I pushed it to trunk. Then I noticed a silly mistake in the new test, which I'll fix shortly.
Compared to v3 sent two weeks ago, the main change here is the addition of a mutex to the __encoding facet, so that two formatters using the same locale won't conflict when using its iconv descriptor to convert strings. I've also moved the actual iconv conversion into a member function of the __encoding facet, so that it's better encapsulated. That will also enable some future changes I have planned, to be shared soon.

Tested x86_64-linux, sparc-solaris11.4, x86_64-freebsd14, and powerpc-aix7.3.

Pushed to trunk, along with the rest of the series that the v3 patch was part of.

-- >8 --

This implements the C++23 paper P2419R2 (Clarify handling of encodings in localized formatting of chrono types). The requirement is that when the literal encoding is "a Unicode encoding form" and the formatting locale uses a different encoding, any locale-specific strings such as "août" for std::chrono::August should be converted to the literal encoding.

Using the recently-added std::locale::encoding() function we can check the locale's encoding and then use iconv if a conversion is needed.

Because nl_langinfo_l and iconv_open both allocate memory, a naive implementation would perform multiple allocations and deallocations for every snippet of locale-specific text that needs to be converted to UTF-8. To avoid that, a new internal locale::facet is defined to store the text_encoding and an iconv_t descriptor, which are then cached in the formatting locale.

This requires access to the internals of a std::locale object in src/c++20/format.cc, so that new file needs to be compiled with -fno-access-control, as well as -std=gnu++26 in order to use std::text_encoding.

Because the new std::text_encoding and std::locale::encoding() symbols are only in the libstdc++exp.a archive, we need to include src/c++26/text_encoding.cc in the main library, but not export its symbols yet. This means they can be used by the two new functions which are exported from the main library.
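For anyone unfamiliar with the pattern, the idea of a cached iconv descriptor guarded by a mutex, with an output buffer that grows on E2BIG, can be sketched in isolation roughly as below. This is only an illustration and not the code in the patch: to_utf8_converter is a made-up name, it uses plain resize() rather than resize_and_overwrite, it throws instead of returning a codecvt_base::result, and it assumes a POSIX iconv that recognises the charset name "ISO-8859-1".

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <mutex>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helper, not part of libstdc++. It owns one iconv descriptor
// and serialises access to it, because a descriptor carries conversion
// state and must not be used by two threads at once.
class to_utf8_converter
{
public:
  explicit to_utf8_converter(const char* from_charset)
  : cd_(::iconv_open("UTF-8", from_charset))
  {
    if (cd_ == (iconv_t)-1)
      throw std::runtime_error("iconv_open failed");
  }

  ~to_utf8_converter() { ::iconv_close(cd_); }

  to_utf8_converter(const to_utf8_converter&) = delete;
  to_utf8_converter& operator=(const to_utf8_converter&) = delete;

  // Convert `input` from the source charset to UTF-8.
  std::string convert(std::string_view input) const
  {
    std::lock_guard<std::mutex> lock(mx_);
    std::string out;
    char* in = const_cast<char*>(input.data()); // iconv takes char**
    std::size_t in_left = input.size();
    std::size_t written = 0;
    while (in_left != 0)
      {
        // Guess two output bytes per remaining input byte, and grow
        // again if iconv reports E2BIG (output buffer too small).
        out.resize(out.size() + in_left * 2 + 4);
        char* dst = out.data() + written;
        std::size_t out_left = out.size() - written;
        std::size_t res = ::iconv(cd_, &in, &in_left, &dst, &out_left);
        if (res == (std::size_t)-1 && errno != E2BIG)
          {
            // Invalid or incomplete input: reset the descriptor's state
            // so the next caller starts clean, then report the failure.
            ::iconv(cd_, nullptr, nullptr, nullptr, nullptr);
            throw std::runtime_error("iconv conversion failed");
          }
        written = out.size() - out_left;
        assert(written <= out.size());
      }
    out.resize(written); // drop the unused tail of the buffer
    return out;
  }

private:
  iconv_t cd_;
  mutable std::mutex mx_;
};
```

With a converter constructed as to_utf8_converter latin1("ISO-8859-1"), calling latin1.convert("ao\xfbt") should yield the UTF-8 bytes for "août", since the single Latin-1 byte 0xFB becomes the two bytes 0xC3 0xBB. The point of keeping the descriptor in a long-lived object is the same as in the patch: iconv_open is paid once per locale rather than once per converted string.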
The encoding conversions are done for C++20, treating it as a DR that resolves LWG 3565.

With this change we can increase the value of the __cpp_lib_format macro for C++23. The value should be 202207 for P2419R2, but we already implement P2510R3 (Formatting pointers) so can use the value 202304.

libstdc++-v3/ChangeLog:

	PR libstdc++/109162
	* acinclude.m4 (libtool_VERSION): Update to 6:34:0.
	* config/abi/pre/gnu.ver: Disambiguate old patterns. Add new
	GLIBCXX_3.4.34 symbol version and new exports.
	* configure: Regenerate.
	* include/bits/chrono_io.h (_ChronoSpec::_M_locale_specific):
	Add new accessor functions to use a reserved bit in _Spec.
	(__formatter_chrono::_M_parse): Use _M_locale_specific(true)
	when chrono-specs contains locale-dependent conversion
	specifiers.
	(__formatter_chrono::_M_format): Open iconv descriptor if
	conversion to UTF-8 will be needed.
	(__formatter_chrono::_M_write): New function to write a
	localized string with possible character conversion.
	(__formatter_chrono::_M_a_A, __formatter_chrono::_M_b_B)
	(__formatter_chrono::_M_p, __formatter_chrono::_M_r)
	(__formatter_chrono::_M_x, __formatter_chrono::_M_X)
	(__formatter_chrono::_M_locale_fmt): Use _M_write.
	* include/bits/version.def (format): Update value.
	* include/bits/version.h: Regenerate.
	* include/std/format (_GLIBCXX_P2518R3): Check feature test
	macro instead of __cplusplus.
	(basic_format_context): Declare __formatter_chrono as friend.
	* src/c++20/Makefile.am: Add new file.
	* src/c++20/Makefile.in: Regenerate.
	* src/c++20/format.cc: New file.
	* testsuite/std/time/format_localized.cc: New test.
	* testsuite/util/testsuite_abi.cc: Add new symbol version.
---
 libstdc++-v3/acinclude.m4                     |   2 +-
 libstdc++-v3/config/abi/pre/gnu.ver           |  18 +-
 libstdc++-v3/configure                        |   2 +-
 libstdc++-v3/include/bits/chrono_io.h         |  95 +++++++-
 libstdc++-v3/include/bits/version.def         |  29 ++-
 libstdc++-v3/include/bits/version.h           |   4 +-
 libstdc++-v3/include/std/format               |  16 +-
 libstdc++-v3/src/c++20/Makefile.am            |   8 +-
 libstdc++-v3/src/c++20/Makefile.in            |  10 +-
 libstdc++-v3/src/c++20/format.cc              | 213 ++++++++++++++++++
 .../testsuite/std/time/format_localized.cc    |  90 ++++++++
 libstdc++-v3/testsuite/util/testsuite_abi.cc  |   1 +
 12 files changed, 459 insertions(+), 29 deletions(-)
 create mode 100644 libstdc++-v3/src/c++20/format.cc
 create mode 100644 libstdc++-v3/testsuite/std/time/format_localized.cc

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index e04aae25360..e4ed583b3ae 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -4230,7 +4230,7 @@ changequote([,])dnl
 fi
 
 # For libtool versioning info, format is CURRENT:REVISION:AGE
-libtool_VERSION=6:33:0
+libtool_VERSION=6:34:0
 
 # Everything parsed; figure out what files and settings to use.
case $enable_symvers in diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver index 31449b5b87b..ae79b371d80 100644 --- a/libstdc++-v3/config/abi/pre/gnu.ver +++ b/libstdc++-v3/config/abi/pre/gnu.ver @@ -109,7 +109,11 @@ GLIBCXX_3.4 { std::[j-k]*; # std::length_error::l*; # std::length_error::~l*; - std::locale::[A-Za-e]*; + # std::locale::[A-Za-d]*; + std::locale::all; + std::locale::classic*; + std::locale::collate; + std::locale::ctype; std::locale::facet::[A-Za-z]*; std::locale::facet::_S_get_c_locale*; std::locale::facet::_S_clone_c_locale*; @@ -168,7 +172,7 @@ GLIBCXX_3.4 { std::strstream*; std::strstreambuf*; # std::t[a-q]*; - std::t[a-g]*; + std::terminate*; std::th[a-h]*; std::th[j-q]*; std::th[s-z]*; @@ -2528,6 +2532,16 @@ GLIBCXX_3.4.33 { _ZNKSt12__basic_fileIcE13native_handleEv; } GLIBCXX_3.4.32; +# GCC 15.1.0 +GLIBCXX_3.4.34 { + # std::__format::__with_encoding_conversion + _ZNSt8__format26__with_encoding_conversionERKSt6locale; + # std::__format::__locale_encoding_to_utf8 + _ZNSt8__format25__locale_encoding_to_utf8ERKSt6localeSt17basic_string_viewIcSt11char_traitsIcEEPv; + # __sso_string constructor and destructor + _ZNSt12__sso_string[CD][12]Ev; +} GLIBCXX_3.4.33; + # Symbols in the support library (libsupc++) have their own tag. CXXABI_1.3 { diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure index 5645e991af7..fe525308ae2 100755 --- a/libstdc++-v3/configure +++ b/libstdc++-v3/configure @@ -51040,7 +51040,7 @@ $as_echo "$as_me: WARNING: === Symbol versioning will be disabled." >&2;} fi # For libtool versioning info, format is CURRENT:REVISION:AGE -libtool_VERSION=6:33:0 +libtool_VERSION=6:34:0 # Everything parsed; figure out what files and settings to use. 
case $enable_symvers in diff --git a/libstdc++-v3/include/bits/chrono_io.h b/libstdc++-v3/include/bits/chrono_io.h index d8a4a121113..a449ffdc558 100644 --- a/libstdc++-v3/include/bits/chrono_io.h +++ b/libstdc++-v3/include/bits/chrono_io.h @@ -38,8 +38,10 @@ #include <iomanip> // setw, setfill #include <format> #include <charconv> // from_chars +#include <stdexcept> // __sso_string #include <bits/streambuf_iterator.h> +#include <bits/unique_ptr.h> namespace std _GLIBCXX_VISIBILITY(default) { @@ -216,6 +218,20 @@ namespace __format struct _ChronoSpec : _Spec<_CharT> { basic_string_view<_CharT> _M_chrono_specs; + + // Use one of the reserved bits in __format::_Spec<C>. + // This indicates that a locale-dependent conversion specifier such as + // %a is used in the chrono-specs. This is not the same as the + // _Spec<C>::_M_localized member which indicates that "L" was present + // in the format-spec, e.g. "{:L%a}" is localized and locale-specific, + // but "{:L}" is only localized and "{:%a}" is only locale-specific. + constexpr bool + _M_locale_specific() const noexcept + { return this->_M_reserved; } + + constexpr void + _M_locale_specific(bool __b) noexcept + { this->_M_reserved = __b; } }; // Represents the information provided by a chrono type. 
@@ -310,11 +326,12 @@ namespace __format const auto __chrono_specs = __first++; // Skip leading '%' if (*__chrono_specs != '%') __throw_format_error("chrono format error: no '%' at start of " - "chrono-specs"); + "chrono-specs"); _CharT __mod{}; bool __conv = true; int __needed = 0; + bool __locale_specific = false; while (__first != __last) { @@ -327,15 +344,18 @@ namespace __format case 'a': case 'A': __needed = _Weekday; + __locale_specific = true; break; case 'b': case 'h': case 'B': __needed = _Month; + __locale_specific = true; break; case 'c': __needed = _DateTime; __allowed_mods = _Mod_E; + __locale_specific = true; break; case 'C': __needed = _Year; @@ -373,6 +393,8 @@ namespace __format break; case 'p': case 'r': + __locale_specific = true; + [[fallthrough]]; case 'R': case 'T': __needed = _TimeOfDay; @@ -398,10 +420,12 @@ namespace __format break; case 'x': __needed = _Date; + __locale_specific = true; __allowed_mods = _Mod_E; break; case 'X': __needed = _TimeOfDay; + __locale_specific = true; __allowed_mods = _Mod_E; break; case 'y': @@ -441,6 +465,8 @@ namespace __format || (__mod == 'O' && !(__allowed_mods & _Mod_O))) __throw_format_error("chrono format error: invalid " " modifier in chrono-specs"); + if (__mod && __c != 'z') + __locale_specific = true; __mod = _CharT(); if ((__parts & __needed) != __needed) @@ -472,6 +498,7 @@ namespace __format _M_spec = __spec; _M_spec._M_chrono_specs = __string_view(__chrono_specs, __first - __chrono_specs); + _M_spec._M_locale_specific(__locale_specific); return __first; } @@ -491,6 +518,24 @@ namespace __format if (__first == __last) return _M_format_to_ostream(__t, __fc, __is_neg); +#if defined _GLIBCXX_USE_NL_LANGINFO_L && __CHAR_BIT__ == 8 + // _GLIBCXX_RESOLVE_LIB_DEFECTS + // 3565. 
Handling of encodings in localized formatting + // of chrono types is underspecified + if constexpr (is_same_v<_CharT, char>) + if constexpr (__unicode::__literal_encoding_is_utf8()) + if (_M_spec._M_localized && _M_spec._M_locale_specific()) + { + extern locale __with_encoding_conversion(const locale&); + + // Allocate and cache the necessary state to convert strings + // in the locale's encoding to UTF-8. + locale __loc = __fc.locale(); + if (__loc != locale::classic()) + __fc._M_loc = __with_encoding_conversion(__loc); + } +#endif + _Sink_iter<_CharT> __out; __format::_Str_sink<_CharT> __sink; bool __write_direct = false; @@ -768,6 +813,29 @@ namespace __format static constexpr _CharT _S_space = _S_chars[14]; static constexpr const _CharT* _S_empty_spec = _S_chars + 15; + template<typename _OutIter> + _OutIter + _M_write(_OutIter __out, const locale& __loc, __string_view __s) const + { +#if defined _GLIBCXX_USE_NL_LANGINFO_L && __CHAR_BIT__ == 8 + __sso_string __buf; + // _GLIBCXX_RESOLVE_LIB_DEFECTS + // 3565. 
Handling of encodings in localized formatting + // of chrono types is underspecified + if constexpr (is_same_v<_CharT, char>) + if constexpr (__unicode::__literal_encoding_is_utf8()) + if (_M_spec._M_localized && _M_spec._M_locale_specific() + && __loc != locale::classic()) + { + extern string_view + __locale_encoding_to_utf8(const locale&, string_view, void*); + + __s = __locale_encoding_to_utf8(__loc, __s, &__buf); + } +#endif + return __format::__write(std::move(__out), __s); + } + template<typename _Tp, typename _FormatContext> typename _FormatContext::iterator _M_a_A(const _Tp& __t, typename _FormatContext::iterator __out, @@ -787,7 +855,7 @@ namespace __format else __tp._M_days_abbreviated(__days); __string_view __str(__days[__wd.c_encoding()]); - return __format::__write(std::move(__out), __str); + return _M_write(std::move(__out), __loc, __str); } template<typename _Tp, typename _FormatContext> @@ -808,7 +876,7 @@ namespace __format else __tp._M_months_abbreviated(__months); __string_view __str(__months[(unsigned)__m - 1]); - return __format::__write(std::move(__out), __str); + return _M_write(std::move(__out), __loc, __str); } template<typename _Tp, typename _FormatContext> @@ -1085,8 +1153,8 @@ namespace __format const auto& __tp = use_facet<__timepunct<_CharT>>(__loc); const _CharT* __ampm[2]; __tp._M_am_pm(__ampm); - return std::format_to(std::move(__out), _S_empty_spec, - __ampm[__hms.hours().count() >= 12]); + return _M_write(std::move(__out), __loc, + __ampm[__hms.hours().count() >= 12]); } template<typename _Tp, typename _FormatContext> @@ -1121,8 +1189,9 @@ namespace __format basic_string<_CharT> __fmt(_S_empty_spec); __fmt.insert(1u, 1u, _S_colon); __fmt.insert(2u, __ampm_fmt); - return std::vformat_to(std::move(__out), __fmt, - std::make_format_args<_FormatContext>(__t)); + using _FmtStr = _Runtime_format_string<_CharT>; + return _M_write(std::move(__out), __loc, + std::format(__loc, _FmtStr(__fmt), __t)); } template<typename _Tp, typename 
_FormatContext> @@ -1305,8 +1374,9 @@ namespace __format basic_string<_CharT> __fmt(_S_empty_spec); __fmt.insert(1u, 1u, _S_colon); __fmt.insert(2u, __rep); - return std::vformat_to(std::move(__out), __fmt, - std::make_format_args<_FormatContext>(__t)); + using _FmtStr = _Runtime_format_string<_CharT>; + return _M_write(std::move(__out), __loc, + std::format(__loc, _FmtStr(__fmt), __t)); } template<typename _Tp, typename _FormatContext> @@ -1328,8 +1398,9 @@ namespace __format basic_string<_CharT> __fmt(_S_empty_spec); __fmt.insert(1u, 1u, _S_colon); __fmt.insert(2u, __rep); - return std::vformat_to(std::move(__out), __fmt, - std::make_format_args<_FormatContext>(__t)); + using _FmtStr = _Runtime_format_string<_CharT>; + return _M_write(std::move(__out), __loc, + std::format(__loc, _FmtStr(__fmt), __t)); } template<typename _Tp, typename _FormatContext> @@ -1606,7 +1677,7 @@ namespace __format const auto& __tp = use_facet<time_put<_CharT>>(__loc); __tp.put(__os, __os, _S_space, &__tm, __fmt, __mod); if (__os) - __out = __format::__write(std::move(__out), __os.view()); + __out = _M_write(std::move(__out), __loc, __os.view()); return __out; } }; diff --git a/libstdc++-v3/include/bits/version.def b/libstdc++-v3/include/bits/version.def index 42cdef2f526..74947301760 100644 --- a/libstdc++-v3/include/bits/version.def +++ b/libstdc++-v3/include/bits/version.def @@ -1161,16 +1161,22 @@ ftms = { }; ftms = { + name = format; + // 202304 P2510R3 Formatting pointers + // 202305 P2757R3 Type checking format args + // 202306 P2637R3 Member visit + // 202311 P2918R2 Runtime format strings II + // values = { + // v = 202304; + // cxxmin = 26; + // hosted = yes; + // }; // 201907 Text Formatting, Integration of chrono, printf corner cases. // 202106 std::format improvements. // 202110 Fixing locale handling in chrono formatters, generator-like types. // 202207 Encodings in localized formatting of chrono, basic-format-string. 
- // 202207 P2286R8 Formatting Ranges - // 202207 P2585R1 Improving default container formatting - // TODO: #define __cpp_lib_format_ranges 202207L - name = format; values = { - v = 202110; + v = 202207; cxxmin = 20; hosted = yes; }; @@ -1374,6 +1380,19 @@ ftms = { }; }; +// ftms = { + // name = format_ranges; + // 202207 P2286R8 Formatting Ranges + // 202207 P2585R1 Improving default container formatting + // LWG3750 Too many papers bump __cpp_lib_format + // TODO: #define __cpp_lib_format_ranges 202207L + // values = { + // v = 202207; + // cxxmin = 23; + // hosted = yes; + // }; +// }; + ftms = { name = freestanding_algorithm; values = { diff --git a/libstdc++-v3/include/bits/version.h b/libstdc++-v3/include/bits/version.h index 1eaf3733bc2..9f8673395da 100644 --- a/libstdc++-v3/include/bits/version.h +++ b/libstdc++-v3/include/bits/version.h @@ -1305,9 +1305,9 @@ #if !defined(__cpp_lib_format) # if (__cplusplus >= 202002L) && _GLIBCXX_HOSTED -# define __glibcxx_format 202110L +# define __glibcxx_format 202207L # if defined(__glibcxx_want_all) || defined(__glibcxx_want_format) -# define __cpp_lib_format 202110L +# define __cpp_lib_format 202207L # endif # endif #endif /* !defined(__cpp_lib_format) && defined(__glibcxx_want_format) */ diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format index 8f6a82a1fd4..fe00e547671 100644 --- a/libstdc++-v3/include/std/format +++ b/libstdc++-v3/include/std/format @@ -2342,10 +2342,10 @@ namespace __format // _GLIBCXX_RESOLVE_LIB_DEFECTS // P2510R3 Formatting pointers -#if __cplusplus > 202302L || ! defined __STRICT_ANSI__ -#define _GLIBCXX_P2518R3 1 +#if __glibcxx_format >= 202304L || ! 
defined __STRICT_ANSI__ +# define _GLIBCXX_P2518R3 1 #else -#define _GLIBCXX_P2518R3 0 +# define _GLIBCXX_P2518R3 0 #endif #if _GLIBCXX_P2518R3 @@ -3819,6 +3819,9 @@ namespace __format __do_vformat_to(_Out, basic_string_view<_CharT>, const basic_format_args<_Context>&, const locale* = nullptr); + + template<typename _CharT> struct __formatter_chrono; + } // namespace __format /// @endcond @@ -3829,6 +3832,11 @@ namespace __format * this class template explicitly. For typical uses of `std::format` the * library will use the specializations `std::format_context` (for `char`) * and `std::wformat_context` (for `wchar_t`). + * + * You are not allowed to define partial or explicit specializations of + * this class template. + * + * @since C++20 */ template<typename _Out, typename _CharT> class basic_format_context @@ -3861,6 +3869,8 @@ namespace __format const basic_format_args<_Context2>&, const locale*); + friend __format::__formatter_chrono<_CharT>; + public: ~basic_format_context() = default; diff --git a/libstdc++-v3/src/c++20/Makefile.am b/libstdc++-v3/src/c++20/Makefile.am index a24505e5141..d0f7859290c 100644 --- a/libstdc++-v3/src/c++20/Makefile.am +++ b/libstdc++-v3/src/c++20/Makefile.am @@ -36,7 +36,7 @@ else inst_sources = endif -sources = tzdb.cc +sources = tzdb.cc format.cc vpath % $(top_srcdir)/src/c++20 @@ -53,6 +53,12 @@ tzdb.o: tzdb.cc tzdata.zi.h $(CXXCOMPILE) -I. -c $< endif +# This needs access to std::text_encoding and to the internals of std::locale. 
+format.lo: format.cc + $(LTCXXCOMPILE) -std=gnu++26 -fno-access-control -c $< +format.o: format.cc + $(CXXCOMPILE) -std=gnu++26 -fno-access-control -c $< + if GLIBCXX_HOSTED libc__20convenience_la_SOURCES = $(sources) $(inst_sources) else diff --git a/libstdc++-v3/src/c++20/Makefile.in b/libstdc++-v3/src/c++20/Makefile.in index 3ec8c5ce804..d759b8dcc7c 100644 --- a/libstdc++-v3/src/c++20/Makefile.in +++ b/libstdc++-v3/src/c++20/Makefile.in @@ -121,7 +121,7 @@ CONFIG_CLEAN_FILES = CONFIG_CLEAN_VPATH_FILES = LTLIBRARIES = $(noinst_LTLIBRARIES) libc__20convenience_la_LIBADD = -am__objects_1 = tzdb.lo +am__objects_1 = tzdb.lo format.lo @ENABLE_EXTERN_TEMPLATE_TRUE@am__objects_2 = sstream-inst.lo @GLIBCXX_HOSTED_TRUE@am_libc__20convenience_la_OBJECTS = \ @GLIBCXX_HOSTED_TRUE@ $(am__objects_1) $(am__objects_2) @@ -432,7 +432,7 @@ headers = @ENABLE_EXTERN_TEMPLATE_TRUE@inst_sources = \ @ENABLE_EXTERN_TEMPLATE_TRUE@ sstream-inst.cc -sources = tzdb.cc +sources = tzdb.cc format.cc @GLIBCXX_HOSTED_FALSE@libc__20convenience_la_SOURCES = @GLIBCXX_HOSTED_TRUE@libc__20convenience_la_SOURCES = $(sources) $(inst_sources) @@ -755,6 +755,12 @@ vpath % $(top_srcdir)/src/c++20 @USE_STATIC_TZDATA_TRUE@tzdb.o: tzdb.cc tzdata.zi.h @USE_STATIC_TZDATA_TRUE@ $(CXXCOMPILE) -I. -c $< +# This needs access to std::text_encoding and to the internals of std::locale. +format.lo: format.cc + $(LTCXXCOMPILE) -std=gnu++26 -fno-access-control -c $< +format.o: format.cc + $(CXXCOMPILE) -std=gnu++26 -fno-access-control -c $< + # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: diff --git a/libstdc++-v3/src/c++20/format.cc b/libstdc++-v3/src/c++20/format.cc new file mode 100644 index 00000000000..bcf1dd156a7 --- /dev/null +++ b/libstdc++-v3/src/c++20/format.cc @@ -0,0 +1,213 @@ +// Definitions for <chrono> formatting -*- C++ -*- + +// Copyright The GNU Toolchain Authors. 
+// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. + +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +// <http://www.gnu.org/licenses/>. + +#define _GLIBCXX_USE_CXX11_ABI 1 +#include "../c++26/text_encoding.cc" + +#ifdef _GLIBCXX_USE_NL_LANGINFO_L +# include <format> +# include <chrono> +# include <memory> // make_unique +# include <mutex> // mutex, lock_guard +# include <string.h> // strlen, strcpy +# ifdef _GLIBCXX_HAVE_ICONV +# include <iconv.h> +# include <errno.h> +# endif +#endif + +namespace std +{ +_GLIBCXX_BEGIN_NAMESPACE_VERSION +namespace __format +{ +// Helpers for P2419R2 +// (Clarify handling of encodings in localized formatting of chrono types) +// Convert a string from the locale's charset to UTF-8. + +#if defined _GLIBCXX_USE_NL_LANGINFO_L && __CHAR_BIT__ == 8 +namespace +{ +// A non-standard locale::facet that caches the locale's std::text_encoding +// and an iconv descriptor for converting from that encoding to UTF-8. 
+struct __encoding : locale::facet +{ + static locale::id id; + + explicit + __encoding(const text_encoding& enc, size_t refs = 0) + : facet(refs), _M_enc(enc) + { +#if defined _GLIBCXX_HAVE_ICONV + using enum text_encoding::id; + switch (_M_enc.mib()) + { + case UTF8: + case ASCII: + break; + default: + _M_cd = ::iconv_open("UTF-8", _M_enc.name()); + } +#endif + } + + ~__encoding() + { +#if defined _GLIBCXX_HAVE_ICONV + if (_M_cd != (::iconv_t)-1) + ::iconv_close(_M_cd); +#endif + } + + text_encoding _M_enc; +#if defined _GLIBCXX_HAVE_ICONV + ::iconv_t _M_cd = (::iconv_t)-1; + mutable mutex mx; +#endif + + // Convert `input` to UTF-8, using `out` to hold the result. + codecvt_base::result + conv(string_view input, string& out) const + { + if (input.empty()) [[unlikely]] + return codecvt_base::noconv; + +#if defined _GLIBCXX_HAVE_ICONV + if (_M_cd == (::iconv_t)-1) + return codecvt_base::error; + + size_t inbytesleft = input.size(); + size_t written = 0; + bool done = false; + + auto overwrite = [&](char* p, size_t n) { + auto inbytes + = const_cast<char*>(input.data()) + input.size() - inbytesleft; + char* outbytes = p + written; + size_t outbytesleft = n - written; + size_t res = ::iconv(_M_cd, &inbytes, &inbytesleft, + &outbytes, &outbytesleft); + if (res == (size_t)-1) + { + if (errno != E2BIG) + { + ::iconv(_M_cd, nullptr, 0, nullptr, 0); // reset + done = true; + return 0zu; + } + } + else + done = true; + written = outbytes - p; + return written; + }; + + size_t mult = 1; + lock_guard<mutex> lock(mx); + do + { + // Estimate that we need 1.5 UTF-8 code units per char, but increase + // that every time the conversion fails due to insufficient space. + out.resize_and_overwrite((inbytesleft * 3 / 2) * mult, overwrite); + ++mult; + } + while (!done); + + return out.empty() ? 
codecvt_base::error : codecvt_base::ok; +#else + return codecvt_base::error; +#endif + } +}; + +locale::id __encoding::id; + +inline const __encoding* +__get_encoding_facet(const locale& loc) +{ + // Don't need to use __try_use_facet with its dynamic_cast<const __encoding*> + // because we know there are no types derived from __encoding. We have the + // facet if the id is within the array bounds and the element is non-null. + const auto id = __encoding::id._M_id(); + if (id >= loc._M_impl->_M_facets_size) + return nullptr; + return static_cast<const __encoding*>(loc._M_impl->_M_facets[id]); +} + +} // namespace + +locale +__with_encoding_conversion(const locale& loc) +{ + if (__get_encoding_facet(loc)) + return loc; + + string name = loc.name(); + if (name == "C" || name == "*") + return loc; + + text_encoding locenc = __locale_encoding(name.c_str()); + + if (locenc == text_encoding::UTF8 || locenc == text_encoding::ASCII + || locenc == text_encoding::unknown) + return loc; + + auto facetp = std::make_unique<__encoding>(locenc); + locale loc2(loc, facetp.get()); // FIXME: PR libstdc++/113704 + facetp.release(); + // FIXME: Ideally we wouldn't need to reallocate this string again, + // just don't delete[] it in the locale(locale, Facet*) constructor. + if (const char* name = loc._M_impl->_M_names[0]) + { + loc2._M_impl->_M_names[0] = new char[strlen(name) + 1]; + strcpy(loc2._M_impl->_M_names[0], name); + } + return loc2; +} + +string_view +__locale_encoding_to_utf8(const locale& loc, string_view str, void* poutbuf) +{ + string& outbuf = *static_cast<string*>(poutbuf); + if (auto enc_facet = __get_encoding_facet(loc)) + { + auto result = enc_facet->conv(str, outbuf); + if (result == codecvt_base::ok) + str = outbuf; // UTF-8 output was written to outbuf. + // else result was noconv or error, return str unchanged. 
+ } + return str; +} +#else +locale +__with_encoding_conversion(const locale& loc) +{ return loc; } + +string_view +__locale_encoding_to_utf8(const locale&, string_view str, void*) +{ return str; } +#endif // USE_NL_LANGINFO_L && CHAR_BIT == 8 +} // namespace __format +_GLIBCXX_END_NAMESPACE_VERSION +} // namespace std diff --git a/libstdc++-v3/testsuite/std/time/format_localized.cc b/libstdc++-v3/testsuite/std/time/format_localized.cc new file mode 100644 index 00000000000..64a1582b945 --- /dev/null +++ b/libstdc++-v3/testsuite/std/time/format_localized.cc @@ -0,0 +1,90 @@ +// { dg-do run { target c++20 } } +// { dg-require-namedlocale "es_ES.ISO8859-1" } +// { dg-require-namedlocale "fr_FR.ISO8859-1" } +// { dg-require-namedlocale "en_US.ISO8859-1" } +// { dg-require-namedlocale "en_US.ISO8859-15" } +// { dg-require-namedlocale "en_US.UTF-8" } + +// P2419R2 +// Clarify handling of encodings in localized formatting of chrono types + +// Localized date-time strings such as "février" should be converted to UTF-8 +// if the locale uses a different encoding. 
+ +#include <chrono> +#include <format> +#include <stdio.h> +#include <testsuite_hooks.h> + +void +test_ru() +{ + bool ok = false; +#if __cpp_exceptions + std::locale loc; + try + { + loc = std::locale("ru_UA.KOI8-U"); + ok = true; + } + catch (const std::runtime_error&) + { + try + { + loc = std::locale("ru_RU.KOI8-R"); + ok = true; + } + catch (const std::runtime_error&) + { + } + } +#endif + if (ok) + { + auto s = std::format(loc, "День недели: {:L}", std::chrono::Monday); + VERIFY( s == "День недели: Пн" || s == "День недели: пн" ); + } + else + puts("NOTE: test_ru(): skipped unsupported locales"); +} + +void +test_es() +{ + std::locale loc(ISO_8859(1,es_ES)); + auto s = std::format(loc, "Día de la semana: {:L%A %a}", + std::chrono::Wednesday); + if (s.back() == '.') // FreeBSD has this in the %a string + s.pop_back(); + VERIFY( s == "Día de la semana: miércoles mié" ); +} + +void +test_fr() +{ + std::locale loc(ISO_8859(1,fr_FR)); + auto s = std::format(loc, "Six mois après {0:L%b}, c'est {1:L%B}.", + std::chrono::February, std::chrono::August); + VERIFY( s == "Six mois après févr., c'est août." 
); +} + +void +test_en() +{ + using namespace std::chrono; + + for (auto l : {ISO_8859(1,en_US), ISO_8859(15,en_US), "en_US.UTF-8", "C"}) + { + std::locale loc(ISO_8859(1,en_US)); + auto s = std::format(loc, "{:L%b %B %a %A}", sys_days(2024y/July/30)); + VERIFY( s == "Jul July Tue Tuesday" ); + } +} + +int main() +{ + test_ru(); + test_es(); + test_fr(); + test_en(); +} diff --git a/libstdc++-v3/testsuite/util/testsuite_abi.cc b/libstdc++-v3/testsuite/util/testsuite_abi.cc index ec7c3df9ecc..ce9cda660fa 100644 --- a/libstdc++-v3/testsuite/util/testsuite_abi.cc +++ b/libstdc++-v3/testsuite/util/testsuite_abi.cc @@ -215,6 +215,7 @@ check_version(symbol& test, bool added) known_versions.push_back("GLIBCXX_3.4.31"); known_versions.push_back("GLIBCXX_3.4.32"); known_versions.push_back("GLIBCXX_3.4.33"); + known_versions.push_back("GLIBCXX_3.4.34"); known_versions.push_back("GLIBCXX_LDBL_3.4.31"); known_versions.push_back("GLIBCXX_IEEE128_3.4.29"); known_versions.push_back("GLIBCXX_IEEE128_3.4.30"); -- 2.45.2