On Wed, 2 Apr 2025, Tomasz Kamiński wrote:

> This patch implements the part of P2372R3 that specifies the debug
> (escaped) format for strings and character sequences. This includes
> both handling of the '?' format specifier and the set_debug_format
> member.
>
> To indicate partial support we define the __glibcxx_format_ranges macro
> with value 1, without defining __cpp_lib_format_ranges.
>
> We provide two separate escaping routines depending on the literal
> encoding for the corresponding character types. If the character
> encoding is Unicode, we follow the specification from the standard
> (__format::__write_escaped_unicode).
> For other encodings, we escape only characters in range [0x00, 0x80),
> interpreting them as ASCII values: [0x00, 0x20), 0x7f and '\t', '\r',
> '\n', '\\', '"', '\'' are escaped. We assume every character outside
> this range is printable (__format::__write_escaped_ascii).
> In particular we do not yet implement special handling of shift
> sequences.
>
> For Unicode escaping a new __escape_edges table is introduced,
> that encodes whether a character belongs to a General_Category
> that is escaped by the standard (Control or Other). This table
> is generated from DerivedGeneralCategory.txt provided by Unicode.
> Only a boolean flag is preserved to reduce the number of entries.
> The additional rules for escaping are handled by __should_escape_unicode.
>
> When width or precision is specified, we emit the escaped string
> to a temporary buffer and format the resulting string according
> to the format spec. For characters we use a fixed-size stack buffer,
> for which a new _Fixedbuf_sink is introduced.
>
> Finally this patch corrects handling of UTF-32LE and UTF-32BE
> in __unicode::__literal_encoding_is_unicode<_CharT>, and now they
> are properly recognized as Unicode.
>
> contrib/ChangeLog:
>
> 	* unicode/README:
> 	Mentioned `DerivedGeneralCategory.txt`
> 	* unicode/gen_libstdcxx_unicode_data.py:
> 	Generate __escape_edges table from DerivedGeneralCategory.txt.
> 	Update file name in comments.
> 	* unicode/DerivedGeneralCategory.txt:
> 	Copy of file distributed by Unicode Consortium
> 	ftp://ftp.unicode.org/Public/UNIDATA/extracted/DerivedGeneralCategory.txt.
>
> libstdc++-v3/ChangeLog:
>
> 	* include/bits/chrono_io.h (_GLIBCXX_WIDEN_, _GLIBCXX_WIDEN)
> 	(__detail::_Widen): Moved to std/format file.
> 	* include/bits/unicode-data.h:
> 	Regenerate using contrib/unicode/gen_libstdcxx_unicode_data.py.
> 	* include/bits/unicode.h (__unicode::_Utf_iterator::_M_units)
> 	(__unicode::__should_escape_category): Define.
> 	(__unicode::__literal_encoding_is_unicode<_CharT>):
> 	Corrected handling for UTF-16 and UTF-32 with "LE" or "BE" suffix.
> 	* include/bits/version.def:
> 	Define __glibcxx_format_ranges without corresponding std name.
> 	* include/bits/version.h: Regenerate.
> 	* include/std/format (_GLIBCXX_WIDEN_, _GLIBCXX_WIDEN):
> 	Moved from include/bits/chrono_io.h.
> 	(__format::_Term_char, __format::_Escapes, __format::_Separators)
> 	(__format::__should_escape_ascii, __format::__should_escape_unicode)
> 	(__format::__write_escape_seq, __format::__write_escaped_char)
> 	(__format::__write_escaped_ascii, __format::__write_escaped_unicode)
> 	(__format::__write_escaped): Define.
> 	(__formatter_str::_M_format): Extracted non-escaped formatting.
> 	(__formatter_str::format): Handle _Pres_esc.
> 	(__formatter_int::_M_do_parse): Parse '?' if __glibcxx_format_ranges
> 	is set.
> 	(__formatter_int::_M_format_character_escaped): Define.
> 	(formatter<_CharT, _CharT>::format, formatter<char, wchar_t>::format):
> 	Handle _Pres_esc.
> 	(__formatter_str::set_debug_format, formatter<...>::set_debug_format):
> 	Guard with __glibcxx_format_ranges.
> 	(__format::_Fixedbuf_sink): Define.
> 	* testsuite/std/format/debug.cc: New test.
> 	* testsuite/std/format/parse_ctx.cc (escaped_strings_supported):
> 	Define to true if __glibcxx_format_ranges is defined.
> 	* testsuite/std/format/string.cc (escaped_strings_supported):
> 	Define to true if __glibcxx_format_ranges is defined.
> ---
> Testing on x86_64-linux. OK for trunk?
>
> For dg-options, could I configure a run with Unicode and non-Unicode
> encodings in the same file? If so, what encoding would be supported
> on most of the platforms we run tests on (value for -fexec-charset=)?

Not sure if you resolved this already, but one general way to run the
same test with a different set of flags is to create a new test file that
#includes the original one and sets different dg-options (duplicating
the other directives as appropriate), e.g.

libstdc++-v3/testsuite/std/format/debug_nonunicode.cc:

  // { dg-options "-fexec-charset=... -fwide-exec-charset=..." }
  // { dg-do run { target c++23 } }
  // { dg-add-options no_pch }
  #include "debug.cc"

DejaGnu directives are parsed before the preprocessor is run, so only
the directives in the new file take effect.
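
For reference, the non-Unicode rule from the quoted description could be sketched roughly as below, to show what such a second run would exercise. This is only an illustration under assumptions: escape_ascii_debug is a hypothetical helper, not the patch's __format::__write_escaped_ascii. It handles only narrow strings, and the \u{...} spelling with shortest lower-case hex digits follows my reading of the standard's escaped-string format. It leaves '\'' alone because the standard escapes that only when formatting a single character.

```cpp
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical helper (not part of libstdc++): produce the debug form of a
// narrow string, treating only bytes below 0x80 as ASCII candidates for
// escaping and passing every other byte through unchanged.
std::string escape_ascii_debug(std::string_view in)
{
  std::string out = "\"";
  for (unsigned char c : in)
    {
      switch (c)
        {
        case '\t': out += "\\t"; break;
        case '\n': out += "\\n"; break;
        case '\r': out += "\\r"; break;
        case '\\': out += "\\\\"; break;
        case '"':  out += "\\\""; break;
        default:
          if (c < 0x20 || c == 0x7f)
            {
              // Remaining control characters use the \u{hex} form.
              char buf[16];
              std::snprintf(buf, sizeof buf, "\\u{%x}",
                            static_cast<unsigned>(c));
              out += buf;
            }
          else
            // Printable ASCII and any byte >= 0x80 are copied as-is.
            out += static_cast<char>(c);
        }
    }
  out += '"';
  return out;
}
```

With this sketch, a tab inside the input comes out as the two characters \t, 0x7f comes out as \u{7f}, and bytes of a multibyte sequence in a non-Unicode literal encoding are copied untouched.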