On Sat, 6 Jan 2024 at 17:03, Jonathan Wakely <jwak...@redhat.com> wrote:
>
> On Sat, 6 Jan 2024 at 16:57, Lewis Hyatt <lhy...@gmail.com> wrote:
> >
> > On Sat, Jan 6, 2024 at 11:40 AM Jonathan Wakely <jwak...@redhat.com> wrote:
> > >
> > > Here's a V2 patch which addresses the two things I mentioned: the new
> > > Python script now generates a complete file that can just be included by
> > > <bits/unicode.h>, and the full Unicode 15.1.0 grapheme cluster break
> > > rules are supported (I think ... more testing needed for some of the
> > > complex rules).
> > >
> > > -- >8 --
> >
> > Thanks, by the way, for fixing the typo in gen_wcwidth.py.
> > One thing I wanted to point out, the file contrib/unicode/README
> > contains a list of steps to follow in order to update to a new Unicode
> > version. There are 10 or so steps to generate everything libcpp and
> > diagnostics care about. Do you think it's worth adding something for
> > the new libstdc++ parts there too?
>
> Ah, thanks for pointing that out. Yes, I should add to that.

Here's what I suggest adding to the README:

--- a/contrib/unicode/README
+++ b/contrib/unicode/README
@@ -16,7 +16,12 @@ ftp://ftp.unicode.org/Public/UNIDATA/DerivedNormalizationProps.txt
 ftp://ftp.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt
 ftp://ftp.unicode.org/Public/UNIDATA/NameAliases.txt
 
-These files have been added to source control in this directory;
+Two additional files are needed for lookup tables in libstdc++:
+
+ftp://ftp.unicode.org/Public/UNIDATA/auxiliary/GraphemeBreakProperty.txt
+ftp://ftp.unicode.org/Public/UNIDATA/emoji/emoji-data.txt
+
+All these files have been added to source control in this directory;
 please see unicode-license.txt for the relevant copyright information.
 
 In order to keep in sync with glibc's wcwidth as much as possible, it is
@@ -24,7 +29,7 @@ desirable for the logic that processes the Unicode data to be the same as
 glibc's. To that end, we also put in this directory, in the from_glibc/
 directory, the glibc python code that implements their logic. This code was
 copied verbatim from glibc, and it can be updated at any time from the glibc
-source code repository. The files copied from that respository are:
+source code repository. The files copied from that repository are:
 
 localedata/unicode-gen/unicode_utils.py
 localedata/unicode-gen/utf8_gen.py
@@ -71,3 +76,6 @@ The procedure to update GCC's Unicode support is the following:
 9: Generate uname2c.h as follows:
    ../../libcpp/makeuname2c UnicodeData.txt NameAliases.txt \
      > ../../libcpp/uname2c.h
+
+See gen_libstdcxx_unicode_data.py for instructions on updating the lookup
+tables in libstdc++.

That refers to gen_libstdcxx_unicode_data.py which I think is a better
name than gen_std_format_width.py so I've renamed the new script in my
local tree.