On Sat, 6 Jan 2024 at 17:03, Jonathan Wakely <jwak...@redhat.com> wrote:
>
> On Sat, 6 Jan 2024 at 16:57, Lewis Hyatt <lhy...@gmail.com> wrote:
> >
> > On Sat, Jan 6, 2024 at 11:40 AM Jonathan Wakely <jwak...@redhat.com> wrote:
> > >
> > > Here's a V2 patch which addresses the two things I mentioned: the new
> > > Python script now generates a complete file that can just be included by
> > > <bits/unicode.h>, and the full Unicode 15.1.0 grapheme cluster break
> > > rules are supported (I think ... more testing needed for some of the
> > > complex rules).
> > >
> > > -- >8 --
> >
> > Thanks, by the way, for fixing the typo in gen_wcwidth.py.
> > One thing I wanted to point out, the file contrib/unicode/README
> > contains a list of steps to follow in order to update to a new Unicode
> > version. There are 10 or so steps to generate everything libcpp and
> > diagnostics care about. Do you think it's worth adding something for
> > the new libstdc++ parts there too?
>
> Ah, thanks for pointing that out. Yes, I should add to that.

Here's what I suggest adding to the README:

--- a/contrib/unicode/README
+++ b/contrib/unicode/README
@@ -16,7 +16,12 @@
ftp://ftp.unicode.org/Public/UNIDATA/DerivedNormalizationProps.txt
ftp://ftp.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt
ftp://ftp.unicode.org/Public/UNIDATA/NameAliases.txt

-These files have been added to source control in this directory;
+Two additional files are needed for lookup tables in libstdc++:
+
+ftp://ftp.unicode.org/Public/UNIDATA/auxiliary/GraphemeBreakProperty.txt
+ftp://ftp.unicode.org/Public/UNIDATA/emoji/emoji-data.txt
+
+All these files have been added to source control in this directory;
please see unicode-license.txt for the relevant copyright information.

In order to keep in sync with glibc's wcwidth as much as possible, it is
@@ -24,7 +29,7 @@ desirable for the logic that processes the Unicode data to be the same as
glibc's.  To that end, we also put in this directory, in the from_glibc/
directory, the glibc python code that implements their logic.  This code was
copied verbatim from glibc, and it can be updated at any time from the glibc
-source code repository.  The files copied from that respository are:
+source code repository.  The files copied from that repository are:

localedata/unicode-gen/unicode_utils.py
localedata/unicode-gen/utf8_gen.py
@@ -71,3 +76,6 @@ The procedure to update GCC's Unicode support is the following:
9:  Generate uname2c.h as follows:
      ../../libcpp/makeuname2c UnicodeData.txt NameAliases.txt \
       > ../../libcpp/uname2c.h
+
+See gen_libstdcxx_unicode_data.py for instructions on updating the lookup
+tables in libstdc++.


That refers to gen_libstdcxx_unicode_data.py, which I think is a better
name than gen_std_format_width.py, so I've renamed the new script in my
local tree.
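For readers unfamiliar with the two new input files: GraphemeBreakProperty.txt and emoji-data.txt use the usual UCD line format (a code point or "first..last" range, a ';', the property value, then a '#' comment). The following is a minimal sketch, assuming only that format, of turning such a file into a sorted range table with a binary-search lookup. It is NOT the contents of gen_libstdcxx_unicode_data.py; the names parse_ucd_ranges and lookup are hypothetical, chosen for illustration.

```python
import bisect
import re

# Hypothetical helper, not taken from gen_libstdcxx_unicode_data.py.
# After stripping the '#' comment, a UCD data line looks like
#   "0600..0605    ; Prepend"
# i.e. a hex code point or range, a ';', then the property value.
_LINE = re.compile(r"^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*(\w+)")


def parse_ucd_ranges(lines):
    """Return a sorted list of (first, last, value) tuples."""
    ranges = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        if not line:
            continue  # blank or comment-only line
        m = _LINE.match(line)
        if m is None:
            continue  # not a property line
        first = int(m.group(1), 16)
        last = int(m.group(2), 16) if m.group(2) else first
        ranges.append((first, last, m.group(3)))
    ranges.sort()
    return ranges


def lookup(ranges, cp, default="Other"):
    """Binary-search the table; code points not listed get `default`.

    (For simplicity the list of range starts is rebuilt on each call;
    a real generator would emit a compact precomputed table instead.)
    """
    starts = [r[0] for r in ranges]
    i = bisect.bisect_right(starts, cp) - 1
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return default
```

Usage, with lines in the GraphemeBreakProperty.txt format:

    table = parse_ucd_ranges(open("GraphemeBreakProperty.txt"))
    lookup(table, 0x0603)   # "Prepend"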
