On Sat, 28 Dec 2019, Joseph Myers wrote:

> On Sat, 28 Dec 2019, Richard Earnshaw (lists) wrote:
>
> > I've added the list of emails that I posted yesterday to the conversion
> > scripts. I've not written anything to reprocess that yet. I want to
> > leave that until we've completed the general review of the preferred
> > changes we want. Auto-generating that data from the list will probably
> > be easier than maintaining it inside bugdb.py for now.
>
> I've now pushed a change to automate removing "" or () around names.
> Together with the automatic conversion of ISO-8859-1 names to UTF-8 that
> should slightly reduce the number of cases needing handling from that
> list.
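
For illustration only, the two automatic clean-ups mentioned in the quoted text (decoding ISO-8859-1 names to UTF-8, and removing "" or () around names) could look roughly like the sketch below. This is not the code from the conversion scripts: the function names and the sample names in the usage note are made up, and the "try UTF-8 first, fall back to ISO-8859-1" rule is an assumed heuristic that can misfire on the rare ISO-8859-1 byte sequences that also happen to be valid UTF-8.

```python
def to_text(raw: bytes) -> str:
    """Decode a name as UTF-8 if it is valid UTF-8, else as ISO-8859-1.

    Assumed heuristic, not taken from the real scripts.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a code point, so this cannot fail.
        return raw.decode("iso-8859-1")


def strip_wrappers(name: str) -> str:
    """Remove one pair of "" or () that encloses the whole name."""
    name = name.strip()
    for open_ch, close_ch in (('"', '"'), ("(", ")")):
        if len(name) >= 2 and name[0] == open_ch and name[-1] == close_ch:
            inner = name[1:-1]
            # Only strip when the pair really wraps the whole name,
            # e.g. leave '(A) B (C)' untouched.
            if open_ch not in inner and close_ch not in inner:
                return inner.strip()
    return name


def normalize_name(raw: bytes) -> str:
    """Hypothetical helper combining both steps."""
    return strip_wrappers(to_text(raw))
```

With this sketch, `normalize_name(b'"Jos\xe9 Garc\xeda"')` and `normalize_name("(José García)".encode("utf-8"))` both give `José García`, so the two variants would collapse into a single entry.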

Concretely, what I'd suggest is: convert ISO-8859-1 entries in the
checked-in list to UTF-8, removing anything that thereby becomes a
duplicate or unnecessary; handle anything whose encoding isn't simply
ISO-8859-1 or UTF-8 via a hardcoded entry in bugdb.py using hex escapes
like the existing such entries there. Once the checked-in list is pure
UTF-8 it's easier for people to review and edit.

Where the issue is only presence of ISO-8859 NBSP, or "" or () around the
names, remove that in the checked-in list and again remove duplicates.
That way the list can be limited to non-encoding variations.

-- 
Joseph S. Myers
j...@polyomino.org.uk