On Sat, 28 Dec 2019, Joseph Myers wrote:

> Concretely, what I'd suggest is: convert ISO-8859-1 entries in the 
> checked-in list to UTF-8, removing anything that thereby becomes a 
> duplicate or unnecessary; handle anything whose encoding isn't simply 
> ISO-8859-1 or UTF-8 via a hardcoded entry in bugdb.py using hex escapes 
> like the existing such entries there.  Once the checked-in list is pure 
> UTF-8 it's easier for people to review and edit.  Where the issue is only 
> presence of ISO-8859 NBSP, or "" or () around the names, remove that in 
> the checked-in list and again remove duplicates.  That way the list can be 
> limited to non-encoding variations.

I've now made those changes to the checked-in list so it's pure UTF-8, and 
thus easier to review and edit.  We still need to implement code in 
bugdb.py to use that list to pick the preferred form from each list of 
variants (and people may wish to change the preferred forms in some 
cases).
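
As a rough illustration of that remaining step, here is a minimal sketch
in Python.  It is not the bugdb.py code: the function names are
hypothetical, and it assumes each group of variants is available as a
list of strings with the preferred form first, which may not match the
actual layout of the checked-in list.  It also folds in the cleanup
described above (NBSP, and "" or () around names), with ISO-8859-1 as a
fallback when the input is not valid UTF-8.

```python
# Assumption: each group is a list of variant spellings of one name,
# with the preferred form listed first.  The real list may differ.

def clean_name(name):
    """Remove NBSP and any "" or () wrapped around the whole name."""
    name = name.replace("\u00a0", " ").strip()
    # Peel off one layer of surrounding quotes or parentheses at a time.
    # This is deliberately naive: it does not check that the brackets
    # actually pair up, so "(a) (b)" would be mangled.
    while len(name) >= 2 and (name[0], name[-1]) in (('"', '"'), ("(", ")")):
        name = name[1:-1].strip()
    return name


def build_preferred_map(groups):
    """Map every cleaned variant to the preferred form of its group."""
    preferred = {}
    for variants in groups:
        cleaned = [clean_name(v) for v in variants]
        for v in cleaned:
            # The first group to mention a variant wins.
            preferred.setdefault(v, cleaned[0])
    return preferred


def preferred_name(raw, preferred):
    """Decode raw bytes, clean them, and look up the preferred form."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this cannot fail.
        text = raw.decode("iso-8859-1")
    name = clean_name(text)
    # Names that are not in any group are returned cleaned but unchanged.
    return preferred.get(name, name)
```

With a made-up group such as ["José Example", "Jose Example"], both
b"Jose\xa0Example" (ISO-8859-1 NBSP) and the UTF-8 bytes of
'"José Example"' would resolve to "José Example".  Encodings that are
neither UTF-8 nor ISO-8859-1 would still need the hardcoded hex-escape
entries mentioned above.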

-- 
Joseph S. Myers
j...@polyomino.org.uk
