Joseph Myers <j...@polyomino.org.uk>:
> Concretely, what I'd suggest is: convert ISO-8859-1 entries in the
> checked-in list to UTF-8, removing anything that thereby becomes a
> duplicate or unnecessary; handle anything whose encoding isn't simply
> ISO-8859-1 or UTF-8 via a hardcoded entry in bugdb.py using hex escapes
> like the existing such entries there. Once the checked-in list is pure
> UTF-8 it's easier for people to review and edit. Where the issue is only
> presence of ISO-8859 NBSP, or "" or () around the names, remove that in
> the checked-in list and again remove duplicates. That way the list can be
> limited to non-encoding variations.
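For concreteness, the cleanup described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes each entry in the checked-in list is available as a raw byte string, and the function names (decode_entry, clean_name, normalize_list) are hypothetical, not anything that exists in bugdb.py. Note that an ISO-8859-1 decode never fails, so entries in some third encoding cannot be detected this way and would still need the hardcoded hex-escape treatment.

```python
def decode_entry(raw: bytes) -> str:
    """Decode as UTF-8 when the bytes are valid UTF-8, else assume ISO-8859-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is ISO-8859-1.
        return raw.decode("iso-8859-1")


def clean_name(name: str) -> str:
    """Drop NBSP and a surrounding pair of "" or () from a name."""
    name = name.replace("\u00a0", " ")   # ISO-8859 NBSP becomes a plain space
    name = " ".join(name.split())        # collapse runs of whitespace
    if len(name) >= 2 and (name[0], name[-1]) in (('"', '"'), ("(", ")")):
        name = name[1:-1].strip()
    return name


def normalize_list(raw_entries):
    """Return cleaned UTF-8 names, keeping the first occurrence of each."""
    seen = set()
    result = []
    for raw in raw_entries:
        name = clean_name(decode_entry(raw))
        if name and name not in seen:
            seen.add(name)
            result.append(name)
    return result


if __name__ == "__main__":
    # Made-up sample entries, purely for illustration.
    sample = [
        b"Jos\xe9 Garc\xeda",               # ISO-8859-1 bytes
        "José García".encode("utf-8"),      # same name, already UTF-8
        b'"Jane\xa0Doe"',                   # quoted, with an ISO-8859 NBSP
        b"(Jane Doe)",                      # parenthesized
    ]
    for name in normalize_list(sample):
        print(name)
```

Whatever survives this pass would be the non-encoding variations, which is what the list is meant to be limited to.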
Be aware that reposurgeon has a "transcode" command for moving a specified set of objects to UTF-8 from a specified encoding.
-- 
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>