On Thu, 26 Dec 2019, Jakub Jelinek wrote:

> Is there some easy way (e.g. file in the conversion scripts) to correct
> spelling and other mistakes in the commit authors?
I've added author fixups to bugdb.py, so you can add any number of fixes
(e.g. based on authors that look suspicious in "git shortlog -s -e --all"
output) to the author_fixups array (and send a merge request for the
gcc-conversion project, or a patch).

Multiple consecutive spaces in an attribution are now normalized to a
single space in reposurgeon, so no fixes are needed for that (and fixups
should be given in the form with a single space).

In addition to that array of fixes, bugdb.py does the following
automatically, so these cases don't need listing in the array of fixups:

* converts ISO-8859-1 NBSP to space (and trims such spaces at left or
  right, or where the result is multiple consecutive spaces);

* converts ISO-8859-1 author names (coming from ChangeLog files) to UTF-8
  (there are manual fixups for cases where the author in the ChangeLog
  file didn't seem to be ISO-8859-1 but wasn't valid UTF-8 either);

* fixes up the cases you found where certain forms of timestamp from the
  ChangeLog header, or a header specifying multiple authors, were used but
  handled badly in conversion to authors.

I've found and reported another case where a form of ChangeLog header used
in the past isn't handled at all, and Eric is looking at it.

-- 
Joseph S. Myers
j...@polyomino.org.uk
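
For illustration only, the normalization steps described above can be
sketched roughly as follows. This is not the actual bugdb.py code: the
names AUTHOR_FIXUPS, decode_author and normalize_author are hypothetical,
the fixup entry is invented, and the "try UTF-8 first, else assume
ISO-8859-1" rule is an assumption about how the encoding might be detected.

```python
import re

# Hypothetical fixups table (invented entry): maps a wrong attribution to
# its corrected form. Keys are written with single spaces only, since
# whitespace is normalized before the lookup.
AUTHOR_FIXUPS = {
    "Jane Exmaple <jane@example.org>": "Jane Example <jane@example.org>",
}


def decode_author(raw: bytes) -> str:
    """Decode as UTF-8 when valid, otherwise assume ISO-8859-1 (assumption)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


def normalize_author(raw: bytes) -> str:
    """Apply NBSP conversion, space collapsing, trimming, then fixups."""
    text = decode_author(raw)
    # ISO-8859-1 NBSP (byte 0xA0) decodes to U+00A0; turn it into a space.
    text = text.replace("\u00a0", " ")
    # Collapse runs of spaces and trim spaces at either end.
    text = re.sub(r" {2,}", " ", text).strip(" ")
    # Apply a manual fixup if one is listed; otherwise keep the text as-is.
    return AUTHOR_FIXUPS.get(text, text)
```

A name that is neither valid UTF-8 nor really ISO-8859-1 would still come
out garbled from a sketch like this, which is why such cases need manual
fixups.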