On Thu, 26 Dec 2019, Jakub Jelinek wrote:

> Is there some easy way (e.g. file in the conversion scripts) to correct
> spelling and other mistakes in the commit authors?

I've added author fixups to bugdb.py, so you can add any number of fixes 
(e.g. based on authors that look suspicious in "git shortlog -s -e --all" 
output) to the author_fixups array (and send a merge-request for the 
gcc-conversion project, or a patch).
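
For illustration only, a fixup table of this kind might look like the sketch below. This message doesn't show the actual layout of author_fixups in bugdb.py, so the pair format, the example names and addresses, and the apply_author_fixup helper are all assumptions.

```python
# Hypothetical sketch: the real author_fixups structure in bugdb.py may differ.
# Each entry pairs a wrong attribution with its corrected form
# (written with single spaces only).
author_fixups = [
    ("Jane Deo <jane@example.org>", "Jane Doe <jane@example.org>"),
    ("J Smith <jsmith@example.org>", "John Smith <jsmith@example.org>"),
]

# Build a lookup table once so each attribution is checked in constant time.
_fixup_map = dict(author_fixups)


def apply_author_fixup(attribution):
    """Return the corrected attribution, or the input unchanged if none applies."""
    return _fixup_map.get(attribution, attribution)
```

Candidates for new entries would come from reading the "git shortlog -s -e --all" output by eye; the sketch only covers exact-match replacement.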

Multiple consecutive spaces in an attribution are now normalized to a 
single space in reposurgeon, so no fixes are needed for that (and fixups 
should be given in the form with a single space).  Beyond that array of 
fixes, bugdb.py handles the following itself, so they don't need listing 
in the array of fixups.  It converts ISO-8859-1 NBSP to space (and trims 
such spaces at left or right, or where the result would be multiple 
consecutive spaces).  It converts ISO-8859-1 author names (coming from 
ChangeLog files) to UTF-8; there are manual fixups for cases where the 
author in the ChangeLog file didn't seem to be ISO-8859-1 but wasn't 
valid UTF-8 either.  It fixes up the cases you found where certain forms 
of timestamp from the ChangeLog header, or a header specifying multiple 
authors, were used but handled badly in conversion to authors.  I've 
found and reported another case where a form of ChangeLog header used in 
the past isn't handled at all, and Eric is looking at it.
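
The whitespace and encoding steps described above could be sketched roughly as follows. This is not the code in bugdb.py: normalize_author is a hypothetical name, and the rule "try UTF-8 first, otherwise treat the bytes as ISO-8859-1" is an assumption made for the example. It does not cover the ambiguous names that need manual fixups, nor the ChangeLog header handling.

```python
import re


def normalize_author(raw):
    """Hypothetical helper: decode an author name from bytes and tidy its spaces."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is ISO-8859-1.
        text = raw.decode("iso-8859-1")
    # Turn NBSP (U+00A0) into an ordinary space.
    text = text.replace("\u00a0", " ")
    # Collapse runs of spaces to one, then trim spaces at either end.
    text = re.sub(r" {2,}", " ", text)
    return text.strip(" ")
```

Decoding as ISO-8859-1 never fails, so a fallback like this silently accepts bytes in any other legacy encoding; that is the kind of case where a manual fixup is still required.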

-- 
Joseph S. Myers
j...@polyomino.org.uk
