On Thu, 26 Dec 2019, Jakub Jelinek wrote:

> Is there some easy way (e.g. file in the conversion scripts) to correct
> spelling and other mistakes in the commit authors?

These can be corrected via reposurgeon commands in gcc.lift (see the
existing "/<jwakely-...@gmail.com>/ attribution =A set
jwakely....@gmail.com" command).  Alternatively, the msgout/msgin mechanism
used in Richard's script for commit message improvements could also make
changes to authors.  I don't know the exact syntax offhand, but I believe
authors are among the commit metadata that mechanism allows to be changed,
so the script could gain a table of author corrections to apply.

> Or I see in git shortlog parts of date being parsed as name, e.g.
> (basically anything in git shortlog after the "..." wrapped names and before
> Aaron Conole (2): in alphabetical sorting, or after Zuxy Meng (4):.
> 00:27 -0700  Zack Weinberg (1):

> lsd.ic.unicamp.br),  Jakub Jelinek (1):

Filed https://gitlab.com/esr/reposurgeon/issues/218 for these kinds of
ChangeLog entries.  Some changes to regular expressions should be able to
make the code handle them better (possibly by falling back to committer
identities in more cases where the ChangeLog header line looks odd in
some way).
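
To illustrate the kind of fallback I mean, here is a rough Python sketch.  It
is not reposurgeon's actual code: the regular expression, the function name
author_from_header and the example.com addresses are all hypothetical, and
the notion of a "well-formed" header (ISO date, name without digits or
parentheses, address in angle brackets) is just an assumption for the
example.

```python
import re

# Assumed shape of a well-formed ChangeLog header line:
#   YYYY-MM-DD  Name  <email>
# Names containing digits, parentheses or angle brackets are rejected,
# which catches stray date fragments and "(user@host)," debris.
HEADER_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<name>[^\s<>()\d][^<>()\d]*?)\s+"
    r"<(?P<email>[^<>\s]+@[^<>\s]+)>\s*$"
)


def author_from_header(line, committer):
    """Return 'Name <email>' from a header line, or the committer
    identity when the line does not look like a clean header."""
    m = HEADER_RE.match(line)
    if m is None:
        return committer  # header looks odd: fall back to the committer
    return f"{m.group('name')} <{m.group('email')}>"


if __name__ == "__main__":
    committer = "Some Committer <committer@example.com>"
    print(author_from_header(
        "2019-12-26  Jakub Jelinek  <jakub@example.com>", committer))
    print(author_from_header(
        "Thu Dec 26 00:27 -0700  Zack Weinberg  <zack@example.com>",
        committer))
```

The first call uses the header's author; the second falls back to the
committer because the line does not start with an ISO date.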

> <A0>Eric Botcazou (1):

I didn't include anything for this in my reduced test.  I'd noted that some
of the invalid attribution warnings from reposurgeon also involve 0xA0
bytes (= ISO-8859-1 NBSP).  If anything is appropriate there, it might be
something like "change any 0xA0 that's preceded by an ASCII byte to ASCII
space before processing further" ("preceded by an ASCII byte" being needed
to avoid the case of 0xA0 in the middle of a UTF-8 character).
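
As a minimal sketch of that rule, assuming the ChangeLog text is available
as raw bytes (the function name replace_stray_nbsp is made up for this
example and is not part of reposurgeon):

```python
def replace_stray_nbsp(data: bytes) -> bytes:
    """Replace 0xA0 with an ASCII space when the preceding byte is ASCII.

    In valid UTF-8, 0xA0 can only be a continuation byte, so it is always
    preceded by a byte >= 0x80 and is left untouched here.
    """
    out = bytearray()
    prev = None  # previous byte of the original input, None at the start
    for b in data:
        if b == 0xA0 and prev is not None and prev < 0x80:
            out.append(0x20)  # ISO-8859-1 NBSP after ASCII: make it a space
        else:
            out.append(b)
        prev = b
    return bytes(out)


if __name__ == "__main__":
    print(replace_stray_nbsp(b"2019-12-26  \xa0Eric Botcazou"))
    print(replace_stray_nbsp("\u00e0".encode("utf-8")))  # C3 A0 stays intact
```

A 0xA0 at the very start of the input has no preceding byte, so under the
rule as stated it is left alone; whether that case should also be converted
is a separate decision.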

-- 
Joseph S. Myers
j...@polyomino.org.uk
