On Mon, Oct 10, 2016 at 6:38 AM, Jonathan Wakely <jwakely....@gmail.com> wrote:
> On 7 October 2016 at 22:26, Joseph Myers wrote:
>> On Fri, 7 Oct 2016, Frank Ch. Eigler wrote:
>>> FWIW, I thought at one point the consensus was that the mailmap would
>>> expand only to $use...@gcc.gnu.org rather than $userid@$organization,
>>> esp. considering the case where there is no single $organization that
>>> accurately covers the whole contribution timespan of the given $userid.
>>
>> I don't think there was any such consensus (older ids weren't from
>> gcc.gnu.org anyway so @gcc.gnu.org would be nonsense for that part of the
>> history).
>>
>> My view is: contributors are free to specify what name and email address
>> they want used, but if they want something other than a single name and
>> email address for the whole commit history with a given username, it's the
>> contributor's responsibility to come up with lists of commits that use
>> each mapping rather than a hypothetical recipe based on examining
>> ChangeLogs.
>
> We'd only need to look at the actual ChangeLogs if the commit message
> doesn't include a name and email address. And if we just use the
> committer, how do we record the author of a change?
>
> As Richi said a year ago (and my reply was drafted a year ago but not sent)
> ...
>
> On 17 September 2015 at 11:44, Richard Biener wrote:
>> Maybe I'm missing sth but apart from the CVS imported revisions each
>> SVN revision should contain the actual change plus the changes to the
>> ChangeLog files (you can't count on the commit message itself I guess
>> as not all people replicate the ChangeLog entries there).
>
> It's probably a good start though. If the commit message does have:
>
> YYYY-MM-DD John Doe <j...@example.com>
>
> then it's probably reliable. If the commit message doesn't have that
> (when I'm committing my own work I don't include that line in the
> commit message) then look for ChangeLog entries in the commit.
>
>> There may be cases we can't handle and then doing some commit ID
>> mapping might be ok, but I expect 95% of the cases to work out nicely
>> so we should preserve what is in the ChangeLog entry (note that we have
>> very strict formatting requirement for the authors there).
>
> Particularly since the ChangeLog entry gives the Author, which is
> often not the same as the Committer.
Yes, very often they will be different.  This processing can, and
probably should, be done with git filter-branch after the initial
conversion.

Jason
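
For illustration only, the lookup order discussed in this thread (trust a
"YYYY-MM-DD  Name  <email>" line in the commit message if there is one,
otherwise fall back to ChangeLog entries added by the commit) might be
sketched in Python roughly as below. Nothing here comes from an actual
conversion script: the function names, the regular expression, and the
assumption that the commit is available as a unified diff are all
hypothetical, and the example addresses are made up.

```python
import re

# Assumed shape of a ChangeLog header line: "YYYY-MM-DD  Name  <email>".
HEADER_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(\S.*?)\s+<([^<>\s]+)>\s*$")


def parse_header(line):
    """Return (name, email) if the line looks like a ChangeLog header."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None
    _date, name, email = match.groups()
    return name, email


def authors_from_lines(lines):
    """Collect distinct (name, email) pairs, in order of appearance."""
    found = []
    for line in lines:
        parsed = parse_header(line)
        if parsed is not None and parsed not in found:
            found.append(parsed)
    return found


def authors_for_commit(message, diff_text):
    """Hypothetical helper: prefer the commit message, else ChangeLog hunks."""
    authors = authors_from_lines(message.splitlines())
    if authors:
        return authors

    # Fallback: only look at lines *added* to files named ChangeLog*.
    added = []
    in_changelog = False
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # svn diff appends "\t(revision N)" after the path; drop it.
            path = line[4:].split("\t")[0].strip()
            in_changelog = path.rsplit("/", 1)[-1].startswith("ChangeLog")
        elif in_changelog and line.startswith("+"):
            added.append(line[1:])
    return authors_from_lines(added)


if __name__ == "__main__":
    msg = "Fix thing\n\n2016-10-10  John Doe  <jdoe@example.com>\n\n\t* foo.c: Fix.\n"
    print(authors_for_commit(msg, ""))
```

A pair returned this way could then be exported as GIT_AUTHOR_NAME and
GIT_AUTHOR_EMAIL from a git filter-branch --env-filter run after the
initial conversion. Commits where neither source yields a header would
still need the manual commit ID mapping mentioned above.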