> On Dec 26, 2019, at 2:16 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> 
> On Thu, Dec 26, 2019 at 11:04:29AM +0000, Joseph Myers wrote:
> Is there some easy way (e.g. file in the conversion scripts) to correct
> spelling and other mistakes in the commit authors?
> E.g. there are misspelled surnames, etc. (e.g. looking at my name, I see
> Jakub Jakub Jelinek (1):
> Jakub Jeilnek (1):
> Jelinek (1):
> entries next to the expected one with most of the commits.
> For the misspellings, wonder if e.g. we couldn't compute edit distances from
> other names and if we have one with many commits and then one with very few
> with small edit distance from those, flag it for human review.

This is close to what the svn-git-author.sh script does in the gcc-pretty and 
gcc-reparent conversions.  It ignores 1-3 character differences in 
author/committer names and email addresses.  I've audited the results for all 
branches and didn't spot any mistakes.
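
For illustration, here is a minimal Python sketch of the edit-distance idea 
from the quoted suggestion.  It is not the svn-git-author.sh script; the 
function names and the thresholds (min_major, max_minor) are assumptions 
made up for this example, and only max_dist=3 mirrors the "1-3 character 
differences" mentioned above.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def flag_suspects(authors, max_dist=3, min_major=10, max_minor=2):
    """Return (rare_name, frequent_name) pairs worth a human look.

    A name is "frequent" with at least min_major commits and "rare" with
    at most max_minor commits.  These thresholds are hypothetical.
    """
    counts = Counter(authors)
    majors = [name for name, n in counts.items() if n >= min_major]
    suspects = []
    for name, n in counts.items():
        if n > max_minor:
            continue
        for major in majors:
            if name != major and edit_distance(name, major) <= max_dist:
                suspects.append((name, major))
                break
    return sorted(suspects)
```

With these thresholds a transposition such as "Jakub Jeilnek" would be 
flagged against "Jakub Jelinek", while a bare "Jelinek" would not, because 
it is six insertions away.  Catching truncated names would need a separate 
check.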

In other news, I'm working on a comparison of the gcc-pretty, gcc-reparent and 
gcc-reposurgeon-5a repos.  Below are my current notes on comparing 
gcc-pretty/trunk with gcc-reposurgeon-5a/trunk.

== Merges on trunk ==

Reposurgeon creates merge entries on trunk when changes from a branch are 
merged into trunk.  This brings the entire development history of the branch 
into trunk, which is both good and bad.  The good part is that we get more 
visibility into how the code evolved.  The bad part is that we get many "noisy" 
commits from the merged branch (e.g., "Merge in trunk" every few revisions), 
and that our SVN branches are of work-in-progress quality, not ready-for-review 
or ready-for-commit quality.  It's common for files to be rewritten in large 
chunks on branches.

Also, reposurgeon's commit logs don't record the SVN path from which a change 
came, so there is no easy way to tell whether a given commit comes from a 
merged branch or is an original trunk commit.  Git-svn, on the other hand, 
provides "git-svn-id: <path>@<revision>" tags in its commit logs.
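
As a rough sketch of how that tag can be used, the Python below pulls the 
SVN path and revision out of a commit message and checks whether the path 
ends in /trunk.  The helper names are hypothetical, and the regular 
expression assumes the tag looks like the git-svn-id line shown later in 
this mail (URL, "@", revision, then the repository UUID).

```python
import re

# Matches e.g. "git-svn-id: <url>@<revision> <repository-uuid>".
GIT_SVN_ID = re.compile(
    r"^\s*git-svn-id:\s+(\S+)@(\d+)\s+([0-9a-fA-F-]+)\s*$",
    re.MULTILINE,
)


def parse_git_svn_id(message):
    """Return (svn_url, revision, repo_uuid), or None if there is no tag."""
    match = GIT_SVN_ID.search(message)
    if match is None:
        return None
    url, revision, uuid = match.groups()
    return url, int(revision), uuid


def is_trunk_commit(message, trunk_suffix="/trunk"):
    """True if the commit's git-svn-id path ends with trunk_suffix."""
    parsed = parse_git_svn_id(message)
    return parsed is not None and parsed[0].endswith(trunk_suffix)
```

A message that carries only a "Legacy-ID: <revision>" line yields None here, 
which is the point made above: the revision is recoverable, the path is not.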

My conversion follows current GCC development policy that trunk history should 
be linear.  Branch merges to trunk are squashed.  Merges between non-trunk 
branches are handled as specified by svn:mergeinfo SVN properties.

== Differences in trees ==

Git trees (aka filesystem content) match between pretty/trunk and 
reposurgeon-5a/trunk from the current tip back to SVN's r130805.
Here is the SVN log of that revision (restoration of deleted trunk):
------------------------------------------------------------------------
r130805 | dberlin | 2007-12-13 01:53:37 +0000 (Thu, 13 Dec 2007)
Changed paths:
   A /trunk (from /trunk:130802)
------------------------------------------------------------------------

Reposurgeon conversion has:
-------------
commit 7e6f2a96e89d96c2418482788f94155d87791f0a
Author: Daniel Berlin <dber...@gcc.gnu.org>
Date:   Thu Dec 13 01:53:37 2007 +0000

    Readd trunk
    
    Legacy-ID: 130805

 .gitignore | 17 -----------------
 1 file changed, 17 deletions(-)
-------------
and my conversion has:
-------------
commit fb128f3970789ce094c798945b4fa20eceb84cc7
Author: Daniel Berlin <dber...@dbrelin.org>
Date:   Thu Dec 13 01:53:37 2007 +0000

    Readd trunk
    
    
    git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@130805 138bc75d-0d04-0410-961f-82ee72b054a4
-------------

It appears that reposurgeon added .gitignore in r1 and then deleted it at 
r130805.  In the SVN repository, .gitignore was added in r195087.  I speculate 
that the addition of .gitignore at r1 is expected, but its deletion at r130805 
is highly suspicious.
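
A tree comparison of this kind can be sketched in Python as below.  This is 
not the tooling I used; it assumes you have already built, for each 
conversion, a dict mapping SVN revision number to git tree hash (for 
example from the output of "git rev-parse <commit>^{tree}"), and the 
function name is made up for the example.

```python
def first_tree_mismatch(trees_a, trees_b):
    """Walk shared revisions from newest to oldest.

    trees_a and trees_b map SVN revision number -> git tree hash for two
    conversions.  Returns the newest revision whose tree hashes differ,
    or None if every shared revision matches.
    """
    shared = sorted(set(trees_a) & set(trees_b), reverse=True)
    for revision in shared:
        if trees_a[revision] != trees_b[revision]:
            return revision
    return None
```

Revisions present in only one of the two conversions are skipped, so 
squashed merges or extra merge commits on one side don't produce spurious 
mismatches.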

== Committer entries ==

Reposurgeon uses $u...@gcc.gnu.org for committer email addresses even when it 
correctly detects the author name from the ChangeLog.

reposurgeon-5a:
r278995 Martin Liska <mli...@suse.cz> Martin Liska <mar...@gcc.gnu.org>
r278994 Jozef Lawrynowicz <joze...@mittosystems.com> Jozef Lawrynowicz <joz...@gcc.gnu.org>
r278993 Frederik Harwath <frede...@codesourcery.com> Frederik Harwath <frede...@gcc.gnu.org>
r278992 Georg-Johann Lay <a...@gjlay.de> Georg-Johann Lay <g...@gcc.gnu.org>
r278991 Richard Biener <rguent...@suse.de> Richard Biener <rgue...@gcc.gnu.org>

pretty:
r278995 Martin Liska <mli...@suse.cz> Martin Liska <mli...@suse.cz>
r278994 Jozef Lawrynowicz <joze...@mittosystems.com> Jozef Lawrynowicz <joze...@mittosystems.com>
r278993 Frederik Harwath <frede...@codesourcery.com> Frederik Harwath <frede...@codesourcery.com>
r278992 Georg-Johann Lay <a...@gjlay.de> Georg-Johann Lay <a...@gjlay.de>
r278991 Richard Biener <rguent...@suse.de> Richard Biener <rguent...@suse.de>

== Bad summary line ==

While looking around r138087, the commit below caught my eye.  Are the contents 
of the summary line as expected?

commit cc2726884d56995c514d8171cc4a03657851657e
Author: Chris Fairles <chris.fair...@gmail.com>
Date:   Wed Jul 23 14:49:00 2008 +0000

    acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
    
    2008-07-23  Chris Fairles <chris.fair...@gmail.com>
    
            * acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
            Holds the lib that defines clock_gettime (-lrt or -lposix4).
            * src/Makefile.am: Use it.
            * configure: Regenerate.
            * configure.in: Likewise.
            * Makefile.in: Likewise.
            * src/Makefile.in: Likewise.
            * libsup++/Makefile.in: Likewise.
            * po/Makefile.in: Likewise.
            * doc/Makefile.in: Likewise.
    
    Legacy-ID: 138087


--
Maxim Kuvyrkov
https://www.linaro.org
