> On Dec 30, 2019, at 7:08 PM, Richard Earnshaw (lists) 
> <richard.earns...@arm.com> wrote:
> 
> On 30/12/2019 15:49, Maxim Kuvyrkov wrote:
>>> On Dec 30, 2019, at 6:31 PM, Richard Earnshaw (lists) 
>>> <richard.earns...@arm.com> wrote:
>>> 
>>> On 30/12/2019 13:00, Maxim Kuvyrkov wrote:
>>>>> On Dec 30, 2019, at 1:24 AM, Richard Earnshaw (lists) 
>>>>> <richard.earns...@arm.com> wrote:
>>>>> 
>>>>> On 29/12/2019 18:30, Maxim Kuvyrkov wrote:
>>>>>> Below are several more issues I found in reposurgeon-6a conversion 
>>>>>> comparing it against gcc-reparent conversion.
>>>>>> 
>>>>>> I am sure these and whatever other problems I may find in the 
>>>>>> reposurgeon conversion can be fixed in time.  However, I don't see why 
>>>>>> we should bother.  My conversion has been available since summer 2019, I 
>>>>>> made it ready in time for GCC Cauldron 2019, and it didn't change in any 
>>>>>> significant way since then.
>>>>>> 
>>>>>> With the "Missed merges" problem (see below) I don't see how reposurgeon 
>>>>>> conversion can be considered "ready".  Also, I expected a diligent 
>>>>>> developer to compare new conversion (aka reposurgeon's) against existing 
>>>>>> conversion (aka gcc-pretty / gcc-reparent) before declaring the new 
>>>>>> conversion "better" or even "ready".  The data I'm seeing in differences 
>>>>>> between my and reposurgeon conversions shows that gcc-reparent 
>>>>>> conversion is /better/.
>>>>>> 
>>>>>> I suggest that GCC community adopts either gcc-pretty or gcc-reparent 
>>>>>> conversion.  I welcome Richard E. to modify his summary scripts to work 
>>>>>> with svn-git scripts, which should be straightforward, and I'm ready to 
>>>>>> help.
>>>>>> 
>>>>> 
>>>>> I don't think either of these conversions is any more ready to use than
>>>>> the reposurgeon one, possibly less so.  In fact, there are still some
>>>>> major issues to resolve first before they can be considered.
>>>>> 
>>>>> gcc-pretty has completely wrong parent information for the gcc-3 era
>>>>> release tags, showing the tags as being made directly from trunk with
>>>>> massive deltas representing the roll-up of all the commits that were
>>>>> made on the gcc-3 release branch.
>>>> 
>>>> I will clarify the above statement, and please correct me where you think 
>>>> I'm wrong.  Gcc-pretty conversion has the exact right parent information 
>>>> for the gcc-3 era release tags as recorded in SVN version history.  
>>>> Gcc-pretty conversion aims to produce an exact copy of SVN history in 
>>>> git.  IMO, it manages to do so just fine.
>>>> 
>>>> It is a different thing that SVN history has a screwed up record of gcc-3 
>>>> era tags.
>>> 
>>> It's not screwed up in svn.  Svn shows the correct history information for 
>>> the gcc-3 era release tags, but the git-svn conversion in gcc-pretty does 
>>> not.
>>> 
>>> For example, looking at gcc_3_0_release in expr.c with git blame and svn 
>>> blame shows
>> 
>> In SVN history tags/gcc_3_0_release has been copied off /trunk:39596 and in 
>> the same commit a bunch of files were replaced from /branches/gcc-3_0-branch/ 
>> (and from different revisions of this branch!).
>> 
>> $ svn log -qv --stop-on-copy file://$(pwd)/tags/gcc_3_0_release | grep "/tags/gcc_3_0_release \|/tags/gcc_3_0_release/gcc/expr.c \|/tags/gcc_3_0_release/gcc/reload.c "
>>   A /tags/gcc_3_0_release (from /trunk:39596)
>>   R /tags/gcc_3_0_release/gcc/expr.c (from /branches/gcc-3_0-branch/gcc/expr.c:43255)
>>   R /tags/gcc_3_0_release/gcc/reload.c (from /branches/gcc-3_0-branch/gcc/reload.c:42007)
>> 
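The changed-path lines above can be tallied by copy source to see why the choice of parent for such a tag is ambiguous. This is a minimal Python sketch, assuming `svn log -qv` changed-path lines of the form shown in this thread; the function name `copy_sources` and the regular expression are illustrative and are not part of the svn-git scripts or reposurgeon. The sample input is only the three grep-filtered lines quoted above, not the full changed-path list of the tag.

```python
import re
from collections import Counter

# Matches changed-path lines such as
#   "A /tags/gcc_3_0_release (from /trunk:39596)"
COPY_LINE = re.compile(r"^\s*([AR]) (\S+) \(from (\S+):(\d+)\)\s*$")


def copy_sources(log_text, tag_path):
    """Count, per source tree, how many paths under tag_path were copied from it."""
    sources = Counter()
    for line in log_text.splitlines():
        m = COPY_LINE.match(line)
        if not m:
            continue
        _action, path, src, _rev = m.groups()
        if path != tag_path and not path.startswith(tag_path + "/"):
            continue
        # Reduce the source to its tree root: /trunk or /branches/<name>.
        # Assumption: branch names are a single path component, so nested
        # names such as ARM/hard_vfp_branch are not handled here.
        parts = src.split("/")
        root = "/".join(parts[:2] if parts[1] == "trunk" else parts[:3])
        sources[root] += 1
    return sources


log = """\
   A /tags/gcc_3_0_release (from /trunk:39596)
   R /tags/gcc_3_0_release/gcc/expr.c (from /branches/gcc-3_0-branch/gcc/expr.c:43255)
   R /tags/gcc_3_0_release/gcc/reload.c (from /branches/gcc-3_0-branch/gcc/reload.c:42007)
"""
print(copy_sources(log, "/tags/gcc_3_0_release"))
```

For the three sample lines the tally is one copy from /trunk (the tag directory itself) and two replacements from /branches/gcc-3_0-branch. A tally like this only exposes the candidates; it does not by itself say which parent is right.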
> 
> Right, (and wrong).  You have to understand how the release branches and
> tags are represented in CVS to understand why the SVN conversion is done
> this way.  When a branch was created in CVS a tag was added to each
> commit which would then be used in any future revisions along that
> branch.  But until a commit is made on that branch, the release branch
> is just a placeholder.
> 
> When a CVS release tag is created, the tag labels the relevant commit
> that is to be used.  If that commit is unchanged from the trunk revision
> (no commit on the branch), then that is what gets labelled, and it
> *appears* to still come from trunk - but that does not matter, since it
> is the same as the version on trunk.
> 
> The svn copy operations are formed from this set of information by
> copying the SVN revision of trunk that applied at the point the branch
> was made, and then overriding the copy information for each file that
> was then modified on the branch with information about that copy.  This
> is sufficient for svn to fully understand the history information for
> each and every file in the tag.
> 
> Unfortunately, git-svn mis-interprets this when building its graph of
> what happened and while it copies the right *content* into the release
> branch, it does not copy the right *history*.  The SVN R operation
> copies the history from the named revision, not just the content.  That's
> the significant difference between the two.
> 
> R.
>> IMO, from such history (absent external knowledge about better reparenting 
>> options) the best choice for parent branch is /trunk@39596, not 
>> /branches/gcc-3_0-branch at a random revision from the replaced files.
>> 
>> Still, I see your point, and I will fix reparenting support.  Whether GCC 
>> community opts to reparent or not reparent is a different topic.

I've added proper reparenting support to svn-git scripts, and gcc-reparent will 
be updated in a day or so.  I've also added a few minor improvements and fixed 
things that Joseph pointed out in my conversion.

Once gcc-reparent conversion is regenerated, I'll do another round of 
comparisons between it and whatever the latest reposurgeon version is.

--
Maxim Kuvyrkov
https://www.linaro.org

>> --
>> Maxim Kuvyrkov
>> https://www.linaro.org
>> 
>> 
>>> git blame expr.c:
>>> 
>>> ba0a9cb85431 (Richard Kenner         1992-03-03 23:34:57 +0000   396)         return temp;
>>> ba0a9cb85431 (Richard Kenner         1992-03-03 23:34:57 +0000   397)       }
>>> 5fbf0b0d5828 (no-author              2001-06-17 19:44:25 +0000   398)     /* Copy the address into a pseudo, so that the returned value
>>> 5fbf0b0d5828 (no-author              2001-06-17 19:44:25 +0000   399)        remains correct across calls to emit_queue.  */
>>> 5fbf0b0d5828 (no-author              2001-06-17 19:44:25 +0000   400)     XEXP (new, 0) = copy_to_reg (XEXP (new, 0));
>>> 59f26b7caad9 (Richard Kenner         1994-01-11 00:23:47 +0000   401)     return new;
>>> 
>>> git log 5fbf0b0d5828
>>> commit 5fbf0b0d5828687914c1c18a83ff12c8627d5a70 (HEAD, tag: gcc_3_0_release)
>>> Author: no-author <no-aut...@gcc.gnu.org>
>>> Date:   Sun Jun 17 19:44:25 2001 +0000
>>> 
>>>   This commit was manufactured by cvs2svn to create tag
>>>   'gcc_3_0_release'.
>>> 
>>> while svn blame expr.c correctly shows:
>>> 
>>>  386     kenner             return temp;
>>>  386     kenner           }
>>> 42209     bernds         /* Copy the address into a pseudo, so that the returned value
>>> 42209     bernds            remains correct across calls to emit_queue.  */
>>> 42209     bernds         XEXP (new, 0) = copy_to_reg (XEXP (new, 0));
>>> 6375     kenner         return new;
>>> 
>>> svn log -r42209 ^/
>>> ------------------------------------------------------------------------
>>> r42209 | bernds | 2001-05-17 18:07:08 +0100 (Thu, 17 May 2001) | 2 lines
>>> 
>>> Fix queueing-related bugs
>>> 
>>> In other words, svn can correctly track the files that were modified on the 
>>> release branch, while the git conversion loses that information, rolling 
>>> up all the diffs on the release branch into a single unattributed commit.
>>> 
>>> As I said, gcc-reparent is better in this regard, but there are still 
>>> artefacts from conversion, such as incorrect merge records, that show up.
>>> 
>>> R.
>>> 
>>>> 
>>>>> 
>>>>> gcc-reparent is better, but many (most?) of the release tags are shown
>>>>> as merge commits with a fake parent back to the gcc-3 branch point,
>>>>> which is certainly not what happened when the tagging was done at that
>>>>> time.
>>>> 
>>>> I agree with you here.
>>>> 
>>>>> 
>>>>> Both of these factually misrepresent the history at the time of the
>>>>> release tag being made.
>>>> 
>>>> Yes and no.  Gcc-pretty repository mirrors SVN history.  And regarding the 
>>>> need for reparenting -- we lived with current history for gcc-3 release 
>>>> tags for a long time.  I would argue their continued brokenness is not a 
>>>> show-stopper.
>>>> 
>>>> Looking at this from a different perspective, when I posted the initial 
>>>> svn-git scripts back in Summer, the community roughly agreed on a plan to
>>>> 1. Convert entire SVN history to git.
>>>> 2. Use the stock git history rewrite tools (git filter-branch) to fixup 
>>>> what we want, e.g., reparent tags and branches or set better 
>>>> author/committer entries.
>>>> 
>>>> Gcc-pretty does (1) in entirety.
>>>> 
>>>> For reparenting, I tried a 15min fix to my scripts to enable reparenting, 
>>>> which worked, but with artifacts like the merge commit from old and new 
>>>> parents.  I will drop this and instead use tried-and-true "git 
>>>> filter-branch" to reparent those tags and branches, thus producing 
>>>> gcc-reparent from gcc-pretty.
>>>> 
>>>>> 
>>>>> As for converting my script to work with your tools, I'm afraid I don't
>>>>> have time to work on that right now.  I'm still bogged down validating
>>>>> the incorrect bug ids that the script has identified for some commits.
>>>>> I'm making good progress (we're down to 160 unreviewed commits now), but
>>>>> it is still going to take what time I have over the next week to
>>>>> complete that task.
>>>>> 
>>>>> Furthermore, there is no documentation on how your conversion scripts
>>>>> work, so it is not possible for me to test any work I might do in order
>>>>> to validate such changes.  Not being able to run the script locally to
>>>>> test changes would be a non-starter.
>>>>> 
>>>>> You are welcome, of course, to clone the script I have and attempt to
>>>>> modify it yourself; it's reasonably well documented.  The sources can be
>>>>> found in esr's gcc-conversion repository here:
>>>>> https://gitlab.com/esr/gcc-conversion.git
>>>> 
>>>> --
>>>> Maxim Kuvyrkov
>>>> https://www.linaro.org
>>>> 
>>>>> 
>>>>> 
>>>>>> Meanwhile, I'm going to add additional root commits to my gcc-reparent 
>>>>>> conversion to bring in "missing" branches (the ones, which don't share 
>>>>>> history with trunk@1) and restart daily updates of gcc-reparent 
>>>>>> conversion.
>>>>>> 
>>>>>> Finally, with the comparison data I have, I consider statements about 
>>>>>> git-svn's poor quality to be very misleading.  Git-svn may have had 
>>>>>> serious bugs years ago when Eric R. evaluated it and started his work on 
>>>>>> reposurgeon.  But a lot of development has happened and many problems 
>>>>>> have been fixed since then.  At the moment it is reposurgeon that is 
>>>>>> producing conversions with obscure mistakes in repository metadata.
>>>>>> 
>>>>>> 
>>>>>> === Missed merges ===
>>>>>> 
>>>>>> Reposurgeon misses merges from trunk on 130+ branches.  I've 
>>>>>> spot-checked ARM/hard_vfp_branch and redhat/gcc-9-branch and, indeed, 
>>>>>> rather mundane merges were omitted.  Below is analysis for 
>>>>>> ARM/hard_vfp_branch.
>>>>>> 
>>>>>> $ git log --stat refs/remotes/gcc-reposurgeon-6a/ARM/hard_vfp_branch~4
>>>>>> ----
>>>>>> commit ef92c24b042965dfef982349cd5994a2e0ff5fde
>>>>>> Author: Richard Earnshaw <rearn...@gcc.gnu.org>
>>>>>> Date:   Mon Jul 20 08:15:51 2009 +0000
>>>>>> 
>>>>>>  Merge trunk through to r149768
>>>>>> 
>>>>>>  Legacy-ID: 149804
>>>>>> 
>>>>>> COPYING.RUNTIME                                     |    73 +
>>>>>> ChangeLog                                           |   270 +-
>>>>>> MAINTAINERS                                         |    19 +-
>>>>>> <MANY OTHER FILES>
>>>>>> ----
>>>>>> 
>>>>>> at the same time for svn-git scripts we have:
>>>>>> 
>>>>>> $ git log --stat refs/remotes/gcc-reparent/ARM/hard_vfp_branch~4
>>>>>> ----
>>>>>> commit ce7d5c8df673a7a561c29f095869f20567a7c598
>>>>>> Merge: 4970119c20da 3a69b1e566a7
>>>>>> Author: Richard Earnshaw <rearn...@arm.com>
>>>>>> Date:   Mon Jul 20 08:15:51 2009 +0000
>>>>>> 
>>>>>>  Merge trunk through to r149768
>>>>>> 
>>>>>>  git-svn-id: https://gcc.gnu.org/svn/gcc/branches/ARM/hard_vfp_branch@149804 138bc75d-0d04-0410-961f-82ee72b054a4
>>>>>> ----
>>>>>> 
>>>>>> ... which agrees with
>>>>>> $ svn propget svn:mergeinfo file:///home/maxim.kuvyrkov/tmpfs-stuff/svnrepo/branches/ARM/hard_vfp_branch@149804
>>>>>> /trunk:142588-149768
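A property value like the one above can be checked mechanically against the revision named in a merge commit message. This is a minimal Python sketch, assuming the line-oriented svn:mergeinfo format shown here (source path, a colon, then comma-separated revisions or ranges); the helper names `parse_mergeinfo` and `is_merged` are hypothetical, and a trailing `*` (non-inheritable range) is simply stripped.

```python
def parse_mergeinfo(text):
    """Parse svn:mergeinfo text into {source_path: [(first_rev, last_rev), ...]}."""
    merged = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The revision list follows the last colon on the line.
        path, _, revs = line.rpartition(":")
        ranges = []
        for item in revs.split(","):
            item = item.rstrip("*")  # '*' marks a non-inheritable range
            first, _, last = item.partition("-")
            ranges.append((int(first), int(last or first)))
        merged[path] = ranges
    return merged


def is_merged(mergeinfo, path, revision):
    """True if revision of path falls inside any recorded merged range."""
    return any(lo <= revision <= hi for lo, hi in mergeinfo.get(path, []))


info = parse_mergeinfo("/trunk:142588-149768")
print(info)
print(is_merged(info, "/trunk", 149768))
```

With the value quoted above, r149768 on /trunk falls inside the recorded range, which matches the "Merge trunk through to r149768" log message.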
>>>>>> 
>>>>>> === Bad author entries ===
>>>>>> 
>>>>>> Reposurgeon-6a conversion has authors "12:46:56 1998 Jim Wilson" and 
>>>>>> "2005-03-18 Kazu Hirata".  It is rather obvious that a person's name is 
>>>>>> unlikely to start with a digit.
>>>>>> 
>>>>>> === Missed authors ===
>>>>>> 
>>>>>> Reposurgeon-6a conversion misses many authors; below is a list of people 
>>>>>> with names starting with "A".
>>>>>> 
>>>>>> Akos Kiss
>>>>>> Anders Bertelrud
>>>>>> Andrew Pochinsky
>>>>>> Anton Hartl
>>>>>> Arthur Norman
>>>>>> Aymeric Vincent
>>>>>> 
>>>>>> === Conservative author entries ===
>>>>>> 
>>>>>> Reposurgeon-6a conversion uses default "@gcc.gnu.org" emails for many 
>>>>>> commits where svn-git conversion manages to extract valid email from 
>>>>>> commit data.  This happens for hundreds of author entries.
>>>>>> 
>>>>>> Regards,
>>>>>> 
>>>>>> --
>>>>>> Maxim Kuvyrkov
>>>>>> https://www.linaro.org
>>>>>> 
>>>>>> 
>>>>>>> On Dec 26, 2019, at 7:11 PM, Maxim Kuvyrkov <maxim.kuvyr...@linaro.org> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>> On Dec 26, 2019, at 2:16 PM, Jakub Jelinek <ja...@redhat.com> wrote:
>>>>>>>> 
>>>>>>>> On Thu, Dec 26, 2019 at 11:04:29AM +0000, Joseph Myers wrote:
>>>>>>>> Is there some easy way (e.g. file in the conversion scripts) to correct
>>>>>>>> spelling and other mistakes in the commit authors?
>>>>>>>> E.g. there are misspelled surnames, etc. (e.g. looking at my name, I 
>>>>>>>> see
>>>>>>>> Jakub Jakub Jelinek (1):
>>>>>>>> Jakub Jeilnek (1):
>>>>>>>> Jelinek (1):
>>>>>>>> entries next to the expected one with most of the commits.
>>>>>>>> For the misspellings, wonder if e.g. we couldn't compute edit 
>>>>>>>> distances from
>>>>>>>> other names and if we have one with many commits and then one with 
>>>>>>>> very few
>>>>>>>> with small edit distance from those, flag it for human review.
>>>>>>> 
>>>>>>> This is close to what svn-git-author.sh script is doing in gcc-pretty 
>>>>>>> and gcc-reparent conversions.  It ignores 1-3 character differences in 
>>>>>>> author/committer names and email addresses.  I've audited results for 
>>>>>>> all branches and didn't spot any mistakes.
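The edit-distance check suggested above can be sketched as follows. This is a minimal illustration in Python and is not the logic of svn-git-author.sh; the function names, the `rare` and `frequent` thresholds, and the commit counts in the sample data are assumptions made for the example (only the spellings come from the message above).

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def flag_suspect_authors(commit_counts, max_distance=3, rare=2, frequent=50):
    """Return (rare_name, frequent_name, distance) triples for human review."""
    frequent_names = [n for n, c in commit_counts.items() if c >= frequent]
    suspects = []
    for name, count in sorted(commit_counts.items()):
        if count > rare:
            continue
        for target in frequent_names:
            d = edit_distance(name, target)
            if 0 < d <= max_distance:
                suspects.append((name, target, d))
    return suspects


# Hypothetical commit counts for the spellings mentioned above.
counts = Counter({
    "Jakub Jelinek": 5000,
    "Jakub Jeilnek": 1,
    "Jakub Jakub Jelinek": 1,
    "Jelinek": 1,
})
for rare_name, likely_name, distance in flag_suspect_authors(counts):
    print(f"{rare_name!r} -> {likely_name!r} (distance {distance})")
```

With a threshold of 3, only the transposed spelling "Jakub Jeilnek" is flagged. The duplicated first name and the bare surname are each six edits away from the common spelling, so catching them would need a different heuristic, for example comparing name tokens rather than whole strings.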
>>>>>>> 
>>>>>>> In other news, I'm working on comparison of gcc-pretty, gcc-reparent 
>>>>>>> and gcc-reposurgeon-5a repos among themselves.  Below are current notes 
>>>>>>> for comparison of gcc-pretty/trunk and gcc-reposurgeon-5a/trunk.
>>>>>>> 
>>>>>>> == Merges on trunk ==
>>>>>>> 
>>>>>>> Reposurgeon creates merge entries on trunk when changes from a branch 
>>>>>>> are merged into trunk.  This brings entire development history from the 
>>>>>>> branch to trunk, which is both good and bad.  The good part is that we 
>>>>>>> get more visibility into how the code evolved.  The bad part is that we 
>>>>>>> get many "noisy" commits from merged branch (e.g., "Merge in trunk" 
>>>>>>> every few revisions) and that our SVN branches are work-in-progress 
>>>>>>> quality, not ready for review/commit quality.  It's common for files to 
>>>>>>> be re-written in large chunks on branches.
>>>>>>> 
>>>>>>> Also, reposurgeon's commit logs don't have information on SVN path from 
>>>>>>> which the change came, so there is no easy way to determine that a 
>>>>>>> given commit is from a merged branch, not an original trunk commit.  
>>>>>>> Git-svn, on the other hand, provides "git-svn-id: <path>@<revision>" 
>>>>>>> tags in its commit logs.
>>>>>>> 
>>>>>>> My conversion follows current GCC development policy that trunk history 
>>>>>>> should be linear.  Branch merges to trunk are squashed.  Merges between 
>>>>>>> non-trunk branches are handled as specified by svn:mergeinfo SVN 
>>>>>>> properties.
>>>>>>> 
>>>>>>> == Differences in trees ==
>>>>>>> 
>>>>>>> Git trees (aka filesystem content) match between pretty/trunk and 
>>>>>>> reposurgeon-5a/trunk from current tip and up to SVN's r130805.
>>>>>>> Here is SVN log of that revision (restoration of deleted trunk):
>>>>>>> ------------------------------------------------------------------------
>>>>>>> r130805 | dberlin | 2007-12-13 01:53:37 +0000 (Thu, 13 Dec 2007)
>>>>>>> Changed paths:
>>>>>>> A /trunk (from /trunk:130802)
>>>>>>> ------------------------------------------------------------------------
>>>>>>> 
>>>>>>> Reposurgeon conversion has:
>>>>>>> -------------
>>>>>>> commit 7e6f2a96e89d96c2418482788f94155d87791f0a
>>>>>>> Author: Daniel Berlin <dber...@gcc.gnu.org>
>>>>>>> Date:   Thu Dec 13 01:53:37 2007 +0000
>>>>>>> 
>>>>>>> Readd trunk
>>>>>>> 
>>>>>>> Legacy-ID: 130805
>>>>>>> 
>>>>>>> .gitignore | 17 -----------------
>>>>>>> 1 file changed, 17 deletions(-)
>>>>>>> -------------
>>>>>>> and my conversion has:
>>>>>>> -------------
>>>>>>> commit fb128f3970789ce094c798945b4fa20eceb84cc7
>>>>>>> Author: Daniel Berlin <dber...@dbrelin.org>
>>>>>>> Date:   Thu Dec 13 01:53:37 2007 +0000
>>>>>>> 
>>>>>>> Readd trunk
>>>>>>> 
>>>>>>> 
>>>>>>> git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@130805 138bc75d-0d04-0410-961f-82ee72b054a4
>>>>>>> -------------
>>>>>>> 
>>>>>>> It appears that .gitignore has been added in r1 by reposurgeon and then 
>>>>>>> deleted at r130805.  In SVN repository .gitignore was added in r195087.  
>>>>>>> I speculate that addition of .gitignore at r1 is expected, but its 
>>>>>>> deletion at r130805 is highly suspicious.
>>>>>>> 
>>>>>>> == Committer entries ==
>>>>>>> 
>>>>>>> Reposurgeon uses $u...@gcc.gnu.org for committer email addresses even 
>>>>>>> when it correctly detects author name from ChangeLog.
>>>>>>> 
>>>>>>> reposurgeon-5a:
>>>>>>> r278995 Martin Liska <mli...@suse.cz> Martin Liska <mar...@gcc.gnu.org>
>>>>>>> r278994 Jozef Lawrynowicz <joze...@mittosystems.com> Jozef Lawrynowicz 
>>>>>>> <joz...@gcc.gnu.org>
>>>>>>> r278993 Frederik Harwath <frede...@codesourcery.com> Frederik Harwath 
>>>>>>> <frede...@gcc.gnu.org>
>>>>>>> r278992 Georg-Johann Lay <a...@gjlay.de> Georg-Johann Lay 
>>>>>>> <g...@gcc.gnu.org>
>>>>>>> r278991 Richard Biener <rguent...@suse.de> Richard Biener 
>>>>>>> <rgue...@gcc.gnu.org>
>>>>>>> 
>>>>>>> pretty:
>>>>>>> r278995 Martin Liska <mli...@suse.cz> Martin Liska <mli...@suse.cz>
>>>>>>> r278994 Jozef Lawrynowicz <joze...@mittosystems.com> Jozef Lawrynowicz 
>>>>>>> <joze...@mittosystems.com>
>>>>>>> r278993 Frederik Harwath <frede...@codesourcery.com> Frederik Harwath 
>>>>>>> <frede...@codesourcery.com>
>>>>>>> r278992 Georg-Johann Lay <a...@gjlay.de> Georg-Johann Lay 
>>>>>>> <a...@gjlay.de>
>>>>>>> r278991 Richard Biener <rguent...@suse.de> Richard Biener 
>>>>>>> <rguent...@suse.de>
>>>>>>> 
>>>>>>> == Bad summary line ==
>>>>>>> 
>>>>>>> While looking around r138087, the commit below caught my eye.  Are the 
>>>>>>> contents of the summary line as expected?
>>>>>>> 
>>>>>>> commit cc2726884d56995c514d8171cc4a03657851657e
>>>>>>> Author: Chris Fairles <chris.fair...@gmail.com>
>>>>>>> Date:   Wed Jul 23 14:49:00 2008 +0000
>>>>>>> 
>>>>>>> acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>>>>>>> 
>>>>>>> 2008-07-23  Chris Fairles <chris.fair...@gmail.com>
>>>>>>> 
>>>>>>>         * acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>>>>>>>         Holds the lib that defines clock_gettime (-lrt or -lposix4).
>>>>>>>         * src/Makefile.am: Use it.
>>>>>>>         * configure: Regenerate.
>>>>>>>         * configure.in: Likewise.
>>>>>>>         * Makefile.in: Likewise.
>>>>>>>         * src/Makefile.in: Likewise.
>>>>>>>         * libsup++/Makefile.in: Likewise.
>>>>>>>         * po/Makefile.in: Likewise.
>>>>>>>         * doc/Makefile.in: Likewise.
>>>>>>> 
>>>>>>> Legacy-ID: 138087
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Maxim Kuvyrkov
>>>>>>> https://www.linaro.org
>> 
> 
