> On Dec 30, 2019, at 6:31 PM, Richard Earnshaw (lists) <richard.earns...@arm.com> wrote:
>
> On 30/12/2019 13:00, Maxim Kuvyrkov wrote:
>>> On Dec 30, 2019, at 1:24 AM, Richard Earnshaw (lists) <richard.earns...@arm.com> wrote:
>>>
>>> On 29/12/2019 18:30, Maxim Kuvyrkov wrote:
>>>> Below are several more issues I found in the reposurgeon-6a conversion when comparing it against the gcc-reparent conversion.
>>>>
>>>> I am sure these and whatever other problems I may find in the reposurgeon conversion can be fixed in time. However, I don't see why I should bother. My conversion has been available since summer 2019, I made it ready in time for GCC Cauldron 2019, and it hasn't changed in any significant way since then.
>>>>
>>>> With the "Missed merges" problem (see below) I don't see how the reposurgeon conversion can be considered "ready". Also, I expected a diligent developer to compare the new conversion (aka reposurgeon's) against the existing conversion (aka gcc-pretty / gcc-reparent) before declaring the new conversion "better" or even "ready". The data I'm seeing in the differences between my conversion and reposurgeon's shows that the gcc-reparent conversion is /better/.
>>>>
>>>> I suggest that the GCC community adopt either the gcc-pretty or the gcc-reparent conversion. I welcome Richard E. to modify his summary scripts to work with the svn-git scripts, which should be straightforward, and I'm ready to help.
>>>>
>>>
>>> I don't think either of these conversions is any more ready to use than the reposurgeon one, possibly less so. In fact, there are still some major issues to resolve before they can be considered.
>>>
>>> gcc-pretty has completely wrong parent information for the gcc-3 era release tags, showing the tags as being made directly from trunk with massive deltas representing the roll-up of all the commits that were made on the gcc-3 release branch.
>>
>> I will clarify the above statement, and please correct me where you think I'm wrong. The gcc-pretty conversion has exactly the right parent information for the gcc-3 era release tags as recorded in SVN version history. The gcc-pretty conversion aims to produce an exact copy of SVN history in git. IMO, it manages to do so just fine.
>>
>> It is a different matter that SVN history has a screwed-up record of the gcc-3 era tags.
>
> It's not screwed up in svn. Svn shows the correct history information for the gcc-3 era release tags, but the git-svn conversion in gcc-pretty does not.
>
> For example, looking at gcc_3_0_release in expr.c with git blame and svn blame shows

In SVN history, tags/gcc_3_0_release was copied from /trunk:39596, and in the same commit a bunch of files were replaced with copies from /branches/gcc-3_0-branch/ (and from different revisions of that branch!).

$ svn log -qv --stop-on-copy file://$(pwd)/tags/gcc_3_0_release | grep "/tags/gcc_3_0_release \|/tags/gcc_3_0_release/gcc/expr.c \|/tags/gcc_3_0_release/gcc/reload.c "
   A /tags/gcc_3_0_release (from /trunk:39596)
   R /tags/gcc_3_0_release/gcc/expr.c (from /branches/gcc-3_0-branch/gcc/expr.c:43255)
   R /tags/gcc_3_0_release/gcc/reload.c (from /branches/gcc-3_0-branch/gcc/reload.c:42007)

IMO, from such history (absent external knowledge about better reparenting options) the best choice of parent branch is /trunk@39596, not /branches/gcc-3_0-branch at a random revision taken from one of the replaced files.

Still, I see your point, and I will fix reparenting support. Whether the GCC community opts to reparent or not is a different topic.

--
Maxim Kuvyrkov
https://www.linaro.org

> git blame expr.c:
>
> ba0a9cb85431 (Richard Kenner 1992-03-03 23:34:57 +0000 396) return temp;
> ba0a9cb85431 (Richard Kenner 1992-03-03 23:34:57 +0000 397) }
> 5fbf0b0d5828 (no-author 2001-06-17 19:44:25 +0000 398) /* Copy the address into a pseudo, so that the returned value
> 5fbf0b0d5828 (no-author 2001-06-17 19:44:25 +0000 399) remains correct across calls to emit_queue. */
> 5fbf0b0d5828 (no-author 2001-06-17 19:44:25 +0000 400) XEXP (new, 0) = copy_to_reg (XEXP (new, 0));
> 59f26b7caad9 (Richard Kenner 1994-01-11 00:23:47 +0000 401) return new;
>
> git log 5fbf0b0d5828
> commit 5fbf0b0d5828687914c1c18a83ff12c8627d5a70 (HEAD, tag: gcc_3_0_release)
> Author: no-author <no-aut...@gcc.gnu.org>
> Date: Sun Jun 17 19:44:25 2001 +0000
>
>     This commit was manufactured by cvs2svn to create tag 'gcc_3_0_release'.
>
> while svn blame expr.c correctly shows:
>
> 386 kenner return temp;
> 386 kenner }
> 42209 bernds /* Copy the address into a pseudo, so that the returned value
> 42209 bernds remains correct across calls to emit_queue. */
> 42209 bernds XEXP (new, 0) = copy_to_reg (XEXP (new, 0));
> 6375 kenner return new;
>
> svn log -r42209 ^/
> ------------------------------------------------------------------------
> r42209 | bernds | 2001-05-17 18:07:08 +0100 (Thu, 17 May 2001) | 2 lines
>
> Fix queueing-related bugs
>
> In other words, svn can correctly track the files that were modified on the release branch, while the git conversion loses that information, rolling up all the diffs on the release branch into a single unattributed commit.
>
> As I said, gcc-reparent is better in this regard, but there are still conversion artefacts, such as incorrect merge records, that show up.
>
> R.
>
>>
>>>
>>> gcc-reparent is better, but many (most?) of the release tags are shown as merge commits with a fake parent back to the gcc-3 branch point, which is certainly not what happened when the tagging was done at that time.
>>
>> I agree with you here.
>>
>>>
>>> Both of these factually misrepresent the history at the time the release tag was made.
>>
>> Yes and no. The gcc-pretty repository mirrors SVN history. And regarding the need for reparenting: we have lived with the current history for the gcc-3 release tags for a long time. I would argue their continued brokenness is not a show-stopper.
>>
>> Looking at this from a different perspective, when I posted the initial svn-git scripts back in the summer, the community roughly agreed on a plan to:
>> 1. Convert the entire SVN history to git.
>> 2. Use the stock git history rewrite tools (git filter-branch) to fix up what we want, e.g., reparent tags and branches or set better author/committer entries.
>>
>> Gcc-pretty does (1) in its entirety.
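The copy record Maxim quotes from `svn log -qv --stop-on-copy` can be summarized mechanically, which makes the parent-choice question concrete: the tag directory has one copy source, but the replaced files point at several revisions of another branch. The sketch below is a hypothetical helper (`copy_sources` is not part of the svn-git scripts or reposurgeon). It assumes the standard `svn log -v` changed-path format `ACTION PATH (from SOURCE:REV)` and paths without spaces.

```python
import re
from collections import defaultdict

# Hypothetical helper, not part of any conversion tool discussed in this thread.
# Matches `svn log -v` changed-path lines such as
#   "   R /tags/gcc_3_0_release/gcc/expr.c (from /branches/gcc-3_0-branch/gcc/expr.c:43255)"
# Assumption: paths contain no spaces.
CHANGE_RE = re.compile(r"^\s*([AMRD]) (\S+) \(from (\S+):(\d+)\)\s*$")


def copy_sources(log_lines, tag_root):
    """Summarize where a tag-creating commit copied its content from.

    Returns (root_source, replaced):
      root_source: (path, rev) the tag directory itself was copied from, or None.
      replaced: {source_branch: set_of_revisions} for files replaced inside the tag.
    """
    root_source = None
    replaced = defaultdict(set)
    for line in log_lines:
        m = CHANGE_RE.match(line)
        if m is None:
            continue  # not a copy record (separators, plain edits, etc.)
        action, path, src, rev = m.group(1), m.group(2), m.group(3), int(m.group(4))
        if action == "A" and path == tag_root:
            root_source = (src, rev)
        elif action == "R" and path.startswith(tag_root + "/"):
            # e.g. "/gcc/expr.c": strip this suffix from the source path
            # to recover the branch directory the file was copied from.
            rel = path[len(tag_root):]
            branch = src[: -len(rel)] if src.endswith(rel) else src
            replaced[branch].add(rev)
    return root_source, dict(replaced)


if __name__ == "__main__":
    log = [
        "   A /tags/gcc_3_0_release (from /trunk:39596)",
        "   R /tags/gcc_3_0_release/gcc/expr.c (from /branches/gcc-3_0-branch/gcc/expr.c:43255)",
        "   R /tags/gcc_3_0_release/gcc/reload.c (from /branches/gcc-3_0-branch/gcc/reload.c:42007)",
    ]
    root, replaced = copy_sources(log, "/tags/gcc_3_0_release")
    print("tag root copied from:", root)
    for branch, revs in replaced.items():
        print("files replaced from", branch, "at revisions", sorted(revs))
```

Fed the three lines quoted above, it reports `('/trunk', 39596)` as the root copy source and revisions 42007 and 43255 of /branches/gcc-3_0-branch for the replaced files. This is the mixed-revision situation that leaves no single branch revision as an obvious parent.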
>>
>> For reparenting, I tried a 15-minute fix to my scripts to enable reparenting, which worked, but with artifacts like the merge commit from the old and new parents. I will drop this and instead use the tried-and-true "git filter-branch" to reparent those tags and branches, thus producing gcc-reparent from gcc-pretty.
>>
>>>
>>> As for converting my script to work with your tools, I'm afraid I don't have time to work on that right now. I'm still bogged down validating the incorrect bug IDs that the script has identified for some commits. I'm making good progress (we're down to 160 unreviewed commits now), but it is still going to take what time I have over the next week to complete that task.
>>>
>>> Furthermore, there is no documentation on how your conversion scripts work, so it is not possible for me to test any work I might do in order to validate such changes. Not being able to run the script locally to test changes would be a non-starter.
>>>
>>> You are welcome, of course, to clone the script I have and attempt to modify it yourself; it's reasonably well documented. The sources can be found in esr's gcc-conversion repository here:
>>> https://gitlab.com/esr/gcc-conversion.git
>>
>> --
>> Maxim Kuvyrkov
>> https://www.linaro.org
>>
>>>
>>>
>>>> Meanwhile, I'm going to add additional root commits to my gcc-reparent conversion to bring in the "missing" branches (the ones that don't share history with trunk@1) and restart daily updates of the gcc-reparent conversion.
>>>>
>>>> Finally, with the comparison data I have, I consider statements about git-svn's poor quality to be very misleading. Git-svn may have had serious bugs years ago when Eric R. evaluated it and started his work on reposurgeon. But a lot of development has happened, and many problems have been fixed since then. At the moment it is reposurgeon that is producing conversions with obscure mistakes in repository metadata.
>>>>
>>>>
>>>> === Missed merges ===
>>>>
>>>> Reposurgeon misses merges from trunk on 130+ branches. I've spot-checked ARM/hard_vfp_branch and redhat/gcc-9-branch and, indeed, rather mundane merges were omitted. Below is the analysis for ARM/hard_vfp_branch.
>>>>
>>>> $ git log --stat refs/remotes/gcc-reposurgeon-6a/ARM/hard_vfp_branch~4
>>>> ----
>>>> commit ef92c24b042965dfef982349cd5994a2e0ff5fde
>>>> Author: Richard Earnshaw <rearn...@gcc.gnu.org>
>>>> Date: Mon Jul 20 08:15:51 2009 +0000
>>>>
>>>>     Merge trunk through to r149768
>>>>
>>>>     Legacy-ID: 149804
>>>>
>>>>  COPYING.RUNTIME | 73 +
>>>>  ChangeLog | 270 +-
>>>>  MAINTAINERS | 19 +-
>>>>  <MANY OTHER FILES>
>>>> ----
>>>>
>>>> At the same time, for the svn-git scripts we have:
>>>>
>>>> $ git log --stat refs/remotes/gcc-reparent/ARM/hard_vfp_branch~4
>>>> ----
>>>> commit ce7d5c8df673a7a561c29f095869f20567a7c598
>>>> Merge: 4970119c20da 3a69b1e566a7
>>>> Author: Richard Earnshaw <rearn...@arm.com>
>>>> Date: Mon Jul 20 08:15:51 2009 +0000
>>>>
>>>>     Merge trunk through to r149768
>>>>
>>>>     git-svn-id: https://gcc.gnu.org/svn/gcc/branches/ARM/hard_vfp_branch@149804 138bc75d-0d04-0410-961f-82ee72b054a4
>>>> ----
>>>>
>>>> ... which agrees with:
>>>> $ svn propget svn:mergeinfo file:///home/maxim.kuvyrkov/tmpfs-stuff/svnrepo/branches/ARM/hard_vfp_branch@149804
>>>> /trunk:142588-149768
>>>>
>>>> === Bad author entries ===
>>>>
>>>> The reposurgeon-6a conversion has authors "12:46:56 1998 Jim Wilson" and "2005-03-18 Kazu Hirata". It is rather obvious that a person's name is unlikely to start with a digit.
>>>>
>>>> === Missed authors ===
>>>>
>>>> The reposurgeon-6a conversion misses many authors; below is a list of people with names starting with "A".
>>>>
>>>> Akos Kiss
>>>> Anders Bertelrud
>>>> Andrew Pochinsky
>>>> Anton Hartl
>>>> Arthur Norman
>>>> Aymeric Vincent
>>>>
>>>> === Conservative author entries ===
>>>>
>>>> The reposurgeon-6a conversion uses default "@gcc.gnu.org" emails for many commits where the svn-git conversion manages to extract a valid email from the commit data. This happens for hundreds of author entries.
>>>>
>>>> Regards,
>>>>
>>>> --
>>>> Maxim Kuvyrkov
>>>> https://www.linaro.org
>>>>
>>>>
>>>>> On Dec 26, 2019, at 7:11 PM, Maxim Kuvyrkov <maxim.kuvyr...@linaro.org> wrote:
>>>>>
>>>>>
>>>>>> On Dec 26, 2019, at 2:16 PM, Jakub Jelinek <ja...@redhat.com> wrote:
>>>>>>
>>>>>> On Thu, Dec 26, 2019 at 11:04:29AM +0000, Joseph Myers wrote:
>>>>>> Is there some easy way (e.g. a file in the conversion scripts) to correct spelling and other mistakes in the commit authors?
>>>>>> E.g. there are misspelled surnames, etc. (e.g. looking at my name, I see
>>>>>> Jakub Jakub Jelinek (1):
>>>>>> Jakub Jeilnek (1):
>>>>>> Jelinek (1):
>>>>>> entries next to the expected one with most of the commits.)
>>>>>> For the misspellings, I wonder if we couldn't e.g. compute edit distances from other names, and if we have one name with many commits and then one with very few commits at a small edit distance from it, flag it for human review.
>>>>>
>>>>> This is close to what the svn-git-author.sh script is doing in the gcc-pretty and gcc-reparent conversions. It ignores 1-3 character differences in author/committer names and email addresses. I've audited the results for all branches and didn't spot any mistakes.
>>>>>
>>>>> In other news, I'm working on a comparison of the gcc-pretty, gcc-reparent and gcc-reposurgeon-5a repos among themselves. Below are my current notes on the comparison of gcc-pretty/trunk and gcc-reposurgeon-5a/trunk.
>>>>>
>>>>> == Merges on trunk ==
>>>>>
>>>>> Reposurgeon creates merge entries on trunk when changes from a branch are merged into trunk. This brings the entire development history of the branch into trunk, which is both good and bad. The good part is that we get more visibility into how the code evolved. The bad part is that we get many "noisy" commits from the merged branch (e.g., "Merge in trunk" every few revisions), and that our SVN branches are of work-in-progress quality, not ready-for-review/commit quality. It's common for files to be rewritten in large chunks on branches.
>>>>>
>>>>> Also, reposurgeon's commit logs don't record the SVN path from which the change came, so there is no easy way to determine that a given commit comes from a merged branch rather than being an original trunk commit. Git-svn, on the other hand, provides "git-svn-id: <path>@<revision>" tags in its commit logs.
>>>>>
>>>>> My conversion follows the current GCC development policy that trunk history should be linear. Branch merges to trunk are squashed. Merges between non-trunk branches are handled as specified by the svn:mergeinfo SVN properties.
>>>>>
>>>>> == Differences in trees ==
>>>>>
>>>>> Git trees (aka filesystem content) match between pretty/trunk and reposurgeon-5a/trunk from the current tip back to svn's r130805.
>>>>> Here is the SVN log of that revision (restoration of the deleted trunk):
>>>>> ------------------------------------------------------------------------
>>>>> r130805 | dberlin | 2007-12-13 01:53:37 +0000 (Thu, 13 Dec 2007)
>>>>> Changed paths:
>>>>>    A /trunk (from /trunk:130802)
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> The reposurgeon conversion has:
>>>>> -------------
>>>>> commit 7e6f2a96e89d96c2418482788f94155d87791f0a
>>>>> Author: Daniel Berlin <dber...@gcc.gnu.org>
>>>>> Date: Thu Dec 13 01:53:37 2007 +0000
>>>>>
>>>>>     Readd trunk
>>>>>
>>>>>     Legacy-ID: 130805
>>>>>
>>>>>  .gitignore | 17 -----------------
>>>>>  1 file changed, 17 deletions(-)
>>>>> -------------
>>>>> and my conversion has:
>>>>> -------------
>>>>> commit fb128f3970789ce094c798945b4fa20eceb84cc7
>>>>> Author: Daniel Berlin <dber...@dbrelin.org>
>>>>> Date: Thu Dec 13 01:53:37 2007 +0000
>>>>>
>>>>>     Readd trunk
>>>>>
>>>>>
>>>>>     git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@130805 138bc75d-0d04-0410-961f-82ee72b054a4
>>>>> -------------
>>>>>
>>>>> It appears that .gitignore was added in r1 by reposurgeon and then deleted at r130805. In the SVN repository, .gitignore was added in r195087. I speculate that the addition of .gitignore at r1 is expected, but its deletion at r130805 is highly suspicious.
>>>>>
>>>>> == Committer entries ==
>>>>>
>>>>> Reposurgeon uses $u...@gcc.gnu.org for committer email addresses even when it correctly detects the author name from the ChangeLog.
>>>>>
>>>>> reposurgeon-5a:
>>>>> r278995 Martin Liska <mli...@suse.cz> Martin Liska <mar...@gcc.gnu.org>
>>>>> r278994 Jozef Lawrynowicz <joze...@mittosystems.com> Jozef Lawrynowicz <joz...@gcc.gnu.org>
>>>>> r278993 Frederik Harwath <frede...@codesourcery.com> Frederik Harwath <frede...@gcc.gnu.org>
>>>>> r278992 Georg-Johann Lay <a...@gjlay.de> Georg-Johann Lay <g...@gcc.gnu.org>
>>>>> r278991 Richard Biener <rguent...@suse.de> Richard Biener <rgue...@gcc.gnu.org>
>>>>>
>>>>> pretty:
>>>>> r278995 Martin Liska <mli...@suse.cz> Martin Liska <mli...@suse.cz>
>>>>> r278994 Jozef Lawrynowicz <joze...@mittosystems.com> Jozef Lawrynowicz <joze...@mittosystems.com>
>>>>> r278993 Frederik Harwath <frede...@codesourcery.com> Frederik Harwath <frede...@codesourcery.com>
>>>>> r278992 Georg-Johann Lay <a...@gjlay.de> Georg-Johann Lay <a...@gjlay.de>
>>>>> r278991 Richard Biener <rguent...@suse.de> Richard Biener <rguent...@suse.de>
>>>>>
>>>>> == Bad summary line ==
>>>>>
>>>>> While looking around r138087, the commit below caught my eye. Are the contents of the summary line as expected?
>>>>>
>>>>> commit cc2726884d56995c514d8171cc4a03657851657e
>>>>> Author: Chris Fairles <chris.fair...@gmail.com>
>>>>> Date: Wed Jul 23 14:49:00 2008 +0000
>>>>>
>>>>>     acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>>>>>
>>>>>     2008-07-23 Chris Fairles <chris.fair...@gmail.com>
>>>>>
>>>>>         * acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>>>>>         Holds the lib that defines clock_gettime (-lrt or -lposix4).
>>>>>         * src/Makefile.am: Use it.
>>>>>         * configure: Regenerate.
>>>>>         * configure.in: Likewise.
>>>>>         * Makefile.in: Likewise.
>>>>>         * src/Makefile.in: Likewise.
>>>>>         * libsup++/Makefile.in: Likewise.
>>>>>         * po/Makefile.in: Likewise.
>>>>>         * doc/Makefile.in: Likewise.
>>>>>
>>>>>     Legacy-ID: 138087
>>>>>
>>>>>
>>>>> --
>>>>> Maxim Kuvyrkov
>>>>> https://www.linaro.org
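Jakub's suggestion earlier in the thread can be sketched as follows: flag a rarely used author name that sits within a small edit distance of a frequently used one. This is an illustration, not the logic of svn-git-author.sh. The helper `flag_suspect_names` and its thresholds are assumptions chosen for the example: at most 3 edits, at most 2 commits for a "rare" name, and at least 10 for a "common" one. The large commit counts in the example data are placeholders, not real statistics.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def flag_suspect_names(commit_counts, max_distance=3, rare=2, common=10):
    """Return (suspect, likely_intended, distance) triples for human review.

    Hypothetical helper: a name is suspect when it has at most `rare` commits
    and lies within `max_distance` edits of a name with at least `common` commits.
    All three thresholds are illustrative assumptions.
    """
    frequent = [n for n, c in commit_counts.items() if c >= common]
    flagged = []
    for name, count in commit_counts.items():
        if count > rare:
            continue
        best = None
        for other in frequent:
            if other == name:
                continue
            d = levenshtein(name, other)
            if d <= max_distance and (best is None or d < best[1]):
                best = (other, d)
        if best is not None:
            flagged.append((name, best[0], best[1]))
    return flagged


if __name__ == "__main__":
    # The three rare spellings are the ones Jakub lists; the counts of 1000
    # are placeholders standing in for "most of the commits".
    counts = {
        "Jakub Jelinek": 1000,
        "Jakub Jeilnek": 1,
        "Jakub Jakub Jelinek": 1,
        "Jelinek": 1,
        "Richard Biener": 1000,
    }
    for suspect, intended, dist in flag_suspect_names(counts):
        print(f"{suspect!r} looks like {intended!r} (distance {dist})")
```

With these inputs only "Jakub Jeilnek" is flagged, at distance 2. "Jakub Jakub Jelinek" and "Jelinek" are each 6 edits away from the expected spelling, so a pure edit-distance threshold of this size does not catch duplicated or dropped name parts. Those would need a separate rule.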