> On Dec 30, 2019, at 7:08 PM, Richard Earnshaw (lists)
> <richard.earns...@arm.com> wrote:
>
> On 30/12/2019 15:49, Maxim Kuvyrkov wrote:
>>> On Dec 30, 2019, at 6:31 PM, Richard Earnshaw (lists)
>>> <richard.earns...@arm.com> wrote:
>>>
>>> On 30/12/2019 13:00, Maxim Kuvyrkov wrote:
>>>>> On Dec 30, 2019, at 1:24 AM, Richard Earnshaw (lists)
>>>>> <richard.earns...@arm.com> wrote:
>>>>>
>>>>> On 29/12/2019 18:30, Maxim Kuvyrkov wrote:
>>>>>> Below are several more issues I found in reposurgeon-6a conversion
>>>>>> comparing it against gcc-reparent conversion.
>>>>>>
>>>>>> I am sure, these and whatever other problems I may find in the
>>>>>> reposurgeon conversion can be fixed in time. However, I don't see why
>>>>>> should bother. My conversion has been available since summer 2019, I
>>>>>> made it ready in time for GCC Cauldron 2019, and it didn't change in any
>>>>>> significant way since then.
>>>>>>
>>>>>> With the "Missed merges" problem (see below) I don't see how reposurgeon
>>>>>> conversion can be considered "ready". Also, I expected a diligent
>>>>>> developer to compare new conversion (aka reposurgeon's) against existing
>>>>>> conversion (aka gcc-pretty / gcc-reparent) before declaring the new
>>>>>> conversion "better" or even "ready". The data I'm seeing in differences
>>>>>> between my and reposurgeon conversions shows that gcc-reparent
>>>>>> conversion is /better/.
>>>>>>
>>>>>> I suggest that GCC community adopts either gcc-pretty or gcc-reparent
>>>>>> conversion. I welcome Richard E. to modify his summary scripts to work
>>>>>> with svn-git scripts, which should be straightforward, and I'm ready to
>>>>>> help.
>>>>>>
>>>>>
>>>>> I don't think either of these conversions are any more ready to use than
>>>>> the reposurgeon one, possibly less so. In fact, there are still some
>>>>> major issues to resolve first before they can be considered.
>>>>>
>>>>> gcc-pretty has completely wrong parent information for the gcc-3 era
>>>>> release tags, showing the tags as being made directly from trunk with
>>>>> massive deltas representing the roll-up of all the commits that were
>>>>> made on the gcc-3 release branch.
>>>>
>>>> I will clarify the above statement, and please correct me where you think
>>>> I'm wrong. Gcc-pretty conversion has the exact right parent information
>>>> for the gcc-3 era release tags as recorded in SVN version history.
>>>> Gcc-pretty conversion aims to produce an exact copy of SVN history in
>>>> git. IMO, it manages to do so just fine.
>>>>
>>>> It is a different thing that SVN history has a screwed up record of gcc-3
>>>> era tags.
>>>
>>> It's not screwed up in svn. Svn shows the correct history information for
>>> the gcc-3 era release tags, but the git-svn conversion in gcc-pretty does
>>> not.
>>>
>>> For example, looking at gcc_3_0_release in expr.c with git blame and svn
>>> blame shows
>>
>> In SVN history tags/gcc_3_0_release has been copied off /trunk:39596 and in
>> the same commit bunch of files were replaced from /branches/gcc-3_0-branch/
>> (and from different revisions of this branch!).
>>
>> $ svn log -qv --stop-on-copy file://$(pwd)/tags/gcc_3_0_release | grep
>> "/tags/gcc_3_0_release \|/tags/gcc_3_0_release/gcc/expr.c
>> \|/tags/gcc_3_0_release/gcc/reload.c "
>> A /tags/gcc_3_0_release (from /trunk:39596)
>> R /tags/gcc_3_0_release/gcc/expr.c (from
>> /branches/gcc-3_0-branch/gcc/expr.c:43255)
>> R /tags/gcc_3_0_release/gcc/reload.c (from
>> /branches/gcc-3_0-branch/gcc/reload.c:42007)
>>
>
> Right, (and wrong). You have to understand how the release branches and
> tags are represented in CVS to understand why the SVN conversion is done
> this way. When a branch was created in CVS a tag was added to each
> commit which would then be used in any future revisions along that
> branch.
> But until a commit is made on that branch, the release branch
> is just a placeholder.
>
> When a CVS release tag is created, the tag labels the relevant commit
> that is to be used. If that commit is unchanged from the trunk revision
> (no commit on the branch), then that is what gets labelled, and it
> *appears* to still come from trunk - but that does not matter, since it
> is the same as the version on trunk.
>
> The svn copy operations are formed from this set of information by
> copying the SVN revision of trunk that applied at the point the branch
> was made, and then overriding the copy information for each file that
> was then modified on the branch with information about that copy. This
> is sufficient for svn to fully understand the history information for
> each and every file in the tag.
>
> Unfortunately, git-svn mis-interprets this when building its graph of
> what happened and while it copies the right *content* into the release
> branch, it does not copy the right *history*. The SVN R operation
> copies the history from named revision, not just the content. That's
> the significant difference between the two.
>
> R
>> IMO, from such history (absent external knowledge about better reparenting
>> options) the best choice for parent branch is /trunk@39596, not
>> /branches/gcc-3_0-branch at a random revision from the replaced files.
>>
>> Still, I see your point, and I will fix reparenting support. Whether GCC
>> community opts to reparent or not reparent is a different topic.
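
To make the two candidate parents in the quoted svn log output concrete, here is a rough Python sketch that separates the root copy of a tag from the per-file replacements and reports the newest source revision per source root. It is only an illustration of the disagreement above: the function names (parse_copies, candidate_parents) are made up for this mail, it is not code from the svn-git scripts or from reposurgeon, and it assumes each "(from ...)" record sits on one unwrapped line.

```python
import re
from collections import defaultdict

# One changed-path line with copy information, e.g.
#   R /tags/x/gcc/expr.c (from /branches/y/gcc/expr.c:43255)
COPY_RE = re.compile(r"^\s*([AR])\s+(\S+)\s+\(from\s+(\S+):(\d+)\)\s*$")

# The three records quoted above, unwrapped to one line each.
SAMPLE_LOG = """\
A /tags/gcc_3_0_release (from /trunk:39596)
R /tags/gcc_3_0_release/gcc/expr.c (from /branches/gcc-3_0-branch/gcc/expr.c:43255)
R /tags/gcc_3_0_release/gcc/reload.c (from /branches/gcc-3_0-branch/gcc/reload.c:42007)
"""


def parse_copies(log_text):
    """Return (action, path, source_path, source_rev) for each copy line."""
    records = []
    for line in log_text.splitlines():
        match = COPY_RE.match(line)
        if match:
            action, path, src, rev = match.groups()
            records.append((action, path, src, int(rev)))
    return records


def candidate_parents(records, tag_path):
    """Split copies into the tag's root copy and per-file replacement sources.

    Returns (root, newest) where root is (source_path, rev) of the copy that
    created tag_path, and newest maps each replacement source root to the
    highest revision any file was taken from.
    """
    root = None
    newest = defaultdict(int)
    for _action, path, src, rev in records:
        if path == tag_path:
            root = (src, rev)
        elif path.startswith(tag_path + "/"):
            rel = path[len(tag_path):]  # e.g. "/gcc/expr.c"
            if src.endswith(rel):
                src_root = src[: -len(rel)]
                newest[src_root] = max(newest[src_root], rev)
    return root, dict(newest)


if __name__ == "__main__":
    root, newest = candidate_parents(
        parse_copies(SAMPLE_LOG), "/tags/gcc_3_0_release"
    )
    print("root copy:", root)
    print("replacement sources:", newest)
```

The sketch only lists the candidates (trunk at the root copy versus the branch at its newest replaced-file revision); which of them a conversion should pick as the git parent is exactly the policy question being argued here.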
I've added proper reparenting support to svn-git scripts, and gcc-reparent will be updated in a day or so. I've also added a few minor improvements and fixed things that Joseph pointed out in my conversion.

Once gcc-reparent conversion is regenerated, I'll do another round of comparisons between it and whatever the latest reposurgeon version is.

--
Maxim Kuvyrkov
https://www.linaro.org

>> --
>> Maxim Kuvyrkov
>> https://www.linaro.org
>>
>>
>>> git blame expr.c:
>>>
>>> ba0a9cb85431 (Richard Kenner 1992-03-03 23:34:57 +0000 396)
>>> return temp;
>>> ba0a9cb85431 (Richard Kenner 1992-03-03 23:34:57 +0000 397)
>>> }
>>> 5fbf0b0d5828 (no-author 2001-06-17 19:44:25 +0000 398)
>>> /* Copy the address into a pseudo, so that the returned value
>>> 5fbf0b0d5828 (no-author 2001-06-17 19:44:25 +0000 399)
>>> remains correct across calls to emit_queue. */
>>> 5fbf0b0d5828 (no-author 2001-06-17 19:44:25 +0000 400)
>>> XEXP (new, 0) = copy_to_reg (XEXP (new, 0));
>>> 59f26b7caad9 (Richard Kenner 1994-01-11 00:23:47 +0000 401)
>>> return new;
>>>
>>> git log 5fbf0b0d5828
>>> commit 5fbf0b0d5828687914c1c18a83ff12c8627d5a70 (HEAD, tag: gcc_3_0_release)
>>> Author: no-author <no-aut...@gcc.gnu.org>
>>> Date: Sun Jun 17 19:44:25 2001 +0000
>>>
>>> This commit was manufactured by cvs2svn to create tag
>>> 'gcc_3_0_release'.
>>>
>>> while svn blame expr.c correctly shows:
>>>
>>> 386 kenner return temp;
>>> 386 kenner }
>>> 42209 bernds /* Copy the address into a pseudo, so that the
>>> returned value
>>> 42209 bernds remains correct across calls to emit_queue.
>>> */
>>> 42209 bernds XEXP (new, 0) = copy_to_reg (XEXP (new, 0));
>>> 6375 kenner return new;
>>>
>>> svn log -r42209 ^/
>>> ------------------------------------------------------------------------
>>> r42209 | bernds | 2001-05-17 18:07:08 +0100 (Thu, 17 May 2001) | 2 lines
>>>
>>> Fix queueing-related bugs
>>>
>>> In other words, svn can correctly track the files that were modified on the
>>> release branch, while the git conversion loses that information, rolling
>>> up all the diffs on the release branch into a single unattributed commit.
>>>
>>> As I said, gcc-reparent is better in this regard, but there are still
>>> artefacts from conversion, such as incorrect merge records, that show up.
>>>
>>> R.
>>>
>>>>
>>>>>
>>>>> gcc-reparent is better, but many (most?) of the release tags are shown
>>>>> as merge commits with a fake parent back to the gcc-3 branch point,
>>>>> which is certainly not what happened when the tagging was done at that
>>>>> time.
>>>>
>>>> I agree with you here.
>>>>
>>>>>
>>>>> Both of these factually misrepresent the history at the time of the
>>>>> release tag being made.
>>>>
>>>> Yes and no. Gcc-pretty repository mirrors SVN history. And regarding the
>>>> need for reparenting -- we lived with current history for gcc-3 release
>>>> tags for a long time. I would argue their continued brokenness is not a
>>>> show-stopper.
>>>>
>>>> Looking at this from a different perspective, when I posted the initial
>>>> svn-git scripts back in Summer, the community roughly agreed on a plan to
>>>> 1. Convert entire SVN history to git.
>>>> 2. Use the stock git history rewrite tools (git filter-branch) to fixup
>>>> what we want, e.g., reparent tags and branches or set better
>>>> author/committer entries.
>>>>
>>>> Gcc-pretty does (1) in entirety.
>>>>
>>>> For reparenting, I tried a 15min fix to my scripts to enable reparenting,
>>>> which worked, but with artifacts like the merge commit from old and new
>>>> parents.
>>>> I will drop this and instead use tried-and-true "git
>>>> filter-branch" to reparent those tags and branches, thus producing
>>>> gcc-reparent from gcc-pretty.
>>>>
>>>>>
>>>>> As for converting my script to work with your tools, I'm afraid I don't
>>>>> have time to work on that right now. I'm still bogged down validating
>>>>> the incorrect bug ids that the script has identified for some commits.
>>>>> I'm making good progress (we're down to 160 unreviewed commits now), but
>>>>> it is still going to take what time I have over the next week to
>>>>> complete that task.
>>>>>
>>>>> Furthermore, there is no documentation on how your conversion scripts
>>>>> work, so it is not possible for me to test any work I might do in order
>>>>> to validate such changes. Not being able to run the script locally to
>>>>> test change would be a non-starter.
>>>>>
>>>>> You are welcome, of course, to clone the script I have and attempt to
>>>>> modify it yourself, it's reasonably well documented. The sources can be
>>>>> found in esr's gcc-conversion repository here:
>>>>> https://gitlab.com/esr/gcc-conversion.git
>>>>
>>>> --
>>>> Maxim Kuvyrkov
>>>> https://www.linaro.org
>>>>
>>>>>
>>>>>
>>>>>> Meanwhile, I'm going to add additional root commits to my gcc-reparent
>>>>>> conversion to bring in "missing" branches (the ones, which don't share
>>>>>> history with trunk@1) and restart daily updates of gcc-reparent
>>>>>> conversion.
>>>>>>
>>>>>> Finally, with the comparison data I have, I consider statements about
>>>>>> git-svn's poor quality to be very misleading. Git-svn may have had
>>>>>> serious bugs years ago when Eric R. evaluated it and started his work on
>>>>>> reposurgeon. But a lot of development has happened and many problems
>>>>>> have been fixed since then. At the moment it is reposurgeon that is
>>>>>> producing conversions with obscure mistakes in repository metadata.
>>>>>>
>>>>>>
>>>>>> === Missed merges ===
>>>>>>
>>>>>> Reposurgeon misses merges from trunk on 130+ branches. I've
>>>>>> spot-checked ARM/hard_vfp_branch and redhat/gcc-9-branch and, indeed,
>>>>>> rather mundane merges were omitted. Below is analysis for
>>>>>> ARM/hard_vfp_branch.
>>>>>>
>>>>>> $ git log --stat refs/remotes/gcc-reposurgeon-6a/ARM/hard_vfp_branch~4
>>>>>> ----
>>>>>> commit ef92c24b042965dfef982349cd5994a2e0ff5fde
>>>>>> Author: Richard Earnshaw <rearn...@gcc.gnu.org>
>>>>>> Date: Mon Jul 20 08:15:51 2009 +0000
>>>>>>
>>>>>> Merge trunk through to r149768
>>>>>>
>>>>>> Legacy-ID: 149804
>>>>>>
>>>>>> COPYING.RUNTIME | 73 +
>>>>>> ChangeLog | 270 +-
>>>>>> MAINTAINERS | 19 +-
>>>>>> <MANY OTHER FILES>
>>>>>> ----
>>>>>>
>>>>>> at the same time for svn-git scripts we have:
>>>>>>
>>>>>> $ git log --stat refs/remotes/gcc-reparent/ARM/hard_vfp_branch~4
>>>>>> ----
>>>>>> commit ce7d5c8df673a7a561c29f095869f20567a7c598
>>>>>> Merge: 4970119c20da 3a69b1e566a7
>>>>>> Author: Richard Earnshaw <rearn...@arm.com>
>>>>>> Date: Mon Jul 20 08:15:51 2009 +0000
>>>>>>
>>>>>> Merge trunk through to r149768
>>>>>>
>>>>>> git-svn-id:
>>>>>> https://gcc.gnu.org/svn/gcc/branches/ARM/hard_vfp_branch@149804
>>>>>> 138bc75d-0d04-0410-961f-82ee72b054a4
>>>>>> ----
>>>>>>
>>>>>> ... which agrees with
>>>>>> $ svn propget svn:mergeinfo
>>>>>> file:///home/maxim.kuvyrkov/tmpfs-stuff/svnrepo/branches/ARM/hard_vfp_branch@149804
>>>>>> /trunk:142588-149768
>>>>>>
>>>>>> === Bad author entries ===
>>>>>>
>>>>>> Reposurgeon-6a conversion has authors "12:46:56 1998 Jim Wilson" and
>>>>>> "2005-03-18 Kazu Hirata". It is rather obvious that person's name is
>>>>>> unlikely to start with a digit.
>>>>>>
>>>>>> === Missed authors ===
>>>>>>
>>>>>> Reposurgeon-6a conversion misses many authors, below is a list of people
>>>>>> with names starting with "A".
>>>>>>
>>>>>> Akos Kiss
>>>>>> Anders Bertelrud
>>>>>> Andrew Pochinsky
>>>>>> Anton Hartl
>>>>>> Arthur Norman
>>>>>> Aymeric Vincent
>>>>>>
>>>>>> === Conservative author entries ===
>>>>>>
>>>>>> Reposurgeon-6a conversion uses default "@gcc.gnu.org" emails for many
>>>>>> commits where svn-git conversion manages to extract valid email from
>>>>>> commit data. This happens for hundreds of author entries.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> --
>>>>>> Maxim Kuvyrkov
>>>>>> https://www.linaro.org
>>>>>>
>>>>>>
>>>>>>> On Dec 26, 2019, at 7:11 PM, Maxim Kuvyrkov <maxim.kuvyr...@linaro.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On Dec 26, 2019, at 2:16 PM, Jakub Jelinek <ja...@redhat.com> wrote:
>>>>>>>>
>>>>>>>> On Thu, Dec 26, 2019 at 11:04:29AM +0000, Joseph Myers wrote:
>>>>>>>> Is there some easy way (e.g. file in the conversion scripts) to correct
>>>>>>>> spelling and other mistakes in the commit authors?
>>>>>>>> E.g. there are misspelled surnames, etc. (e.g. looking at my name, I
>>>>>>>> see
>>>>>>>> Jakub Jakub Jelinek (1):
>>>>>>>> Jakub Jeilnek (1):
>>>>>>>> Jelinek (1):
>>>>>>>> entries next to the expected one with most of the commits.
>>>>>>>> For the misspellings, wonder if e.g. we couldn't compute edit
>>>>>>>> distances from
>>>>>>>> other names and if we have one with many commits and then one with
>>>>>>>> very few
>>>>>>>> with small edit distance from those, flag it for human review.
>>>>>>>
>>>>>>> This is close to what svn-git-author.sh script is doing in gcc-pretty
>>>>>>> and gcc-reparent conversions. It ignores 1-3 character differences in
>>>>>>> author/committer names and email addresses. I've audited results for
>>>>>>> all branches and didn't spot any mistakes.
>>>>>>>
>>>>>>> In other news, I'm working on comparison of gcc-pretty, gcc-reparent
>>>>>>> and gcc-reposurgeon-5a repos among themselves. Below are current notes
>>>>>>> for comparison of gcc-pretty/trunk and gcc-reposurgeon-5a/trunk.
>>>>>>>
>>>>>>> == Merges on trunk ==
>>>>>>>
>>>>>>> Reposurgeon creates merge entries on trunk when changes from a branch
>>>>>>> are merged into trunk. This brings entire development history from the
>>>>>>> branch to trunk, which is both good and bad. The good part is that we
>>>>>>> get more visibility into how the code evolved. The bad part is that we
>>>>>>> get many "noisy" commits from merged branch (e.g., "Merge in trunk"
>>>>>>> every few revisions) and that our SVN branches are work-in-progress
>>>>>>> quality, not ready for review/commit quality. It's common for files to
>>>>>>> be re-written in large chunks on branches.
>>>>>>>
>>>>>>> Also, reposurgeon's commit logs don't have information on SVN path from
>>>>>>> which the change came, so there is no easy way to determine that a
>>>>>>> given commit is from a merged branch, not an original trunk commit.
>>>>>>> Git-svn, on the other hand, provides "git-svn-id: <path>@<revision>"
>>>>>>> tags in its commit logs.
>>>>>>>
>>>>>>> My conversion follows current GCC development policy that trunk history
>>>>>>> should be linear. Branch merges to trunk are squashed. Merges between
>>>>>>> non-trunk branches are handled as specified by svn:mergeinfo SVN
>>>>>>> properties.
>>>>>>>
>>>>>>> == Differences in trees ==
>>>>>>>
>>>>>>> Git trees (aka filesystem content) match between pretty/trunk and
>>>>>>> reposurgeon-5a/trunk from current tip and up to svn's r130805.
>>>>>>> Here is SVN log of that revision (restoration of deleted trunk):
>>>>>>> ------------------------------------------------------------------------
>>>>>>> r130805 | dberlin | 2007-12-13 01:53:37 +0000 (Thu, 13 Dec 2007)
>>>>>>> Changed paths:
>>>>>>> A /trunk (from /trunk:130802)
>>>>>>> ------------------------------------------------------------------------
>>>>>>>
>>>>>>> Reposurgeon conversion has:
>>>>>>> -------------
>>>>>>> commit 7e6f2a96e89d96c2418482788f94155d87791f0a
>>>>>>> Author: Daniel Berlin <dber...@gcc.gnu.org>
>>>>>>> Date: Thu Dec 13 01:53:37 2007 +0000
>>>>>>>
>>>>>>> Readd trunk
>>>>>>>
>>>>>>> Legacy-ID: 130805
>>>>>>>
>>>>>>> .gitignore | 17 -----------------
>>>>>>> 1 file changed, 17 deletions(-)
>>>>>>> -------------
>>>>>>> and my conversion has:
>>>>>>> -------------
>>>>>>> commit fb128f3970789ce094c798945b4fa20eceb84cc7
>>>>>>> Author: Daniel Berlin <dber...@dbrelin.org>
>>>>>>> Date: Thu Dec 13 01:53:37 2007 +0000
>>>>>>>
>>>>>>> Readd trunk
>>>>>>>
>>>>>>>
>>>>>>> git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@130805
>>>>>>> 138bc75d-0d04-0410-961f-82ee72b054a4
>>>>>>> -------------
>>>>>>>
>>>>>>> It appears that .gitignore has been added in r1 by reposurgeon and then
>>>>>>> deleted at r130805. In SVN repository .gitignore was added in r195087.
>>>>>>> I speculate that addition of .gitignore at r1 is expected, but its
>>>>>>> deletion at r130805 is highly suspicious.
>>>>>>>
>>>>>>> == Committer entries ==
>>>>>>>
>>>>>>> Reposurgeon uses $u...@gcc.gnu.org for committer email addresses even
>>>>>>> when it correctly detects author name from ChangeLog.
>>>>>>>
>>>>>>> reposurgeon-5a:
>>>>>>> r278995 Martin Liska <mli...@suse.cz> Martin Liska <mar...@gcc.gnu.org>
>>>>>>> r278994 Jozef Lawrynowicz <joze...@mittosystems.com> Jozef Lawrynowicz
>>>>>>> <joz...@gcc.gnu.org>
>>>>>>> r278993 Frederik Harwath <frede...@codesourcery.com> Frederik Harwath
>>>>>>> <frede...@gcc.gnu.org>
>>>>>>> r278992 Georg-Johann Lay <a...@gjlay.de> Georg-Johann Lay
>>>>>>> <g...@gcc.gnu.org>
>>>>>>> r278991 Richard Biener <rguent...@suse.de> Richard Biener
>>>>>>> <rgue...@gcc.gnu.org>
>>>>>>>
>>>>>>> pretty:
>>>>>>> r278995 Martin Liska <mli...@suse.cz> Martin Liska <mli...@suse.cz>
>>>>>>> r278994 Jozef Lawrynowicz <joze...@mittosystems.com> Jozef Lawrynowicz
>>>>>>> <joze...@mittosystems.com>
>>>>>>> r278993 Frederik Harwath <frede...@codesourcery.com> Frederik Harwath
>>>>>>> <frede...@codesourcery.com>
>>>>>>> r278992 Georg-Johann Lay <a...@gjlay.de> Georg-Johann Lay
>>>>>>> <a...@gjlay.de>
>>>>>>> r278991 Richard Biener <rguent...@suse.de> Richard Biener
>>>>>>> <rguent...@suse.de>
>>>>>>>
>>>>>>> == Bad summary line ==
>>>>>>>
>>>>>>> While looking around r138087, the commit below caught my eye. Are the
>>>>>>> contents of the summary line as expected?
>>>>>>>
>>>>>>> commit cc2726884d56995c514d8171cc4a03657851657e
>>>>>>> Author: Chris Fairles <chris.fair...@gmail.com>
>>>>>>> Date: Wed Jul 23 14:49:00 2008 +0000
>>>>>>>
>>>>>>> acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>>>>>>>
>>>>>>> 2008-07-23 Chris Fairles <chris.fair...@gmail.com>
>>>>>>>
>>>>>>> * acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define
>>>>>>> GLIBCXX_LIBS.
>>>>>>> Holds the lib that defines clock_gettime (-lrt or -lposix4).
>>>>>>> * src/Makefile.am: Use it.
>>>>>>> * configure: Regenerate.
>>>>>>> * configure.in: Likewise.
>>>>>>> * Makefile.in: Likewise.
>>>>>>> * src/Makefile.in: Likewise.
>>>>>>> * libsup++/Makefile.in: Likewise.
>>>>>>> * po/Makefile.in: Likewise.
>>>>>>> * doc/Makefile.in: Likewise.
>>>>>>>
>>>>>>> Legacy-ID: 138087
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Maxim Kuvyrkov
>>>>>>> https://www.linaro.org
>>
>
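
For reference, the edit-distance check suggested in the quoted Dec 26 mail (flag a name with very few commits when it is within a small edit distance of a name with many commits) could look roughly like the Python sketch below. This is an illustration only: it is not svn-git-author.sh, the function names and thresholds are assumptions picked for this example (the cutoff of 3 mirrors the "1-3 character differences" mentioned above), and the commit count for the correctly spelled name is a placeholder, not a real figure.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance: insert, delete, substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def flag_for_review(commit_counts, max_distance=3, rare_max=2, common_min=50):
    """Return (rare_name, common_name, distance) triples worth a human look.

    A name is "rare" with at most rare_max commits and "common" with at
    least common_min commits; all three thresholds are arbitrary assumptions.
    """
    common = [n for n, c in commit_counts.items() if c >= common_min]
    flagged = []
    for name, count in commit_counts.items():
        if count > rare_max:
            continue
        for target in common:
            distance = edit_distance(name, target)
            if 0 < distance <= max_distance:
                flagged.append((name, target, distance))
    return flagged


# Spellings taken from the quoted mail; 1000 is a placeholder count.
COUNTS = {
    "Jakub Jelinek": 1000,
    "Jakub Jakub Jelinek": 1,
    "Jakub Jeilnek": 1,
    "Jelinek": 1,
}

if __name__ == "__main__":
    for rare, common, distance in flag_for_review(COUNTS):
        print(f"{rare!r} looks like {common!r} (distance {distance})")
```

With these thresholds only the transposed spelling is flagged. The doubled first name and the surname-only entry are both six edits away from the full name, so catching them would need a larger cutoff or a separate rule (for example, comparing surnames only), with a human still reviewing whatever gets flagged.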