Re: Proposal for the transition timetable for the move to GIT
> On Dec 30, 2019, at 3:18 AM, Joseph Myers wrote: > > On Sun, 29 Dec 2019, Richard Earnshaw (lists) wrote: > >> gcc-reparent is better, but many (most?) of the release tags are shown >> as merge commits with a fake parent back to the gcc-3 branch point, >> which is certainly not what happened when the tagging was done at that >> time. > > And looking at the history of gcc-reparent as part of preparing to compare > authors to identify commits needing manual attention to author > identification, I see other oddities. > > Do "git log egcs_1_1_2_prerelease_2" in gcc-reparent, for example. The > history ends up containing two different versions of SVN r5 and of many > other commits. One of them looks normal: > > commit c01d37f1690de9ea83b341780fad458f506b80c7 > Author: Charles Hannum > Date: Mon Nov 27 21:22:14 1989 + > >entered into RCS > > >git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@5 > 138bc75d-0d04-0410-961f-82ee72b054a4 > > The other looks strange: > > commit 09c5a0fa5ed76e58cc67f3d72bf397277fdd > Author: Charles Hannum > Date: Mon Nov 27 21:22:14 1989 + > >entered into RCS > > >git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@5 > 138bc75d-0d04-0410-961f-82ee72b054a4 >Updated tag 'egcs_1_1_2_prerelease_2@279090' (was bc80be265a0) >Updated tag 'egcs_1_1_2_prerelease_2@279154' (was f7cee65b219) >Updated tag 'egcs_1_1_2_prerelease_2@279213' (was 74dcba9b414) >Updated tag 'egcs_1_1_2_prerelease_2@279270' (was 7e63c9b344d) >Updated tag 'egcs_1_1_2_prerelease_2@279336' (was 47894371e3c) >Updated tag 'egcs_1_1_2_prerelease_2@279392' (was 3c3f6932316) >Updated tag 'egcs_1_1_2_prerelease_2@279402' (was 29d9998f523b) > > (and in fact it seems there are *four* commits corresponding to SVN r5 and > reachable from refs in the gcc-reparent repository). 
So we don't just > have stray merge commits, they actually end up leading back to strange > alternative versions of history (which I think is clearly worse than > conservatively not having a merge commit in some case where a commit might > or might not be unambiguously a merge - if a merge was missed on an active > branch, the branch maintainer can easily correct that afterwards with "git > merge -s ours" to avoid problems with future merges). > > My expectation is that there are only multiple git commits corresponding > to an SVN commit when the SVN commit touched more than one SVN branch or > tag and so has to be split to represent it in git (there are about 1500 > such SVN commits, most of them automatic datestamp updates in the CVS era > that cvs2svn turned into mixed-branch commits). Thanks for catching this. This is fallout from incremental rebuilds (rather than fresh builds) of the gcc-reparent repository. Incremental builds take about 1h and full rebuilds take about 30h. I'll switch to doing full rebuilds. -- Maxim Kuvyrkov https://www.linaro.org
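A minimal sketch of how the symptom Joseph describes could be checked mechanically, assuming plain `git log --all` output in git's default format (a `commit <sha>` header per commit and a `git-svn-id: <url>@<rev> <uuid>` trailer in each message). The function name `duplicate_svn_revisions` is hypothetical. It groups commits by the full (SVN URL, revision) pair, so an SVN commit that was legitimately split across several branches or tags is not flagged, while several git commits all claiming trunk@5 are.

```python
import re
from collections import defaultdict

# "commit <sha>" header lines of default `git log` output; decorations such
# as "(HEAD, tag: ...)" may follow the hash.
COMMIT_RE = re.compile(r"^commit ([0-9a-f]{7,40})\b", re.MULTILINE)
# git-svn trailer: "git-svn-id: <branch URL>@<revision> <repository UUID>"
SVN_ID_RE = re.compile(r"git-svn-id:\s+(\S+)@(\d+)")


def duplicate_svn_revisions(log_text):
    """Return {(svn_url, revision): [git commits]} for every SVN path and
    revision claimed by more than one distinct git commit."""
    claimed = defaultdict(list)
    headers = list(COMMIT_RE.finditer(log_text))
    for i, header in enumerate(headers):
        # A commit's text runs up to the next "commit <sha>" header.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(log_text)
        trailer = SVN_ID_RE.search(log_text, header.end(), end)
        if trailer:
            key = (trailer.group(1), int(trailer.group(2)))
            if header.group(1) not in claimed[key]:
                claimed[key].append(header.group(1))
    return {key: shas for key, shas in claimed.items() if len(shas) > 1}
```

Fed the text of `git log --all`, the result should come back empty if, as Joseph expects, extra commits arise only from splitting mixed-branch SVN commits; an entry for trunk revision 5 with several hashes would reproduce the problem reported above.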
Re: Proposal for the transition timetable for the move to GIT
> On Dec 30, 2019, at 1:24 AM, Richard Earnshaw (lists) > wrote: > > On 29/12/2019 18:30, Maxim Kuvyrkov wrote: >> Below are several more issues I found in reposurgeon-6a conversion comparing >> it against gcc-reparent conversion. >> >> I am sure, these and whatever other problems I may find in the reposurgeon >> conversion can be fixed in time. However, I don't see why we should bother. >> My conversion has been available since summer 2019, I made it ready in time >> for GCC Cauldron 2019, and it didn't change in any significant way since >> then. >> >> With the "Missed merges" problem (see below) I don't see how reposurgeon >> conversion can be considered "ready". Also, I expected a diligent developer >> to compare new conversion (aka reposurgeon's) against existing conversion >> (aka gcc-pretty / gcc-reparent) before declaring the new conversion "better" >> or even "ready". The data I'm seeing in differences between my and >> reposurgeon conversions shows that gcc-reparent conversion is /better/. >> >> I suggest that GCC community adopts either gcc-pretty or gcc-reparent >> conversion. I welcome Richard E. to modify his summary scripts to work with >> svn-git scripts, which should be straightforward, and I'm ready to help. >> > > I don't think either of these conversions are any more ready to use than > the reposurgeon one, possibly less so. In fact, there are still some > major issues to resolve first before they can be considered. > > gcc-pretty has completely wrong parent information for the gcc-3 era > release tags, showing the tags as being made directly from trunk with > massive deltas representing the roll-up of all the commits that were > made on the gcc-3 release branch. I will clarify the above statement, and please correct me where you think I'm wrong. Gcc-pretty conversion has the exact right parent information for the gcc-3 era release tags as recorded in SVN version history. Gcc-pretty conversion aims to produce an exact copy of SVN history in git.
IMO, it manages to do so just fine. It is a different thing that SVN history has a screwed up record of gcc-3 era tags. > > gcc-reparent is better, but many (most?) of the release tags are shown > as merge commits with a fake parent back to the gcc-3 branch point, > which is certainly not what happened when the tagging was done at that > time. I agree with you here. > > Both of these factually misrepresent the history at the time of the > release tag being made. Yes and no. Gcc-pretty repository mirrors SVN history. And regarding the need for reparenting -- we lived with current history for gcc-3 release tags for a long time. I would argue their continued brokenness is not a show-stopper. Looking at this from a different perspective, when I posted the initial svn-git scripts back in summer, the community roughly agreed on a plan to 1. Convert entire SVN history to git. 2. Use the stock git history rewrite tools (git filter-branch) to fix up what we want, e.g., reparent tags and branches or set better author/committer entries. Gcc-pretty does (1) in entirety. For reparenting, I tried a 15min fix to my scripts to enable reparenting, which worked, but with artifacts like the merge commit from old and new parents. I will drop this and instead use tried-and-true "git filter-branch" to reparent those tags and branches, thus producing gcc-reparent from gcc-pretty. > > As for converting my script to work with your tools, I'm afraid I don't > have time to work on that right now. I'm still bogged down validating > the incorrect bug ids that the script has identified for some commits. > I'm making good progress (we're down to 160 unreviewed commits now), but > it is still going to take what time I have over the next week to > complete that task. > > Furthermore, there is no documentation on how your conversion scripts > work, so it is not possible for me to test any work I might do in order > to validate such changes.
Not being able to run the script locally to > test changes would be a non-starter. > > You are welcome, of course, to clone the script I have and attempt to > modify it yourself, it's reasonably well documented. The sources can be > found in esr's gcc-conversion repository here: > https://gitlab.com/esr/gcc-conversion.git -- Maxim Kuvyrkov https://www.linaro.org > > >> Meanwhile, I'm going to add additional root commits to my gcc-reparent >> conversion to bring in "missing" branches (the ones, which don't share >> history with trunk@1) and restart daily updates of gcc-reparent conversion. >> >> Finally, with the comparison data I have, I consider statements about >> git-svn's poor quality to be very misleading. Git-svn may have had serious >> bugs years ago when Eric R. evaluated it and started his work on >> reposurgeon. But a lot of development has happened and many problems have >> been fixed since then. At the moment it is reposurgeon that is producing >> conversion
Re: Proposal for the transition timetable for the move to GIT
On 30/12/2019 13:00, Maxim Kuvyrkov wrote: >> On Dec 30, 2019, at 1:24 AM, Richard Earnshaw (lists) >> wrote: >> >> On 29/12/2019 18:30, Maxim Kuvyrkov wrote: >>> Below are several more issues I found in reposurgeon-6a conversion >>> comparing it against gcc-reparent conversion. >>> >>> I am sure, these and whatever other problems I may find in the reposurgeon >>> conversion can be fixed in time. However, I don't see why we should bother. >>> My conversion has been available since summer 2019, I made it ready in time >>> for GCC Cauldron 2019, and it didn't change in any significant way since >>> then. >>> >>> With the "Missed merges" problem (see below) I don't see how reposurgeon >>> conversion can be considered "ready". Also, I expected a diligent >>> developer to compare new conversion (aka reposurgeon's) against existing >>> conversion (aka gcc-pretty / gcc-reparent) before declaring the new >>> conversion "better" or even "ready". The data I'm seeing in differences >>> between my and reposurgeon conversions shows that gcc-reparent conversion >>> is /better/. >>> >>> I suggest that GCC community adopts either gcc-pretty or gcc-reparent >>> conversion. I welcome Richard E. to modify his summary scripts to work >>> with svn-git scripts, which should be straightforward, and I'm ready to >>> help. >>> >> >> I don't think either of these conversions are any more ready to use than >> the reposurgeon one, possibly less so. In fact, there are still some >> major issues to resolve first before they can be considered. >> >> gcc-pretty has completely wrong parent information for the gcc-3 era >> release tags, showing the tags as being made directly from trunk with >> massive deltas representing the roll-up of all the commits that were >> made on the gcc-3 release branch. > > I will clarify the above statement, and please correct me where you think I'm > wrong.
Gcc-pretty conversion has the exact right parent information for the > gcc-3 era > release tags as recorded in SVN version history. Gcc-pretty conversion aims > to produce an exact copy of SVN history in git. IMO, it manages to do so > just fine. > > It is a different thing that SVN history has a screwed up record of gcc-3 era > tags.

It's not screwed up in svn. Svn shows the correct history information for the gcc-3 era release tags, but the git-svn conversion in gcc-pretty does not.

For example, looking at gcc_3_0_release in expr.c with git blame and svn blame shows

git blame expr.c:

  ba0a9cb85431 (Richard Kenner 1992-03-03 23:34:57 + 396) return temp;
  ba0a9cb85431 (Richard Kenner 1992-03-03 23:34:57 + 397) }
  5fbf0b0d5828 (no-author 2001-06-17 19:44:25 + 398) /* Copy the address into a pseudo, so that the returned value
  5fbf0b0d5828 (no-author 2001-06-17 19:44:25 + 399) remains correct across calls to emit_queue. */
  5fbf0b0d5828 (no-author 2001-06-17 19:44:25 + 400) XEXP (new, 0) = copy_to_reg (XEXP (new, 0));
  59f26b7caad9 (Richard Kenner 1994-01-11 00:23:47 + 401) return new;

git log 5fbf0b0d5828

  commit 5fbf0b0d5828687914c1c18a83ff12c8627d5a70 (HEAD, tag: gcc_3_0_release)
  Author: no-author
  Date: Sun Jun 17 19:44:25 2001 +

      This commit was manufactured by cvs2svn to create tag 'gcc_3_0_release'.

while svn blame expr.c correctly shows:

  386 kenner return temp;
  386 kenner }
  42209 bernds /* Copy the address into a pseudo, so that the returned value
  42209 bernds remains correct across calls to emit_queue. */
  42209 bernds XEXP (new, 0) = copy_to_reg (XEXP (new, 0));
  6375 kenner return new;

svn log -r42209 ^/

  r42209 | bernds | 2001-05-17 18:07:08 +0100 (Thu, 17 May 2001) | 2 lines

  Fix queueing-related bugs

In other words, svn can correctly track the files that were modified on the release branch, while the git conversion loses that information, rolling up all the diffs on the release branch into a single unattributed commit.
As I said, gcc-reparent is better in this regard, but there are still artefacts from conversion, such as incorrect merge records, that show up. R. > >> >> gcc-reparent is better, but many (most?) of the release tags are shown >> as merge commits with a fake parent back to the gcc-3 branch point, >> which is certainly not what happened when the tagging was done at that >> time. > > I agree with you here. > >> >> Both of these factually misrepresent the history at the time of the >> release tag being made. > > Yes and no. Gcc-pretty repository mirrors SVN history. And regarding the > need for reparenting -- we lived with current history for gcc-3 release tags > for a long time. I would argue their continued brokenness is not a > show-stopper. > > Looking at this from a d
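One way to put a number on the roll-up Richard describes is to tally `git blame` lines per author. This is a rough sketch, assuming git's default blame format with a four-digit timezone offset (the offsets in the output quoted above appear truncated to a bare `+`, so the archived text would not match as-is); the helper names are hypothetical. In a checkout of a gcc-3-era release tag, a large share of lines blamed on `no-author` would point to branch work absorbed into the cvs2svn-manufactured tag commit.

```python
import re
from collections import Counter

# Default `git blame` line, e.g.
#   ba0a9cb85431 (Richard Kenner 1992-03-03 23:34:57 +0000 396) return temp;
# A leading "^" marks a boundary commit and is skipped over.
BLAME_RE = re.compile(
    r"^\^?[0-9a-f]+ \((.+?)\s+\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4}\s+\d+\)"
)


def blame_author_counts(blame_lines):
    """Count how many lines of a file `git blame` attributes to each author."""
    counts = Counter()
    for line in blame_lines:
        match = BLAME_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def share_of(counts, author):
    """Fraction of blamed lines attributed to `author` (0.0 if none)."""
    total = sum(counts.values())
    return counts[author] / total if total else 0.0
```

Running it over `git blame gcc_3_0_release -- gcc/expr.c` in each candidate conversion, and comparing with per-author totals from `svn blame`, would show how much attribution each conversion keeps.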
Re: Proposal for the transition timetable for the move to GIT
On 29/12/2019 23:13, Segher Boessenkool wrote: > On Sun, Dec 29, 2019 at 11:00:08PM +, Joseph Myers wrote: >> fixups in bugdb.py - and that way benefit both from reposurgeon making >> choices that are as conservatively safe as possible, which seems a >> desirable property for problem cases that haven't been manually reviewed, > > Problem cases that haven't been manually reviewed should *be* manually > reviewed, or the heuristics improved so there are fewer problem cases. > Thank you for offering to help with the checking. ;-) R. > As I've said many many times now, we only have *one* repository to > convert here. Taking shortcuts is *good*, making problems for ourselves > by pretending we do things more generically is *bad*. > > > Segher >
Re: Git conversion: fixing email addresses from ChangeLog files
On 29/12/2019 22:56, Eric S. Raymond wrote: > Richard Earnshaw (lists) : >> Weak in the sense that it isn't proof given that the user name is >> partially redacted. There's nothing in the gcc archives that gives a >> full name either, unfortunately. >> >> Yes, it's the most likely match, but there's still an element of doubt. >> >> R. > > https://groups.google.com/forum/#!msg/comp.databases.sybase/Uz8ICef9Qr8/uPwanH6is60 > > If you open his message to Michel Peppler, you'll see a sig block that > says: > > bjo...@planetarion.com Bjørn Wennberg, Fifth Season AS > > It's him, yep. Be sure to get the ø right when you fill in the name. :-) > Excellent. Then as you say, we have a match. R.
Re: Proposal for the transition timetable for the move to GIT
> On Dec 30, 2019, at 6:31 PM, Richard Earnshaw (lists) > wrote: > > On 30/12/2019 13:00, Maxim Kuvyrkov wrote: >>> On Dec 30, 2019, at 1:24 AM, Richard Earnshaw (lists) >>> wrote: >>> >>> On 29/12/2019 18:30, Maxim Kuvyrkov wrote: Below are several more issues I found in reposurgeon-6a conversion comparing it against gcc-reparent conversion. I am sure, these and whatever other problems I may find in the reposurgeon conversion can be fixed in time. However, I don't see why we should bother. My conversion has been available since summer 2019, I made it ready in time for GCC Cauldron 2019, and it didn't change in any significant way since then. With the "Missed merges" problem (see below) I don't see how reposurgeon conversion can be considered "ready". Also, I expected a diligent developer to compare new conversion (aka reposurgeon's) against existing conversion (aka gcc-pretty / gcc-reparent) before declaring the new conversion "better" or even "ready". The data I'm seeing in differences between my and reposurgeon conversions shows that gcc-reparent conversion is /better/. I suggest that GCC community adopts either gcc-pretty or gcc-reparent conversion. I welcome Richard E. to modify his summary scripts to work with svn-git scripts, which should be straightforward, and I'm ready to help. >>> >>> I don't think either of these conversions are any more ready to use than >>> the reposurgeon one, possibly less so. In fact, there are still some >>> major issues to resolve first before they can be considered. >>> >>> gcc-pretty has completely wrong parent information for the gcc-3 era >>> release tags, showing the tags as being made directly from trunk with >>> massive deltas representing the roll-up of all the commits that were >>> made on the gcc-3 release branch. >> >> I will clarify the above statement, and please correct me where you think >> I'm wrong.
Gcc-pretty conversion has the exact right parent information for >> the gcc-3 era >> release tags as recorded in SVN version history. Gcc-pretty conversion aims >> to produce an exact copy of SVN history in git. IMO, it manages to do so >> just fine. >> >> It is a different thing that SVN history has a screwed up record of gcc-3 >> era tags. > > It's not screwed up in svn. Svn shows the correct history information for > the gcc-3 era release tags, but the git-svn conversion in gcc-pretty does not. > > For example, looking at gcc_3_0_release in expr.c with git blame and svn > blame shows

In SVN history tags/gcc_3_0_release has been copied off /trunk:39596 and in the same commit bunch of files were replaced from /branches/gcc-3_0-branch/ (and from different revisions of this branch!).

$ svn log -qv --stop-on-copy file://$(pwd)/tags/gcc_3_0_release | grep "/tags/gcc_3_0_release \|/tags/gcc_3_0_release/gcc/expr.c \|/tags/gcc_3_0_release/gcc/reload.c "
   A /tags/gcc_3_0_release (from /trunk:39596)
   R /tags/gcc_3_0_release/gcc/expr.c (from /branches/gcc-3_0-branch/gcc/expr.c:43255)
   R /tags/gcc_3_0_release/gcc/reload.c (from /branches/gcc-3_0-branch/gcc/reload.c:42007)

IMO, from such history (absent external knowledge about better reparenting options) the best choice for parent branch is /trunk@39596, not /branches/gcc-3_0-branch at a random revision from the replaced files.

Still, I see your point, and I will fix reparenting support. Whether GCC community opts to reparent or not reparent is a different topic.

-- Maxim Kuvyrkov https://www.linaro.org

> git blame expr.c: > > ba0a9cb85431 (Richard Kenner 1992-03-03 23:34:57 + 396) > return temp; > ba0a9cb85431 (Richard Kenner 1992-03-03 23:34:57 + 397) } > 5fbf0b0d5828 (no-author 2001-06-17 19:44:25 + 398) /* > Copy the address into a pseudo, so that the returned value > 5fbf0b0d5828 (no-author 2001-06-17 19:44:25 + 399) > remains correct across calls to emit_queue.
*/ > 5fbf0b0d5828 (no-author 2001-06-17 19:44:25 + 400) > XEXP (new, 0) = copy_to_reg (XEXP (new, 0)); > 59f26b7caad9 (Richard Kenner 1994-01-11 00:23:47 + 401) > return new; > > git log 5fbf0b0d5828 > commit 5fbf0b0d5828687914c1c18a83ff12c8627d5a70 (HEAD, tag: gcc_3_0_release) > Author: no-author > Date: Sun Jun 17 19:44:25 2001 + > >This commit was manufactured by cvs2svn to create tag >'gcc_3_0_release'. > > while svn blame expr.c correctly shows: > > 386 kenner return temp; > 386 kenner } > 42209 bernds /* Copy the address into a pseudo, so that the > returned value > 42209 bernds remains correct across calls to emit_queue. */ > 42209 bernds XEXP (new, 0) = copy_to_reg (XEXP (new, 0)); > 6375 kenner return new; > > svn log -r42209 ^/ >
Re: Proposal for the transition timetable for the move to GIT
On 30/12/2019 15:49, Maxim Kuvyrkov wrote: >> On Dec 30, 2019, at 6:31 PM, Richard Earnshaw (lists) >> wrote: >> >> On 30/12/2019 13:00, Maxim Kuvyrkov wrote: On Dec 30, 2019, at 1:24 AM, Richard Earnshaw (lists) wrote: On 29/12/2019 18:30, Maxim Kuvyrkov wrote: > Below are several more issues I found in reposurgeon-6a conversion > comparing it against gcc-reparent conversion. > > I am sure, these and whatever other problems I may find in the > reposurgeon conversion can be fixed in time. However, I don't see why we > should bother. My conversion has been available since summer 2019, I > made it ready in time for GCC Cauldron 2019, and it didn't change in any > significant way since then. > > With the "Missed merges" problem (see below) I don't see how reposurgeon > conversion can be considered "ready". Also, I expected a diligent > developer to compare new conversion (aka reposurgeon's) against existing > conversion (aka gcc-pretty / gcc-reparent) before declaring the new > conversion "better" or even "ready". The data I'm seeing in differences > between my and reposurgeon conversions shows that gcc-reparent conversion > is /better/. > > I suggest that GCC community adopts either gcc-pretty or gcc-reparent > conversion. I welcome Richard E. to modify his summary scripts to work > with svn-git scripts, which should be straightforward, and I'm ready to > help. > I don't think either of these conversions are any more ready to use than the reposurgeon one, possibly less so. In fact, there are still some major issues to resolve first before they can be considered. gcc-pretty has completely wrong parent information for the gcc-3 era release tags, showing the tags as being made directly from trunk with massive deltas representing the roll-up of all the commits that were made on the gcc-3 release branch. >>> >>> I will clarify the above statement, and please correct me where you think >>> I'm wrong.
Gcc-pretty conversion has the exact right parent information >>> for the gcc-3 era >>> release tags as recorded in SVN version history. Gcc-pretty conversion >>> aims to produce an exact copy of SVN history in git. IMO, it manages to do >>> so just fine. >>> >>> It is a different thing that SVN history has a screwed up record of gcc-3 >>> era tags. >> >> It's not screwed up in svn. Svn shows the correct history information for >> the gcc-3 era release tags, but the git-svn conversion in gcc-pretty does >> not. >> >> For example, looking at gcc_3_0_release in expr.c with git blame and svn >> blame shows > > In SVN history tags/gcc_3_0_release has been copied off /trunk:39596 and in > the same commit bunch of files were replaced from /branches/gcc-3_0-branch/ > (and from different revisions of this branch!). > > $ svn log -qv --stop-on-copy file://$(pwd)/tags/gcc_3_0_release | grep > "/tags/gcc_3_0_release \|/tags/gcc_3_0_release/gcc/expr.c > \|/tags/gcc_3_0_release/gcc/reload.c " >A /tags/gcc_3_0_release (from /trunk:39596) >R /tags/gcc_3_0_release/gcc/expr.c (from > /branches/gcc-3_0-branch/gcc/expr.c:43255) >R /tags/gcc_3_0_release/gcc/reload.c (from > /branches/gcc-3_0-branch/gcc/reload.c:42007) > Right, (and wrong). You have to understand how the release branches and tags are represented in CVS to understand why the SVN conversion is done this way. When a branch was created in CVS a tag was added to each commit which would then be used in any future revisions along that branch. But until a commit is made on that branch, the release branch is just a placeholder. When a CVS release tag is created, the tag labels the relevant commit that is to be used. If that commit is unchanged from the trunk revision (no commit on the branch), then that is what gets labelled, and it *appears* to still come from trunk - but that does not matter, since it is the same as the version on trunk. 
The svn copy operations are formed from this set of information by copying the SVN revision of trunk that applied at the point the branch was made, and then overriding the copy information for each file that was then modified on the branch with information about that copy. This is sufficient for svn to fully understand the history information for each and every file in the tag. Unfortunately, git-svn misinterprets this when building its graph of what happened, and while it copies the right *content* into the release branch, it does not copy the right *history*. The SVN R operation copies the history from the named revision, not just the content. That's the significant difference between the two. R > IMO, from such history (absent external knowledge about better reparenting > options) the best choice for parent branch is /trunk@39596, not > /branches/gcc-3_0-branch at a random revision from the replaced files. > > Still, I see your p
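Richard's description of how the tag is represented can be made concrete with a small model of per-path history lookup. This is a sketch under stated assumptions, not how svn or git-svn is implemented: it reads changed-path lines in the `svn log -v` format quoted in this thread, handles only `A` and `R` entries that carry copy-from information, and the helper names `parse_copies` and `history_source` are hypothetical. Each path inherits its history from the most specific copy covering it, so `gcc/expr.c` follows gcc-3_0-branch while an untouched file follows trunk@39596; a git commit with one parent cannot record both at once, which is why the choice of parent is contested above.

```python
import re

# Changed-path lines from `svn log -v`, e.g.
#   A /tags/gcc_3_0_release (from /trunk:39596)
#   R /tags/gcc_3_0_release/gcc/expr.c (from /branches/gcc-3_0-branch/gcc/expr.c:43255)
CHANGE_RE = re.compile(r"^\s*([AR])\s+(\S+)\s+\(from\s+(\S+):(\d+)\)\s*$")


def parse_copies(log_lines):
    """Return {target_path: (source_path, source_rev)} for copy operations."""
    copies = {}
    for line in log_lines:
        match = CHANGE_RE.match(line)
        if match:
            _action, target, source, rev = match.groups()
            copies[target] = (source, int(rev))
    return copies


def history_source(copies, path):
    """Return (source_path, rev) that `path` inherits its history from: the
    most specific copy whose target is `path` or one of its parent
    directories. Returns None if no copy covers the path."""
    candidate = path
    while candidate:
        if candidate in copies:
            source, rev = copies[candidate]
            return source + path[len(candidate):], rev
        candidate = candidate.rsplit("/", 1)[0]
    return None
```

With the three lines from Maxim's `svn log` output as input, `history_source` resolves `gcc/expr.c` to the branch at r43255 and `gcc/reload.c` to the branch at r42007, while any other file under the tag resolves to trunk at r39596.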
Re: Proposal for the transition timetable for the move to GIT
On Mon, Dec 30, 2019 at 03:36:42PM +, Richard Earnshaw (lists) wrote: > On 29/12/2019 23:13, Segher Boessenkool wrote: > > On Sun, Dec 29, 2019 at 11:00:08PM +, Joseph Myers wrote: > >> fixups in bugdb.py - and that way benefit both from reposurgeon making > >> choices that are as conservatively safe as possible, which seems a > >> desirable property for problem cases that haven't been manually reviewed, > > > > Problem cases that haven't been manually reviewed should *be* manually > > reviewed, or the heuristics improved so there are fewer problem cases. > > > > Thank you for offering to help with the checking. > > ;-) I am telling you what you (imo) need to do at a minimum to make your candidate conversion acceptable, if it has the problems you say it has. To make it not be super much work, I'd do the second option: better heuristics. Those in Maxim's conversion have been great since over half a year, you could borrow some, or peek for inspiration? I have no interest in improving another candidate conversion, as I'm sure you realise. And I'm supposed to have time off now ;-) If you guys want to ever finish, you'll need to drop the quest for perfection, because this leads to a) much more work, and b) worse quality in the end. And before you protest, please look at the evidence again. *Your own* evidence. HTH, this is supposed to be constructive, not a flame, Best wishes, Segher
Re: Proposal for the transition timetable for the move to GIT
On Mon, 30 Dec 2019, Segher Boessenkool wrote: > To make it not be super much work, I'd do the second option: better > heuristics. Those in Maxim's conversion have been great since over half > a year, you could borrow some, or peek for inspiration? Actually, comparing authors between the two conversions shows plenty of places where the more aggressive ChangeLog extraction in Maxim's conversion has produced less good attributions than reposurgeon (e.g. attributing merges to some random author from a ChangeLog modified in the merge, rather than to the committer of the merge, or attributing fixes in a ChangeLog to the author of a random entry that got fixed), as well as places where it's simply failed to extract an author from a ChangeLog that reposurgeon has extracted. So for "great", read "have some good ideas to learn from, but plenty of places with problems as well". I'm working on more detailed comparison of authors with some more heuristics to help identify the most interesting cases for manual inspection (those where it's more likely Maxim's heuristics are finding valid authors reposurgeon didn't) and separate those from cases where different subjective choices were made (e.g. of how to assign an author when one person backports another's patch, or multi-author commits where one conversion chose one author as the main one and the other conversion chose the other author). > If you guys want to ever finish, you'll need to drop the quest for > perfection, because this leads to a) much more work, and b) worse quality > in the end. To me, that indicates that using a conversion tool that is conservative in its heuristics, and then selectively applying improvements to the extent they can be done safely with manual review in a reasonable time, is better than applying a conversion tool with more aggressive heuristics. 
The issues with the reposurgeon conversion listed in Maxim's last comments were of the form "reposurgeon is being conservative in how it generates metadata from SVN information". I think that's a very good basis for adding on a limited set of safe improvements to authors and commit messages that can be done reasonably soon and then doing the final conversion with reposurgeon. -- Joseph S. Myers jos...@codesourcery.com
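The triage step Joseph mentions, comparing authors between two conversions to find cases worth manual inspection, can be sketched as a simple bucketing pass. This assumes each conversion has already been reduced to a mapping from SVN revision to an author string; how that mapping is extracted is left out, and the names are hypothetical. Revisions in the `different` bucket are the candidates for a human to look at. The sketch deliberately does not try to tell a better heuristic result from a subjective choice such as backport attribution, since that is the part that needs judgment.

```python
def _norm(author):
    # Treat differences in case and surrounding whitespace as equal.
    return author.strip().casefold()


def compare_authors(conv_a, conv_b):
    """Bucket SVN revisions by how two conversions attributed them.

    conv_a, conv_b: dict mapping SVN revision number -> author string.
    """
    buckets = {"same": [], "different": [], "only_a": [], "only_b": []}
    for rev in sorted(set(conv_a) | set(conv_b)):
        a, b = conv_a.get(rev), conv_b.get(rev)
        if b is None:
            buckets["only_a"].append(rev)
        elif a is None:
            buckets["only_b"].append(rev)
        elif _norm(a) == _norm(b):
            buckets["same"].append(rev)
        else:
            buckets["different"].append(rev)
    return buckets
```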
Re: Proposal for the transition timetable for the move to GIT
On Mon, Dec 30, 2019 at 10:58:05PM +, Joseph Myers wrote: > > If you guys want to ever finish, you'll need to drop the quest for > > perfection, because this leads to a) much more work, and b) worse quality > > in the end. > > To me, that indicates that using a conversion tool that is conservative in > its heuristics, and then selectively applying improvements to the extent > they can be done safely with manual review in a reasonable time, is better > than applying a conversion tool with more aggressive heuristics. Then you need to just completely drop this, and always use , because a large percentage will get that anyway then. Which is fine with me, fwiw: it's correct, and it's a little inconvenient perhaps, but it doesn't really make the result less usable at all. Precisely like weird merges on svn tags that aren't even on a branch. Perfect is the enemy of ever getting a conversion done. > The issues with the reposurgeon conversion listed in Maxim's last comments > were of the form "reposurgeon is being conservative in how it generates > metadata from SVN information". I think that's a very good basis for > adding on a limited set of safe improvements to authors and commit > messages that can be done reasonably soon and then doing the final > conversion with reposurgeon. No, we want to *see* why it would be better than the alternatives, what the differences are. Segher
Re: Proposal for the transition timetable for the move to GIT
Joseph Myers : > To me, that indicates that using a conversion tool that is conservative in > its heuristics, and then selectively applying improvements to the extent > they can be done safely with manual review in a reasonable time, is better > than applying a conversion tool with more aggressive heuristics. There's a more general point here, which I'm developing in my book-in-progress. Clean data-conversion problems can be done algorithmically without a human in the loop. Messy data-conversion problems need judgment amplifiers. Maxim's scripts try to treat a messy conversion problem as though it were a clean one. Maxim is pretty sharp, so this almost works. Almost. But the failure mode is predictable - overinterpreting badly-formed input leads to plausible garbage on output. When this happens, it's the Goddess Eris's way of telling you that there needs to be human judgment in the loop. Instead of trying to automate it out, you should be building tools that partition the process into things a computer does well, driven by choices a human makes well. This is a point that needs making because programmers thrown at messy conversion problems tend to be more fixated on achieving full automation than they perhaps ought to be. Elsewhere I have written of Zeno tarpits: http://esr.ibiblio.org/?p=6772 Subversion dump streams are not quite a Zeno tarpit - they actually obey something that has the effect of a formal specification - but ChangeLog parsing is. > The issues with the reposurgeon conversion listed in Maxim's last comments > were of the form "reposurgeon is being conservative in how it generates > metadata from SVN information". I think that's a very good basis for > adding on a limited set of safe improvements to authors and commit > messages that can be done reasonably soon and then doing the final > conversion with reposurgeon. The flip side of this is that Joseph has been making intelligent and realistic suggestions for how to improve reposurgeon.
That is *invaluable* - it captures knowledge that will make future comparisons easier and better. Software engineers (outside of a few AI specialists) don't ordinarily think of themselves as being in the knowledge-capture business. But it's a useful perspective to cultivate. -- Eric S. Raymond http://www.catb.org/~esr/
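The split ESR describes, where the computer handles the unambiguous cases and hands everything else to a person, can be sketched for ChangeLog author extraction. This is an illustrative sketch, not reposurgeon's or Maxim's actual logic. It assumes the GNU ChangeLog header convention `YYYY-MM-DD  Name  <email>`, looks only at lines a commit added, and treats any indented line ending in an address (such as a second author of a joint entry) as another candidate. The function name `extract_author` is hypothetical. An author is accepted only when exactly one person is named; otherwise the function returns a reason for a human reviewer instead of guessing.

```python
import re

# GNU ChangeLog entry header: "YYYY-MM-DD  Full Name  <email>"
HEADER_RE = re.compile(r"^\d{4}-\d{2}-\d{2}\s+(.+?)\s+<([^>]+)>\s*$")
# Indented line ending in an address, e.g. a co-author of a joint entry.
COAUTHOR_RE = re.compile(r"^\s+(.+?)\s+<([^>\s]+@[^>\s]+)>\s*$")


def extract_author(added_lines):
    """Return (author, None) if the added ChangeLog text names exactly one
    person, else (None, reason) so the case goes to a human reviewer."""
    candidates = set()
    for line in added_lines:
        match = HEADER_RE.match(line) or COAUTHOR_RE.match(line)
        if match:
            candidates.add((match.group(1), match.group(2)))
    if len(candidates) == 1:
        name, email = candidates.pop()
        return f"{name} <{email}>", None
    if not candidates:
        return None, "no ChangeLog header among added lines"
    return None, f"{len(candidates)} candidate authors"
```

Returning a reason rather than a best guess keeps the failure mode visible: cases like merges that touch many ChangeLogs end up in a review queue instead of being attributed to whoever happened to write the first entry.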