On Thu, 26 Dec 2019, Maxim Kuvyrkov wrote:

> Reposurgeon creates merge entries on trunk when changes from a branch 
> are merged into trunk.  This brings entire development history from the 
> branch to trunk, which is both good and bad.  The good part is that we 
> get more visibility into how the code evolved.  The bad part is that we 
> get many "noisy" commits from merged branch (e.g., "Merge in trunk" 
> every few revisions) and that our SVN branches are work-in-progress 
> quality, not ready for review/commit quality.  It's common for files to 
> be re-written in large chunks on branches.

Seeing "noisy" or possibly confusing commits in "git log" output for 
master is simply a consequence of git log's defaults (showing all commits 
in the ancestry in reverse committer date order), which can themselves be 
confusing.  I often find "git log --first-parent" output less confusing 
when dealing with any git repository making heavy use of branches (but 
there are other options as well to control how it shows such histories).
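
To illustrate the difference, here is a toy sketch in Python.  It is not 
how git is implemented, and the commit names and graph are made up: T* 
stand for trunk commits, B* for branch commits, and M for a merge of the 
branch into trunk.  Walking every parent visits the branch commits too, 
while following only the first parent of each commit stays on trunk.

```python
# Toy commit graph: each commit maps to its parents, first parent first.
# All commit names are hypothetical.
parents = {
    "T1": [],
    "T2": ["T1"],
    "B1": ["T1"],          # branch created from trunk
    "B2": ["B1"],
    "M":  ["T2", "B2"],    # merge of the branch into trunk
    "T3": ["M"],
}

def all_ancestors(head):
    """Every commit reachable from head, following all parents."""
    seen, stack = [], [head]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.append(commit)
        stack.extend(parents[commit])
    return seen

def first_parent_chain(head):
    """Commits reached by following only the first parent of each commit."""
    chain, commit = [], head
    while commit is not None:
        chain.append(commit)
        ps = parents[commit]
        commit = ps[0] if ps else None
    return chain

print(sorted(all_ancestors("T3")))   # includes B1 and B2
print(first_parent_chain("T3"))      # trunk commits and the merge only
```

The merge commit M still appears in the first-parent view, but the 
individual branch commits behind its second parent do not.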

If we don't want merge commits on git master for the cases where people 
put merge properties on trunk in the past, we can use a reposurgeon 
"unmerge" command in gcc.lift to stop the few commits in question from 
being merge commits (while keeping all other merges as-is).  This does not 
bring back the problem Bernd noted in an earlier version of the 
conversion.  Merges of trunk into other branches that copied merge 
properties from trunk into those branches will still be handled correctly, 
with exactly two parents, rather than regaining extra parents 
corresponding to the merges into trunk.  The processing that avoids 
redundant merge parents takes place well before any unmerge commands are 
executed, so at the time of that processing reposurgeon knows that those 
other branches are in fact in the ancestry of trunk, even if we remove 
that information in the final git repository.

> Also, reposurgeon's commit logs don't have information on SVN path from 
> which the change came, so there is no easy way to determine that a given 
> commit is from a merged branch, not an original trunk commit.  Git-svn, 

I think it's idiomatic in git for a branch commit not to say "this is a 
commit on X branch", i.e. this is a general property of branchy git 
histories.  If we don't want a branchy history of master, unmerge is the 
solution; otherwise the answer is smarter git tools for viewing the 
history, which people may well make more use of when dealing with 
repositories with that kind of history.

> It appears that .gitignore has been added in r1 by reposurgeon and then 
> deleted at r130805.  In SVN repository .gitignore was added in r195087.  
> I speculate that addition of .gitignore at r1 is expected, but its 
> deletion at r130805 is highly suspicious.

I suspect this is one of the known issues related to reposurgeon-generated 
.gitignore files.  Since such files are not really part of the GCC 
history, and the .gitignore files checked into SVN are properly preserved 
as far as I can see, I don't think it's a particularly important issue for 
the GCC conversion (since auto-generated .gitignore files are only 
nice-to-have, not required).  I've filed 
https://gitlab.com/esr/reposurgeon/issues/219 anyway with a reduced test 
for this oddity.

> Reposurgeon uses $u...@gcc.gnu.org for committer email addresses even 
> when it correctly detects author name from ChangeLog.

I think that's logically accurate (and certainly harmless) as a 
description of commits made to a central repository on gcc.gnu.org, 
although using committer = author would also be OK.

> == Bad summary line ==
> 
> While looking around r138087, below caught my eye.  Is the contents of 
> summary line as expected?
> 
> commit cc2726884d56995c514d8171cc4a03657851657e
> Author: Chris Fairles <chris.fair...@gmail.com>
> Date:   Wed Jul 23 14:49:00 2008 +0000
> 
>     acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.

Yes.  This seems to be Richard's script working exactly as intended, by 
extracting the first bit of the ChangeLog entry *after* the date/author 
header as a better description than "2008-07-23 Chris Fairles 
<chris.fair...@gmail.com>" (i.e. it certainly gives more distinctive 
information about the commit and is more useful than having a date/author 
line as the summary line).  I don't think it's a bad summary line (but 
Richard's script supports hardcoding new summary lines for individual 
commits where desired).
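
As a rough sketch of the idea only (this is not Richard's script, whose 
details I am not reproducing here; the function name, the regular 
expression and the sample message are all assumptions made for 
illustration), skipping a date/author header and taking the first line of 
the entry could look like this in Python:

```python
import re

# Matches a ChangeLog-style date/author header such as
# "2008-07-23  A. Hacker  <hacker@example.org>" (placeholder name/address).
HEADER_RE = re.compile(r"^\d{4}-\d{2}-\d{2}\s+\S.*<[^>]*>\s*$")

def summary_line(log_message):
    """Return the first non-blank line that is not a date/author header."""
    for raw in log_message.splitlines():
        line = raw.strip()
        if not line or HEADER_RE.match(line):
            continue
        # Drop the leading "* " bullet of a ChangeLog item, if present.
        return re.sub(r"^\*\s+", "", line)
    return ""

# Hypothetical log message in ChangeLog style.
message = (
    "2008-07-23  A. Hacker  <hacker@example.org>\n"
    "\n"
    "\t* foo.m4 ([BAR]): Define BAZ.\n"
    "\t* other.c: Likewise.\n"
)
print(summary_line(message))
```

A message with no such header is left alone, in the sense that its first 
non-blank line is used as the summary.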

-- 
Joseph S. Myers
j...@polyomino.org.uk
