On Fri, 17 May 2019, Richard Sandiford wrote:

> We're not starting from scratch on that though.  The public git
> (semi-)mirror has been going for a long time, so IMO we should just
> inherit the policies for that.  (Like you say, forced pushes are
> restricted to the user namespace.)  Policies can evolve over time :-)

It doesn't send anything to gcc-cvs or to Bugzilla, so we need to define 
what goes there, for example, and implement it (presumably as far as 
possible by configuring one of the sets of git hooks already in use on 
sourceware, e.g. the AdaCore ones used for binutils/gdb and glibc, rather 
than writing our own from scratch).  (When referring to commit messages I 
was thinking about the messages on gcc-cvs rather than the messages 
written by committers; I agree that the format of the latter is 
independent of a move to git, and have been using git-style messages for 
commits to GCC for some time.)

> But the discussion upthread seemed to be that having the very old stuff
> in git wasn't necessarily that important anyway.

I think git should have all the branches that haven't been deleted in SVN, 
minus any where there is a specific decision to remove in the conversion 
(messed up history, branch was an artefact of conversion from CVS rather 
than a real branch, etc.).  If a branch or tag has been deleted in SVN it 
should not be brought across to the git repository (SVN will remain 
read-only, just as the old CVS repository remains available read-only).

> FWIW, I've been using the "official" git-svn based mirror for at least
> the last five years, only using SVN to actually commit.  And I've never
> needed any of the above during that time.

That the git-svn mirror is useful for many purposes for which people want 
to use git also provides a clear argument against needing to do the final 
conversion in a hurry; people can use it when convenient while we take the 
time to get the conversion right (in particular, seeing what the Go 
conversion of reposurgeon comes up with), and then rebase their git 
branches on the final converted history.

(As previously noted I expect the objects from the git-svn mirror should 
go in the new repository with the refs appropriately renamed, so that old 
commit hash references remain valid and people don't need to check out a 
separate repository to access old git branches, which should be doable 
with a single "git fetch" command; the two versions of the history would 
be disconnected, but most blob and tree objects would have the same hashes 
so this shouldn't enlarge the repository much.  Rebasing on top of the 
final conversion, for active branches currently git-only, would be 
preferred to anything that connects the two versions of the history.)
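
A minimal sketch of the "single git fetch with renamed refs" idea above.
The mirror URL and the refs/git-svn-old namespace are hypothetical
placeholders, not names agreed anywhere in this thread; the script only
builds the command line and does not run it.

```python
import shlex

# Hypothetical values: neither the URL nor the target namespace comes from
# the thread; they only illustrate the shape of the command.
OLD_MIRROR_URL = "https://example.org/git/gcc-old-mirror.git"
OLD_NAMESPACE = "refs/git-svn-old"


def renaming_refspecs(namespace=OLD_NAMESPACE):
    """Refspecs that copy the mirror's branches and tags under `namespace`."""
    return [
        f"+refs/heads/*:{namespace}/heads/*",
        f"+refs/tags/*:{namespace}/tags/*",
    ]


def fetch_command(url=OLD_MIRROR_URL, namespace=OLD_NAMESPACE):
    """Build (but do not run) a single `git fetch` invocation."""
    # --no-tags stops git from also creating refs/tags/* entries that could
    # clash with tags from the final conversion.
    return ["git", "fetch", "--no-tags", url, *renaming_refspecs(namespace)]


if __name__ == "__main__":
    print(shlex.join(fetch_command()))
```

Because the old refs land in their own namespace, old commit hashes stay
reachable without colliding with branch or tag names from the new
history.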

> E.g. having proper author names seems like a nice-to-have rather than
> a requirement.  A lot of the effort spent on compiling that list seemed
> to be getting names and email addresses for people who haven't contributed
> to gcc for a long time (in some cases 20 years or more).  It's interesting
> historical data, but in almost all cases, the email addresses used are
> going to be defunct anyway.

I think having author names and email addresses is a basic requirement of 
any reasonable repository conversion - it's simply how git identifies 
authors; having something that is not a name and email for the author / 
committer there is not a proper use of git data structures.  For me, that 
means that, when the author and committer are the same, some name and 
email address for the author that are or were valid at some point should 
be listed for both those fields in git.
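
As an illustration of that requirement, here is a rough sketch of how an
author map could turn an SVN username into git identities.  The map
entry and the function names are made up for the example; this is not
the format or code of the real conversion machinery.

```python
# Hypothetical author map: SVN username -> (full name, email address).
AUTHOR_MAP = {
    "jdoe": ("Jane Doe", "jdoe@example.org"),
}


def git_ident(svn_user, author_map=AUTHOR_MAP):
    """Return a 'Name <email>' identity for an SVN username."""
    try:
        name, email = author_map[svn_user]
    except KeyError:
        # Fail loudly rather than writing a bare username into git.
        raise KeyError(f"no author map entry for SVN user {svn_user!r}") from None
    return f"{name} <{email}>"


def commit_idents(svn_user, author_map=AUTHOR_MAP):
    """SVN records one user per commit, so it fills both git fields."""
    ident = git_ident(svn_user, author_map)
    return {"author": ident, "committer": ident}
```

Raising on an unknown username, instead of falling back to the bare SVN
login, is one way to make sure nothing that is not a name and email ends
up in the author or committer fields.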

I'm not particularly concerned with distinguishing between different names 
and email addresses for an author depending on when or in what capacity 
they contributed a change, or with the cases where a patch was committed 
for someone else and SVN simply doesn't provide a way to distinguish that 
information.  However, since some people were concerned with that, and 
since the feature needed for that was implemented (the "changelogs" 
feature in reposurgeon, which will do it as long as a proper ChangeLog 
entry was included in the commit), we may as well use that feature.  (The 
author map is still needed for commits without ChangeLog entries.)
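
The fallback order described above can be sketched roughly as follows.
This is not reposurgeon's implementation of its "changelogs" feature; it
assumes the GNU-style ChangeLog header appears in the commit message
itself, and the names and addresses are invented for the example.

```python
import re

# GNU-style ChangeLog header: "YYYY-MM-DD  Full Name  <email>".
HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(.+?)\s+<([^<>\s]+)>\s*$")

# Hypothetical author map, used only when no ChangeLog header is found.
AUTHOR_MAP = {"jdoe": ("Jane Doe", "jdoe@example.org")}


def author_for_commit(message, svn_user, author_map=AUTHOR_MAP):
    """Prefer the first ChangeLog header; otherwise use the author map."""
    for line in message.splitlines():
        match = HEADER.match(line)
        if match:
            _date, name, email = match.groups()
            return f"{name} <{email}>"
    name, email = author_map[svn_user]
    return f"{name} <{email}>"


if __name__ == "__main__":
    msg = "Fix thing.\n\n2019-05-17  Alex Roe  <aroe@example.org>\n\n\t* foo.c: Fix.\n"
    print(author_for_commit(msg, "jdoe"))
```

With this ordering, a patch committed on someone else's behalf is
attributed to the person named in the ChangeLog entry, while commits
without an entry still get a valid identity from the author map.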

> The big advantage of Maxim's approach is that it's a public script in
> our own repo that anyone can contribute to.  So if there are specific
> tweaks people want to make, there's now the opportunity to do that.

reposurgeon is public code in its own repository, and so, now, is the 
conversion machinery that uses it.

-- 
Joseph S. Myers
jos...@codesourcery.com
