Re: Repository for the conversion machinery

Joseph Myers Mon, 10 Oct 2016 14:54:09 -0700

On Mon, 10 Oct 2016, Eric S. Raymond wrote:

> I strongly recommend that if you want to try this, you separate it from the
> initial repo conversion.  That is, get the project to git first.  Then
> see if you can data-mine author information out of the history. If,
> and only if, you get results that look reasonable, then you patch the repo
> and force-push it, warning everyone there'll be a flag day.
> 
> The reason I recommend this is that I think you're going to have serious
> trouble getting clean authorship data with good coverage.  The data
> mining will be messy and take longer than you expect.


I also think it would be too messy, and I don't think such a flag day 
would be a good idea. Once we've done the conversion we should keep 
commit ids stable. The commit objects from the existing git mirror should 
stay available in a disjoint set of branches not connected to the cleanly 
converted history, whether in a separate repository or not, so that 
existing references to those commit ids continue to work. I don't want to 
add a third set of commit ids for the same history on top of that.

In practice there are a lot of ways people have messed up ChangeLog 
commits or commit messages that I would expect to confuse such author 
extraction, even before you get to the parts of the history converted from 
CVS.
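
To illustrate how fragile such extraction is, here is a minimal Python 
sketch. It assumes the usual GNU ChangeLog header shape (date, name, then 
address in angle brackets). The function name extract_author, the regular 
expression, and the sample messages are all hypothetical and not part of 
any conversion tool.

```python
import re

# Assumed header shape: "YYYY-MM-DD  Full Name  <user@host>".
# This pattern is an illustrative guess, not the format any real tool enforces.
HEADER = re.compile(
    r"^(\d{4}-\d{2}-\d{2})\s+(\S.*?)\s+<([^<>\s]+@[^<>\s]+)>\s*$"
)


def extract_author(message):
    """Return (name, email) from the first ChangeLog-style header line, else None."""
    for line in message.splitlines():
        m = HEADER.match(line)
        if m:
            return m.group(2), m.group(3)
    return None


# Hypothetical commit messages: one well-formed, three "messed up".
good = "2016-10-10  A. Hacker  <hacker@example.org>\n\n\t* foo.c (bar): Fix typo.\n"
no_email = "2016-10-10  A. Hacker\n\n\t* foo.c (bar): Fix typo.\n"
old_date = "Mon Oct 10 14:54:09 2016  A. Hacker  <hacker@example.org>\n\n\t* foo.c: Likewise.\n"
no_header = "\t* foo.c (bar): Fix typo.\n"

for msg in (good, no_email, old_date, no_header):
    print(extract_author(msg))
```

Only the first message yields an author. The other three return None and 
would need manual attribution or further heuristics.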

-- 
Joseph S. Myers
jos...@codesourcery.com
