On Mon, 10 Oct 2016, Eric S. Raymond wrote:

> I strongly recommend that if you want to try this, you separate it from the
> initial repo conversion. That is, get the project to git first. Then
> see if you can data-mine author information out of the history. If,
> and only if, you get results that look reasonable, then you patch the repo
> and force-push it, warning everyone there'll be a flag day.
>
> The reason I recommend this is that I think you're going to have serious
> trouble getting clean authorship data with good coverage. The data
> mining will be messy and take longer than you expect.
I also think it would be too messy, and don't think having such a flag day
would be a good idea - once we've done the conversion we should keep commit
ids stable (while having the commit objects from the existing git mirror in
a disjoint set of branches not connected to the cleanly converted history,
whether in a separate repository or not, so existing references to those
commit ids continue to work as well - but I don't want to add a third set
of commit ids for the same history as well).

In practice there are a lot of ways people have messed up ChangeLog commits
or commit messages that I would expect to confuse such author extraction,
even before you get to the parts of the history converted from CVS.

-- 
Joseph S. Myers
jos...@codesourcery.com