Re: Repository for the conversion machinery

Richard Earnshaw Thu, 17 Sep 2015 09:00:18 -0700

On 17/09/15 16:15, Michael Matz wrote:
> Hi,
> 
> On Thu, 17 Sep 2015, Eric S. Raymond wrote:
> 
>> All I can say is every time I've tried this it's been a nightmare, and 
>> when you say "apart from CVS imported revisions" my hair stands on end.  
>> And the GCC history is two and a half times the size of the next largest 
>> repo I've tried this on.
>>
>> If you want to try writing the program to do this data analysis, go 
>> right ahead.
> 
> A start would be:
> svn diff -c50004 | sed -ne \
> '/^+++.*ChangeLog/,/^Index/s/^+.*[0-9] *\([^0-9]*[(<].*@.*[)>]\).*$/\1/p'
> 
> Sometimes (e.g. for some CVS-imported commits) the ChangeLog edit was
> committed in a different revision from the changes themselves (it
> wasn't a very good CVS-to-Subversion conversion), so in those cases the
> above doesn't find the address: it will be in the revision just before or
> after, the one that touches ChangeLog but no other files.  But it works
> fairly well for newer revisions.  It might need adjustments for other date
> or email address formats.  Feeding it all revisions once you have
> extracted them should give a reasonable estimate of who the real author was.
> 
> 
> Ciao,
> Michael.
>
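You could wrap the quoted heuristic in two small shell functions. One extracts the author from a single diff, and the other walks a range of revisions. This is only a sketch under stated assumptions. The function names `changelog_author` and `guess_authors` are hypothetical, and the script assumes it runs inside a Subversion working copy. It also keeps only the first matching ChangeLog header per revision. Commits whose ChangeLog entry landed in a neighbouring revision (some CVS imports) therefore come out with an empty author.

```sh
#!/bin/sh
# Sketch only: hypothetical wrappers around the sed heuristic quoted above.

# Read a unified diff on stdin and print the first "Name <email>" found on
# an added line inside a ChangeLog hunk; print nothing if there is none.
# Only the first match is kept, so commits with several ChangeLog entries
# by different people are reduced to one name.
changelog_author() {
  sed -ne '/^+++.*ChangeLog/,/^Index/s/^+.*[0-9] *\([^0-9]*[(<].*@.*[)>]\).*$/\1/p' |
    head -n 1
}

# Print "REV<TAB>AUTHOR" for each revision from $1 to $2 inclusive.
# Assumption: run from inside an svn working copy, as in the original
# one-liner. An empty AUTHOR means no ChangeLog header was found in that
# revision and the neighbouring revisions need a manual look.
guess_authors() {
  rev=$1
  while [ "$rev" -le "$2" ]; do
    printf '%s\t%s\n' "$rev" "$(svn diff -c "$rev" | changelog_author)"
    rev=$((rev + 1))
  done
}
```

For example, `guess_authors 50000 50010 > authors.tsv` would write one tab-separated line per revision. The empty entries are the ones that need the adjacent-revision check described in the quoted text.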


None of this has any chance of working for any commits to the pre-egcs
sources.  In those days there was no version control on the ChangeLog file.

My feeling is we could spend months ratholing on this particular problem
instead of making real progress on the conversion.  If it helps move
things forward, I'm happy to accept that, for the purposes of the
conversion, we should just use the 'committer id' and drop any attempt to
reconstruct the 'author id' for each patch.

R.
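The committer-only approach proposed above needs no heuristics. Subversion records the committing user in the `svn:author` revision property, and `svn log -q` prints that user on each revision line. The following is a minimal sketch that turns that output into revision/committer pairs. It assumes the standard `rREV | user | date` line layout, and the function name `svn_committers` is hypothetical.

```sh
#!/bin/sh
# Sketch only: read `svn log -q` output on stdin and print "REV COMMITTER"
# for each revision line. Separator lines of dashes do not match and are
# skipped. The committer field may contain spaces, e.g. "(no author)".
svn_committers() {
  sed -n 's/^r\([0-9][0-9]*\) | \([^|]*\) | .*/\1 \2/p'
}
```

Usage would look like `svn log -q -r 1:HEAD | svn_committers`. The resulting list could then be mapped to full names and email addresses in a separate authors file.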
