On Mon, 30 Dec 2019, Segher Boessenkool wrote:

> To make it not be super much work, I'd do the second option: better
> heuristics.  Those in Maxim's conversion have been great since over half
> a year, you could borrow some, or peek for inspiration?

Actually, comparing authors between the two conversions shows plenty of
places where the more aggressive ChangeLog extraction in Maxim's
conversion has produced worse attributions than reposurgeon (e.g.
attributing merges to some random author from a ChangeLog modified in the
merge, rather than to the committer of the merge, or attributing fixes in
a ChangeLog to the author of a random entry that got fixed), as well as
places where it has simply failed to extract an author from a ChangeLog
that reposurgeon has extracted.  So for "great", read "have some good
ideas to learn from, but plenty of places with problems as well".

I'm working on a more detailed comparison of authors, with some more
heuristics to help identify the most interesting cases for manual
inspection (those where it's more likely Maxim's heuristics are finding
valid authors reposurgeon didn't) and to separate those from cases where
different subjective choices were made (e.g. how to assign an author when
one person backports another's patch, or multi-author commits where one
conversion chose one author as the main one and the other conversion
chose the other author).

> If you guys want to ever finish, you'll need to drop the quest for
> perfection, because this leads to a) much more work, and b) worse quality
> in the end.

To me, that indicates that using a conversion tool that is conservative in
its heuristics, and then selectively applying improvements to the extent
they can be done safely with manual review in a reasonable time, is better
than applying a conversion tool with more aggressive heuristics.

The issues with the reposurgeon conversion listed in Maxim's last comments
were of the form "reposurgeon is being conservative in how it generates
metadata from SVN information".  I think that's a very good basis for
adding on a limited set of safe improvements to authors and commit
messages that can be done reasonably soon and then doing the final
conversion with reposurgeon.

-- 
Joseph S. Myers
jos...@codesourcery.com