On Fri, May 31, 2019 at 12:05:41AM +0000, Joseph Myers wrote:
> On Wed, 29 May 2019, Segher Boessenkool wrote:
>
> > On Wed, May 29, 2019 at 12:53:30AM +0000, Joseph Myers wrote:
> > > On Fri, 24 May 2019, Segher Boessenkool wrote:
> > >
> > > > IMO the best we can do is use what we already have: what CVS or SVN used
> > > > as the committer identity.  *That* info is *correct* at least.
> > >
> > > CVS and SVN have a local identity.  git has a global identity.  I
> > > consider
> >
> > Git has an identity (well, two) _per commit_, and there is no way you can
> > reconstruct people's preferred name and email address (at any point in time,
> > for every commit separately) correctly.  IMO it is much better to not even
> > try.  We already *have* enough info for anyone to trivially look up who
> > wrote what, and what might be that person's email address at the time.  But
> > pretending that is more than a guess is just wrong.
>
> I think not doing a best-effort identification (name+email) is just as

And I think guessing is not a "best effort", but just wrong.

> wrong as converting a CVS repository to a changeset-based system without
> doing a best-effort unification of commits to different files around the
> same time with the same log message into changesets.  Both are the same

These are not similar situations at all.  Converting something to an
SVN-like data model is necessary for the resulting repo to work acceptably;
guessing people's names and email addresses is just nice-to-have in the
best case, and misleading in other cases.

> sort of heuristic conversion of data to the form idiomatic for a different
> version control system based around different concepts.  Neither is

It's a single short line of text in SVN.  It is a single short line of text
in Git.  Both just show who wrote a patch, or who committed it.  Good luck
finding out who was the primary author of every commit, btw.

> perfect, but the most useful conversion tries to combine CVS commits to
> different files into changesets, and the most useful conversion tries to
> identify authors in the way idiomatic for git using the information we
> have about what person (globally) a given username on a given system
> corresponds to.

We don't have that information.  This information can change over time,
and we never did track people's email addresses properly either.


Segher