On 01/02/2012 02:52 AM, Alan Barrett wrote:
On Sun, 01 Jan 2012, Mark Mielke wrote:
Another idea is to change the revprop's value in the pre-commit or post-commit hook: [...]

This is what we've been doing for about two years. The consequence is that tools no longer automatically match the unique identifier to the commit, because the rewritten value no longer equals the identifier.

If your third party tools can't extract the unique ID from svn:author = "Display Name <uniqueid@domain>" then perhaps the problem lies at least as much in your third party tools as in subversion.

I wonder if you thought this through before posting. :-)

You are saying that if I make up an essentially arbitrary scheme, such as "Display Name <uniqueid@domain>", and you have a tool which is unaware of my scheme, and therefore your tool fails to match users in the region because of my scheme - that your tool has the problem? Despite the documentation for Subversion never mentioning or even suggesting a convention that you should be responsible for understanding?

No.

The convention must be defined in the Subversion book, and it must be part of the release notes so that third party tools adhere to the convention.

Otherwise, only extremely casual interpretation can be done of the field. For example, it can be treated as a unique identifier - but more like a "foreign key" unique identifier in the sense that it is a key in some domain, but not necessarily a domain I know about or am an authority for. This is why tools such as FishEye provide a "committer mapping" that is precisely this. It allows me to code on a per-repository basis each of the committer values that I want to associate with my own FishEye account. This is really horrible for dozens of repositories and thousands of users. Every user having to input their own mappings? Yuck, yuck, yuck.

If, instead, a convention was defined such that (and just hand waving here, I'm not really attached to these details):

    svn:author => unique identifier
    svn:author-name => Mark Mielke
    svn:author-email => m...@mark.mielke.cc

Then tools could make much more intelligent decisions on what to do or show. They could use svn:author as the mapping key, but show name and email in "svn log" or graphical browsers.
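To make the hand waving slightly more concrete, here is a rough Python sketch of what a tool could do with such properties. The svn:author-name and svn:author-email properties do not exist in Subversion today; they are the hypothetical convention from above, and the function name and output format are made up for illustration.

```python
from typing import Mapping


def display_author(revprops: Mapping[str, str]) -> str:
    """Build a human-readable author string from revision properties.

    svn:author stays the mapping key. The name and email properties are
    the hypothetical convention discussed above, so older revisions that
    lack them simply fall back to the bare svn:author value.
    """
    key = revprops.get("svn:author", "(no author)")
    name = revprops.get("svn:author-name")
    email = revprops.get("svn:author-email")
    if name and email:
        return f"{name} <{email}> [{key}]"
    if name:
        return f"{name} [{key}]"
    return key


if __name__ == "__main__":
    new_style = {
        "svn:author": "1234567",
        "svn:author-name": "Mark Mielke",
        "svn:author-email": "mark@example.com",  # placeholder address
    }
    old_style = {"svn:author": "1234567"}
    print(display_author(new_style))
    print(display_author(old_style))
```

A tool that knows nothing about the new properties keeps reading svn:author exactly as before, which is the compatibility argument for separate attributes.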

The above model is a simple solution to the problem: more data is stored for every commit, and downstream tools can use that data. It has the benefit that the data is static, which is sometimes good. In a large project there is normally turnover, and the accounts that exist or are active in one year are not necessarily the same as the ones active in another year. Taking a snapshot of the data at the time of commit creates a permanent record of sorts. ClearCase is a system which does it this way. Its event history records, which track such things as object creation (the closest match to svn:author), have username, domain (NIS - old school), and fullname.

The other alternative is for a Subversion client to be able to look up details for svn:author by asking the server using a published protocol. This model would allow the server to implement these queries transparently using LDAP lookups or similar, depending on the requirements of the project. This stores less data for every commit, and allows for dynamic updates. It would allow for "Mark Mielke" to become "Mielke, Mark" with a server side configuration, but in contrast to the previous method, it would not allow for a snapshot of history to be taken. It would be a requirement that the identity management system used on the server would always have a record for me even after I am gone - or - alternatively, that the detail would become more vague over time. I disappear, and my account disappears - so it is left with only a unique identifier which might not be enough information.
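The trade-off of the lookup model can be shown with a tiny Python sketch. No such query protocol exists in Subversion; the dictionary below is only a stand-in for whatever directory (LDAP or similar) a server might consult, and resolve_author is a made-up name.

```python
from typing import Mapping


def resolve_author(uid: str, directory: Mapping[str, str]) -> str:
    """Resolve a unique identifier to a display string at query time.

    `directory` stands in for a server-side identity lookup. When the
    account has been removed, only the bare identifier is left, which is
    the "detail becomes more vague over time" case.
    """
    name = directory.get(uid)
    if name is None:
        return uid
    return f"{name} ({uid})"


if __name__ == "__main__":
    directory = {"1234567": "Mielke, Mark"}
    print(resolve_author("1234567", directory))  # account still present
    print(resolve_author("1234567", {}))         # account removed
```

The display name can change by editing the directory alone, but nothing here preserves what the name was at commit time.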

In our particular case, we value all three of: 1) unique identifiers to be able to do cross referencing of reports between tools, 2) display of humanly readable names in output such as "svn log" or annotations in FishEye, ViewVC, Eclipse, or whatever tool the user is using, and 3) permanent historical record for auditing purposes.

Our exact compromise for the last three years is:

1) original svn:author value arrives on the server as "1234567" - a corporate unique identifier
2) pre-commit re-writes svn:author to "Full Name (<original svn:author value>)"
3) pre-commit adds <company>:gid as "<original svn:author value>"
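As a sketch only, the property rewrite in those steps amounts to something like the Python below. This is not our actual hook: it works on a plain dictionary rather than a real transaction, the name lookup table is a stand-in for a corporate directory, and "example:gid" is a placeholder for the <company>:gid property.

```python
from typing import Dict, Mapping


def rewrite_author_props(
    revprops: Mapping[str, str],
    full_names: Mapping[str, str],
    gid_prop: str = "example:gid",  # placeholder for <company>:gid
) -> Dict[str, str]:
    """Return a copy of revprops with the author rewritten as in steps 1-3.

    The original identifier is always preserved under gid_prop. The
    svn:author value is rewritten to "Full Name (id)" only when the
    identifier is found in the (hypothetical) full_names table.
    """
    original = revprops["svn:author"]
    updated = dict(revprops)
    updated[gid_prop] = original
    full_name = full_names.get(original)
    if full_name is not None:
        updated["svn:author"] = f"{full_name} ({original})"
    return updated


if __name__ == "__main__":
    names = {"1234567": "Mark Mielke"}
    props = {"svn:author": "1234567", "svn:log": "Fix typo"}
    print(rewrite_author_props(props, names))
```

Keeping the raw identifier in its own property is what lets reports cross-reference commits even though svn:author no longer equals the identifier.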

Then as I mention - various other tools such as FishEye have explicit mappings from "Mark Mielke (1234567)" => "1234567" for each Subversion repository. We're primarily a ClearCase and Perforce shop right now - but even so, I have several Subversion repository mappings of this form. It works. It just sucks.
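In principle a tool could derive that mapping instead of having it entered by hand, if it knew our format. A minimal Python sketch, assuming the "Full Name (id)" layout described above (split_author and the regular expression are illustrative, not something any of the tools mentioned provide):

```python
import re
from typing import Optional, Tuple

# Matches "Full Name (id)" where the id is the last parenthesised group.
_AUTHOR_RE = re.compile(r"^(?P<name>.*\S)\s*\((?P<uid>[^()]+)\)$")


def split_author(value: str) -> Tuple[Optional[str], str]:
    """Split a rewritten svn:author value into (full name, unique id).

    Values that do not follow the "Full Name (id)" layout are treated as
    a bare identifier, so un-rewritten commits still map to themselves.
    """
    match = _AUTHOR_RE.match(value)
    if match is None:
        return None, value
    return match.group("name"), match.group("uid")


if __name__ == "__main__":
    print(split_author("Mark Mielke (1234567)"))
    print(split_author("1234567"))
```

Of course this only works because the parser has been told about our private scheme, which is exactly the problem: without a documented convention, every tool needs this taught to it separately.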

For svn:author to have structure - either internally using punctuation such as Unix gecos, or separated out as separate attributes - and for tools to all honour this structure - would be far more ideal. As Subversion is already well established, separate attributes are probably the best approach, as they would enable forwards and backwards compatibility for uses of svn:author implemented by the Subversion code base itself. Tools that know how to access and do intelligent things with the new fields could feel free to do so. Users of tools that do not do intelligent things with the new fields could point to the Subversion release notes and Subversion book and say "this new attribute svn:author-name should be recognized by your tool", the change can make the tool roadmap, and we can all be happy.

--
Mark Mielke<m...@mielke.cc>
