Re: format of svn:author

Mark Mielke Wed, 04 Jan 2012 23:31:08 -0800

On 01/04/2012 01:42 PM, Julian Foad wrote:

A PROPOSAL FOR EXTENDED AUTHOR IDENTIFICATION


USE CASES

1. [This one I am aware of.]

   A large company has authenticated user ids that are numeric.  That means the "log" and 
"blame" information shown by most Subversion clients is not easy to understand.  
Therefore they use a (post-commit?) hook to change
the svn:author property to a more friendly string, which (mostly) solves the 
display issue.  However, it causes other problems.  [What problems?]


Problems:

1) The unique identifier is no longer a direct match against external identity management systems. For example, if svn:author is "Mark Mielke (1234567)" and LDAP stores employeeNumber="1234567" and cn="Mark Mielke", very few tools support the ability to pattern match svn:author to pull out character groups and to then look up in an external identity management system using the character group. I can't think of a single tool that provides this capability out of the box. In these tools, if I am logged in as "1234567" it cannot know which commits are mine, because "1234567" is not equal to "Mark Mielke (1234567)".

2) Users may end up with multiple unique identifiers over time due to the unique identifier portion being combined with a more approximate (and therefore inaccurate) humanly readable form. Display name or email may change over time, and the ability to uniquely identify the author becomes more complex as the mapping must include every instance discovered at commit time. Some of this is subject to which identifier is selected as the unique identifier - but let us say that a system such as Forge is used and the identifier is some sort of username such as "twoleftfeet". The email might start as "j...@doe.com", but end up as "j...@acme.com". Any report around commits such as commits made per user, or for a particular user - would either end up with split history (treating the history as belonging to two or more users) or the reporting algorithm would need to allow for each instance to be recognized as the same user. Similarly - names can change. Perhaps the person gets married or divorced. "Mary Clairmont (prettygirl99)" becomes "Mary Dupont (prettygirl99)".

For both of these problems, one could argue that the reporting tool could take the complex value into account. It could parse out the unique identifier. This presumes that you have access to the source code and the ability to make the changes, which is often not the case (license restrictions, resource requirements, ...). This could be true of one or two tools - but certainly not all tools that support Subversion as this is a fairly massive list. This is particularly problematic if there is no standard as it means that my work in my company against my convention is not easily shareable with your work in your company against your convention.
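
To make the parsing burden concrete, here is a minimal Python sketch of what every reporting tool would have to carry in order to recover the unique identifier from a combined value. The "Display Name (id)" convention and the name split_author are assumptions for illustration only; another site could pick a different convention, which is exactly the sharing problem described above.

```python
import re

# Hypothetical site convention: "Display Name (unique-id)".
# Nothing in Subversion defines this format; it is an assumption.
_AUTHOR_RE = re.compile(r"^\s*(?P<name>.*?)\s*\((?P<uid>[^()]+)\)\s*$")


def split_author(svn_author):
    """Return (display_name, unique_id) for a combined svn:author value.

    A value with no parenthesised suffix is treated as the unique id
    itself, with no display name.
    """
    match = _AUTHOR_RE.match(svn_author)
    if match is None:
        return (None, svn_author.strip())
    return (match.group("name") or None, match.group("uid").strip())
```

With this, "Mary Clairmont (prettygirl99)" and "Mary Dupont (prettygirl99)" both reduce to the id "prettygirl99", so a report could group them, but only if the tool can be modified to call something like it.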

2. [This one is a guess.]

   The leader of a small development team sharing a Subversion repository with 
other teams wants to set up a build slave that will send an email to the users 
who committed revisions leading to a build failure.  The machine can see the 
Subversion user id but how can it get the user's email address?  The team 
leader could ask the repository administrator to add a post-commit hook that 
adds an email address to a revision property after every commit, but that

     * requires involving the server admin;
     * won't get updated when the user changes their email address;
     * won't work for testing old revisions that were already committed before 
that time;
     * won't work if the build slave software needs to read a list of all user 
id->email mappings at once.

Much of the above can be accomplished today as it is server side and server side gives more flexibility as it can be customized in one place. To extend the above to a situation that makes it more difficult -

There are a number of tools such as Crucible/FishEye that will monitor a Subversion repository for changes, and then take action based on the commit log. So the actions are being performed by "clients" and not by the server itself. If the "client" sees a Subversion commit for "1234567" or "jdoe", how does it know who is the authority on what email is associated with this account? With svn:author being the unique identifier - this is not that difficult in many cases as it is a simple LDAP query away. However, if we mix 1) and 2) together, we get the same problem. Subversion users need to see full name in "svn log" output, so they update svn:author to include the full name like "Mark Mielke (1234567)", and then Crucible/FishEye sees the commit as authored by "Mark Mielke (1234567)" and how does it look up this value in LDAP to find the email?

3. [This one is a guess.]

   An administrator wants to integrate Subversion with an issue tracker.  Users 
have different user ids on the two tools.  The admin wants to configure the 
tracker so that it automatically annotates an already committed Subversion 
revision with some status information.  How can the tracker know with what user 
id to contact the Subversion server?

We don't have this requirement, but I believe this requirement can be seen in situations such as:

1) Issue tracker, such as JIRA, is externally visible. Users and customers can sign up to the external site directly. Identity management system is stored in JIRA as these are essentially "external users".

2) Source management system, such as Subversion, is internal only. Users and customers may be able to access the content read-only. Identity management system is stored in Microsoft Active Directory or OpenLDAP and are assigned according to corporate policies.

In this scenario, there are a lot of requirements to be able to map back and forth between the internal and external ID. The binding might be stored as an LDAP attribute such as "jirauser".

I don't know if this particular problem is for Subversion to solve or not - but if the Subversion solution was general enough to support configuration that might allow this information to be exposed in a general way, somebody someday would probably be thankful. I wouldn't go out of my way to specifically solve this requirement, though. Just, if it comes for free with a good solution to the other requirements, don't block it. :-)

The rest of the proposal addresses UC1 and part of UC2 but not UC3.  (UC3 looks 
like it needs some totally separate solution, outside of Subversion.)


Agree.

REQUIREMENTS

   A Subversion client (of any kind so designed) shall be able to read extended 
information about the author of a revision.  This information shall consist of 
a (possibly empty) set of fields.  The set of possible extended author fields 
shall include at least:

     * authenticated user id
     * display name
     * email address

   It shall be possible to add other fields on the server side (by software 
upgrade and/or by configuration), and for a client (of any kind so designed) to 
discover and read these fields without any software upgrade on the client side.

   The svn:author property shall continue to exist.  When not using the 
extended author fields, the svn:author property must continue to operate as 
before.  When using the extended author fields, the design may restrict the use 
of the svn:author field.  Example: the design could require that if extended 
author fields are to be usable then the svn:author field always holds the 
authenticated user id and must always be present and non-empty.

This is a smart compromise. Forwards and backwards compatibility. Interface restrictions to guarantee extensibility.

In terms of some actual implementation of this, the documentation should probably recommend that clients make use of the display name and email address as standard fields, and only optionally be aware of repository-specific additional attributes. Otherwise it gets pretty messy in that you'd have to provide a means to make clients aware of what is being published and how and where they should be displayed. I would start with just the two and specific recommendations. For example, annotated source code on a web page might show the display name, but when one mouses over the display name or clicks on a gear icon to the side, access to additional details might be displayed. The display name might be linked such that a mouse click on the display name pulls up the user profile, but the user profile would be identified by the unique identifier. Enough information to recommend a consistent and useful interface, but not enough to be restrictive.


You cover some of this below:

A client shall access the extended author fields through the Subversion
server, through the existing client-server protocols, possibly with protocol
extensions. Any protocol extensions shall be backward compatible in that an
old server with a new client or an old client with a new server shall (without
user intervention) use the old 'svn:author' property.

The fields that are available from a particular server or repository are
determined by the administrator. For any particular committed revision, the
server may provide any or all or none of the extended author fields. A client
cannot rely on any particular field being available except to the extent that
the administrator gives such an assurance. Example: if the client requests the
authenticated user id and email address for a revision whose author has no
email address recorded, the server shall provide the authenticated user id but
no email address. If the server is temporarily unable to look up any
information about a user, the server should respond with no extended author
fields instead of waiting.

The extended author fields are dynamic in the sense that the server need not always return the same values
for the same committed revision. For example, a client might repeat exactly the same request for information
about revision 1234 twice in quick succession, and the server might provide the email address as
"a@b.c" the first time and "d...@ee.ff" the second time. Even the "authenticated
user id" field could change.

DESIGN

The extended author fields are delivered through revision properties. The
values are UTF-8 text. These revision properties are readable but not writable
by clients.

Three property names are initially designated as "well known":

* prop name: "svn:author:authn-id"
  purpose: authenticated user id
  format: as used by Subversion's authentication (the default
  value of svn:author)

* prop name: "svn:author:display-name"
  purpose: display name
  format: a single line (no line breaks), e.g. person's full
  name or shortened name or nickname

* prop name: "svn:author:email"
  purpose: email address
  format: [TO BE SPECIFIED HERE]

Other property names in this name space beginning with "svn:author:" can be designated
as "well known" in the future, by an official announcement from the Subversion project.

An administrator can configure other extended author fields to use property names that are not
in the "svn:" name space. Example: an administrator could configure the property name
"author:pgp-sig" to hold the author's PGP signature.


Excellent.

SERVER DESIGN

   Any time the server is about to send a set of revision properties to
the client, the server looks up the extended author fields and adds
corresponding properties to the set of revision properties that it
reports to the client.  These property values override any stored values
for the same property names.  The server looks up the extended author
fields through some mechanism not defined here, using the value of the
"svn:author" property as a key.  The server may cache the results, provided
that there is a way for the administrator to make the server use updated
information.

The cache can be a typical cache. The information that might be returned should generally be semi-persistent and not changing from minute to minute. As long as it takes effect within a reasonable time period (configurable along with the configuration on how to obtain the extended attribute information in the first place?) there is no problem.
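
As a rough illustration of the "typical cache" I have in mind, here is a minimal Python sketch of a time-limited cache keyed by the svn:author value. The class name AuthorFieldCache, the lookup callback and the one-hour default are all assumptions for illustration; the proposal leaves the look-up mechanism undefined.

```python
import time


class AuthorFieldCache:
    """Cache extended author fields per svn:author value for a limited time.

    `lookup` is a hypothetical callable (for example an LDAP query wrapper)
    that takes the svn:author value and returns a dict of properties.
    """

    def __init__(self, lookup, ttl_seconds=3600.0, clock=time.monotonic):
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # svn:author value -> (fetch time, fields)

    def get(self, svn_author):
        now = self._clock()
        entry = self._entries.get(svn_author)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        # Missing or expired: ask the backing directory again.
        fields = self._lookup(svn_author)
        self._entries[svn_author] = (now, fields)
        return fields

    def clear(self):
        """Administrator hook: drop everything so updated data is used."""
        self._entries.clear()
```

The clear() method stands in for the requirement that the administrator can force the server to use updated information; the expiry time covers the ordinary case.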

   If the client attempts to set any revision property in the "svn:author:"
name space, the server shall report an error to the client.  This applies even
if the property value matches the value that was last read from the server or
is currently known to the server, and even if the specific property name is
not known to the server.  If the client attempts to set any revision property
that is not in the "svn:author:" name space but might be configured as an
extended author field, the server records that revision property in the normal
way.  If a revision property (of any name) has a stored value and the extended
author field look-up also provides a value for the same property name, the
latter takes priority.
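
The two rules in the paragraph above (reject client writes in the reserved name space, and let looked-up values win over stored ones) can be sketched in Python as follows. This is only my reading of the draft; the function and exception names are hypothetical and are not part of any Subversion API.

```python
AUTHOR_NS = "svn:author:"  # reserved name space from the draft


class ReadOnlyPropertyError(Exception):
    """Raised when a client tries to set a reserved extended author field."""


def check_client_set(prop_name):
    """Reject any client write inside the "svn:author:" name space.

    The check is on the name only, so it applies even to names the server
    does not know. Plain "svn:author" is outside the name space.
    """
    if prop_name.startswith(AUTHOR_NS):
        raise ReadOnlyPropertyError(prop_name)


def revprops_for_client(stored, looked_up):
    """Merge stored revision properties with looked-up author fields.

    Where both provide a value for the same name, the look-up wins.
    """
    merged = dict(stored)
    merged.update(looked_up)
    return merged
```

Note that a name such as "author:pgp-sig" passes check_client_set and is stored normally, yet would still be overridden in revprops_for_client if the look-up is configured to provide it.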


   The extended author fields [are | are not] available to the following hook 
scripts: pre-commit, ...

Although not necessary for the fields to be available to the hook scripts - it would be extremely convenient for them to be so. We have hooks that perform LDAP lookups - but each hook has to have intimate knowledge of the environment it is contained in making them difficult to be published - for example, as an open source component that others could re-use. They may have hard coded LDAP bind passwords for example, making them insecure to publish. It would be extremely nice if any open source component writer could make use of these fields without having to care where the values come from, and the configuration for where the values come from could be centralized in one place - the Subversion server.

CLIENT DESIGN

   Just an example.  The "svn log" and "svn blame" commands could request the revision property named 
"svn:author:display-name", and if that is returned then use it instead of "svn:author", otherwise use the value of 
"svn:author".  Further, a client-side configuration option could specify which property name should be used for these display purposes, so 
for example some users in a particular team could choose to have the "author:nickname" revision property displayed instead of 
"svn:author:display-name".

This would be great. I think many people like to see the format that Git uses: Display Name <email@domain>. This should be an option.
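
A minimal Python sketch of the client-side choice, assuming the revision properties arrive as a plain dict keyed by property name. The function names and the dict representation are my assumptions; only the property names come from the draft.

```python
def display_author(revprops, preferred="svn:author:display-name"):
    """Return the preferred display property, else fall back to svn:author.

    `preferred` plays the role of the client-side configuration option,
    e.g. a team could set it to "author:nickname".
    """
    value = revprops.get(preferred)
    if value:
        return value
    return revprops.get("svn:author", "")


def git_style_author(revprops):
    """Format the author as "Display Name <email>" when an email is known."""
    name = display_author(revprops)
    email = revprops.get("svn:author:email")
    if email:
        return f"{name} <{email}>"
    return name
```

Both functions tolerate a server that returns none of the extended fields, in which case the output is just the plain svn:author value as today.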

FURTHER SCOPE

   Does a client need to be able to look up the information in other ways, such 
as starting from svn:author rather than a revision number, or starting from an 
extended author field?

I'm not clear on how "svn blame" is implemented. Presuming that it knows what commit each line belongs to and that these are already being queried (i.e. the implementation won't have to significantly change as a result of this proposal), it is satisfactory for it to access the information from the revision properties. I don't at the moment see a requirement to be able to query a list of known users, or information for a particular user. Subversion is not a directory service. The main capability being provided is to enable Subversion clients to be ignorant about how the server has been configured to perform authentication and identification of users, but still be able to provide extended information about Subversion metadata back to the user. Staying within domain is probably smart as it can be a clear boundary around the scope that is being agreed to.


Final thoughts on this draft:

The reference implementation should come with perhaps two server modules to support this capability. One should be a caching LDAP implementation that is fully configurable. One should be based on operating system services (PAM or getent() for Unix?). Other implementations should be possible, but left outside of core.

If the Subversion developers agree to some refinement of this proposal, I understand that developer resources are limited and that there is no guarantee that it would ever be implemented or if implemented that it would ever be completed and distributed in core. I'm thinking that this sort of project might be a good entry point for somebody such as myself to contribute. Not sure about time right now - but if you put in the effort to review and refine, then it would be only fair for me to at least try to contribute.


Thanks for the time you put into this, Julian.

--
Mark Mielke <m...@mielke.cc>
