On 01/04/2012 01:42 PM, Julian Foad wrote:
A PROPOSAL FOR EXTENDED AUTHOR IDENTIFICATION

USE CASES

1.[This one I am aware of.]

   A large company has authenticated user ids that are numeric.  That means the "log" and 
"blame" information shown by most Subversion clients is not easy to understand.  
Therefore they use a (post-commit?) hook to change
the svn:author property to a more friendly string, which (mostly) solves the 
display issue.  However, it causes other problems.  [What problems?]

Problems:

1) The unique identifier is no longer a direct match against external identity management systems. For example, if svn:author is "Mark Mielke (1234567)" and LDAP stores employeeNumber="1234567" and cn="Mark Mielke", very few tools can pattern match svn:author to pull out character groups and then look those groups up in an external identity management system. I can't think of a single tool that provides this capability out of the box. In these tools, if I am logged in as "1234567", the tool cannot know which commits are mine, because "1234567" is not equal to "Mark Mielke (1234567)".

2) Users may end up with multiple unique identifiers over time, because the unique identifier portion is combined with a more approximate (and therefore inaccurate) human-readable form. The display name or email may change over time, and uniquely identifying the author becomes more complex, as the mapping must include every instance recorded at commit time. Some of this depends on which identifier is selected as the unique identifier - but let us say that a system such as Forge is used and the identifier is some sort of username such as "twoleftfeet". The email might start as "j...@doe.com", but end up as "j...@acme.com". Any report around commits, such as commits made per user or by a particular user, would either end up with split history (treating the history as belonging to two or more users) or the reporting algorithm would need to recognize each instance as the same user. Similarly - names can change. Perhaps the person gets married or divorced. "Mary Clairmont (prettygirl99)" becomes "Mary Dupont (prettygirl99)".

For both of these problems, one could argue that the reporting tool could take the complex value into account. It could parse out the unique identifier. This presumes that you have access to the source code and the ability to make the changes, which is often not the case (license restrictions, resource requirements, ...). This could be true of one or two tools - but certainly not of all tools that support Subversion, as this is a fairly massive list. This is particularly problematic if there is no standard, as it means that my work in my company against my convention is not easily shareable with your work in your company against your convention.
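
To illustrate the parsing burden described above, here is a minimal Python sketch. It assumes a purely hypothetical site convention of "Display Name (unique-id)"; the split_author helper and its pattern are illustrative only and are not part of Subversion or of any existing tool.

```python
import re

# Hypothetical site convention: "Display Name (unique-id)".
# Another site could just as easily use "unique-id: Display Name".
_COMPOSITE = re.compile(r"^(?P<name>.*\S)\s*\((?P<uid>[^()]+)\)\s*$")


def split_author(svn_author):
    """Return (display_name, unique_id) for a composite svn:author value.

    A bare identifier such as "1234567" yields (None, "1234567").
    """
    match = _COMPOSITE.match(svn_author)
    if match:
        return match.group("name"), match.group("uid")
    return None, svn_author.strip()


# A naive comparison fails, while the parsed identifier matches.
logged_in_as = "1234567"
recorded = "Mark Mielke (1234567)"
naive_match = (logged_in_as == recorded)
parsed_match = (logged_in_as == split_author(recorded)[1])
```

Every tool would need its own copy of such a pattern, and a site with a different convention would need a different pattern, which is the sharing problem described above.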


2. [This one is a guess.]

   The leader of a small development team sharing a Subversion repository with 
other teams wants to set up a build slave that will send an email to the users 
who committed revisions leading to a build failure.  The machine can see the 
Subversion user id but how can it get the user's email address?  The team 
leader could ask the repository administrator to add a post-commit hook that 
adds an email address to a revision property after every commit, but that

     * requires involving the server admin;
     * won't get updated when the user changes their email address;
     * won't work for testing old revisions that were already committed before 
that time;
     * won't work if the build slave software needs to read a list of all user 
id->email mappings at once.

Much of the above can be accomplished today as it is server side and server side gives more flexibility as it can be customized in one place. To extend the above to a situation that makes it more difficult -

There are a number of tools such as Crucible/FishEye that will monitor a Subversion repository for changes, and then take action based on the commit log. So the actions are being performed by "clients" and not by the server itself. If the "client" sees a Subversion commit for "1234567" or "jdoe", how does it know who is the authority on what email is associated with this account? With svn:author being the unique identifier, this is not that difficult in many cases, as it is a simple LDAP query away. However, if we mix 1) and 2) together, we get the same problem. Subversion users need to see the full name in "svn log" output, so they update svn:author to include the full name, like "Mark Mielke (1234567)". Crucible/FishEye then sees the commit as authored by "Mark Mielke (1234567)", and how does it look up this value in LDAP to find the email?

3. [This one is a guess.]

   An administrator wants to integrate Subversion with an issue tracker.  Users 
have different user ids on the two tools.  The admin wants to configure the 
tracker so that it automatically annotates an already committed Subversion 
revision with some status information.  How can the tracker know with what user 
id to contact the Subversion server?

We don't have this requirement, but I believe this requirement can be seen in situations such as:

1) The issue tracker, such as JIRA, is externally visible. Users and customers can sign up to the external site directly. Identities are stored in JIRA, as these are essentially "external users".

2) The source management system, such as Subversion, is internal only. Users and customers may be able to access the content read-only. Identities are stored in Microsoft Active Directory or OpenLDAP and are assigned according to corporate policies.

In this scenario, there are a lot of requirements to be able to map back and forth between the internal and external ID. The binding might be stored as an LDAP attribute such as "jirauser".

I don't know if this particular problem is for Subversion to solve or not - but if the Subversion solution was general enough to support configuration that might allow this information to be exposed in a general way, somebody someday would probably be thankful. I wouldn't go out of my way to specifically solve this requirement, though. Just, if it comes for free with a good solution to the other requirements, don't block it. :-)

The rest of the proposal addresses UC1 and part of UC2 but not UC3.  (UC3 looks 
like it needs some totally separate solution, outside of Subversion.)

Agree.

REQUIREMENTS

   A Subversion client (of any kind so designed) shall be able to read extended 
information about the author of a revision.  This information shall consist of 
a (possibly empty) set of fields.  The set of possible extended author fields 
shall include at least:

     * authenticated user id
     * display name
     * email address

   It shall be possible to add other fields on the server side (by software 
upgrade and/or by configuration), and for a client (of any kind so designed) to 
discover and read these fields without any software upgrade on the client side.
   The svn:author property shall continue to exist.  When not using the 
extended author fields, the svn:author property must continue to operate as 
before.  When using the extended author fields, the design may restrict the use 
of the svn:author field.  Example: the design could require that if extended 
author fields are to be usable then the svn:author field always holds the 
authenticated user id and must always be present and non-empty.

This is a smart compromise. Forwards and backwards compatibility. Interface restrictions to guarantee extensibility.

In terms of some actual implementation of this, the documentation should probably recommend that clients make use of the display name and email address as standard fields, and only optionally be aware of repository-specific additional attributes. Otherwise it gets pretty messy, in that you'd have to provide a means to make clients aware of what is being published and how and where it should be displayed. I would start with just the two and specific recommendations. For example, annotated source code on a web page might show the display name, but when one mouses over the display name or clicks on a gear icon to the side, additional details might be displayed. The display name might be linked such that a mouse click on it pulls up the user profile, but the user profile would be identified by the unique identifier. Enough information to recommend a consistent and useful interface, but not enough to be restrictive.

You cover some of this below:

   A client shall access the extended author fields through the Subversion 
server, through the existing client-server protocols, possibly with protocol 
extensions.  Any protocol extensions shall be backward compatible in that an 
old server with a new client or an old client with a new server shall (without 
user intervention) use the old 'svn:author' property.


   The fields that are available from a particular server or repository are 
determined by the administrator.  For any particular committed revision, the 
server may provide any or all or none of the extended author fields.  A client 
cannot rely on any particular field being available except to the extent that 
the administrator gives such an assurance.  Example: if the client requests the 
authenticated user id and email address for a revision whose author has no 
email address recorded, the server shall provide the authenticated user id but 
no email address.  If the server is temporarily unable to look up any 
information about a user, the server should respond with no extended author 
fields instead of waiting.


   The extended author fields are dynamic in the sense that the server need not always return the same values 
for the same committed revision.  For example, a client might repeat exactly the same request for information 
about revision 1234 twice in quick succession, and the server might provide the email address as 
"a@b.c" the first time and "d...@ee.ff" the second time.  Even the "authenticated 
user id" field could change.


DESIGN

   The extended author fields are delivered through revision properties.  The 
values are UTF-8 text.  These revision properties are readable but not writable 
by clients.

   Three property names are initially designated as "well known":

     * prop name: "svn:author:authn-id"
       purpose: authenticated user id
       format: as used by Subversion's authentication (the default
         value of svn:author)

     * prop name: "svn:author:display-name"
       purpose: display name
       format: a single line (no line breaks), e.g. person's full
         name or shortened name or nickname

     * prop name: "svn:author:email"
       purpose: email address
       format: [TO BE SPECIFIED HERE]


   Other property names in this name space beginning with "svn:author:" can be designated 
as "well known" in the future, by an official announcement from the Subversion project.

   An administrator can configure other extended author fields to use property names that are not 
in the "svn:" name space.  Example: an administrator could configure the property name 
"author:pgp-sig" to hold the author's PGP signature.

Excellent.

SERVER DESIGN

   Any time the server is about to send a set of revision properties to
the client, the server looks up the extended author fields and adds
corresponding properties to the set of revision properties that it
reports to the client.  These property values override any stored values
for the same property names.  The server looks up the extended author
fields through some mechanism not defined here, using the value of the
"svn:author" property as a key.  The server may cache the results, provided
that there is a way for the administrator to make the server use updated
information.

The cache can be a typical cache. The information that might be returned should generally be semi-persistent and not changing from minute to minute. As long as a change takes effect within a reasonable time period (configurable along with the configuration on how to obtain the extended attribute information in the first place?) there is no problem.
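
As a rough illustration of such a "typical cache", here is a minimal Python sketch of a time-to-live cache keyed by the svn:author value. Everything in it is an assumption made for illustration: the CachedAuthorLookup class, the backend callable (standing in for LDAP or whatever mechanism the administrator configures) and the flush() method (standing in for the administrator's way to force updated information) are hypothetical names, not Subversion APIs.

```python
import time


class CachedAuthorLookup:
    """Cache extended author fields per svn:author value, with a time-to-live."""

    def __init__(self, backend, ttl_seconds=300.0, clock=time.monotonic):
        self._backend = backend    # hypothetical callable: author -> dict of fields
        self._ttl = ttl_seconds    # configurable "reasonable time period"
        self._clock = clock
        self._entries = {}         # author -> (expiry time, fields)

    def lookup(self, author):
        now = self._clock()
        hit = self._entries.get(author)
        if hit is not None and hit[0] > now:
            return dict(hit[1])    # still fresh: serve from the cache
        try:
            fields = dict(self._backend(author))
        except Exception:
            # Backend unavailable: answer with no extended fields
            # instead of waiting, and do not cache the failure.
            return {}
        self._entries[author] = (now + self._ttl, fields)
        return dict(fields)

    def flush(self):
        """Let the administrator force the use of updated information."""
        self._entries.clear()
```

The time-to-live bounds how stale a display name or email can be, and flush() covers the case where the administrator needs a change to take effect immediately.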

   If the client attempts to set any revision property in the "svn:author:" 
name space, the server shall report an error to the client.  This applies even if the 
property value matches the value that was last read from the server or is currently known 
to the server, and even if the
specific property name is not known to the server.  If the client attempts to set any 
revision property that is not in the "svn:author:" name space but might be 
configured as an extended author field, the server records that revision property in the 
normal way.  If a revision property (of any name) has a stored value and the extended 
author field look-up also provides a value for the same property name, the latter takes 
priority.
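
The two rules in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: check_revprop_write, effective_revprops and ReservedPropertyError are hypothetical names, and plain dictionaries stand in for however the server actually stores revision properties.

```python
RESERVED_PREFIX = "svn:author:"


class ReservedPropertyError(Exception):
    """Raised when a client tries to write into the reserved name space."""


def check_revprop_write(name):
    """Reject any client write under "svn:author:", known to the server or not.

    The value is deliberately not inspected: the write is refused even if
    it matches what the server currently reports.
    """
    if name.startswith(RESERVED_PREFIX):
        raise ReservedPropertyError("cannot set reserved property %r" % name)


def effective_revprops(stored, looked_up):
    """Merge stored revision properties with looked-up extended author fields.

    On a name collision the looked-up value takes priority.
    """
    merged = dict(stored)
    merged.update(looked_up)
    return merged
```

Note that "svn:author" itself does not start with "svn:author:", so this check leaves writes to the existing property alone, and a property such as "author:pgp-sig" is recorded normally but can still be overridden on read.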


   The extended author fields [are | are not] available to the following hook 
scripts: pre-commit, ...

Although not necessary for the fields to be available to the hook scripts - it would be extremely convenient for them to be so. We have hooks that perform LDAP lookups - but each hook has to have intimate knowledge of the environment it is contained in making them difficult to be published - for example, as an open source component that others could re-use. They may have hard coded LDAP bind passwords for example, making them insecure to publish. It would be extremely nice if any open source component writer could make use of these fields without having to care where the values come from, and the configuration for where the values come from could be centralized in one place - the Subversion server.

CLIENT DESIGN

   Just an example.  The "svn log" and "svn blame" commands could request the revision property named 
"svn:author:display-name", and if that is returned then use it instead of "svn:author", otherwise use the value of 
"svn:author".  Further, a client-side configuration option could specify which property name should be used for these display purposes, so 
for example some users in a particular team could choose to have the "author:nickname" revision property displayed instead of 
"svn:author:display-name".

This would be great. I think many people like to see the format that Git uses: Display Name <email@domain>. This should be an option.
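
A minimal Python sketch of that client-side choice follows, assuming the revision properties arrive as a plain dictionary. The author_label function and its display_prop and git_style parameters are hypothetical names for illustration, not existing Subversion client options.

```python
def author_label(revprops, display_prop="svn:author:display-name", git_style=False):
    """Choose the text a client shows for the author of a revision.

    Falls back to plain svn:author when the preferred property is absent,
    which is what an old server would provide.
    """
    name = revprops.get(display_prop) or revprops.get("svn:author", "")
    email = revprops.get("svn:author:email")
    if git_style and email:
        # Git-like rendering: Display Name <email@domain>
        return "%s <%s>" % (name, email)
    return name
```

Passing display_prop="author:nickname" would give the per-team nickname behaviour from the example above, with the same fallback to svn:author.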

FURTHER SCOPE

   Does a client need to be able to look up the information in other ways, such 
as starting from svn:author rather than a revision number, or starting from an 
extended author field?


I'm not clear on how "svn blame" is implemented. Presuming that it knows what commit each line belongs to and that these are already being queried (i.e. the implementation won't have to significantly change as a result of this proposal), it is satisfactory for it to access the information from the revision properties. I don't at the moment see a requirement to be able to query a list of known users, or information for a particular user. Subversion is not a directory service. The main capability being provided is to enable Subversion clients to be ignorant about how the server has been configured to perform authentication and identification of users, but still be able to provide extended information about Subversion metadata back to the user. Staying within domain is probably smart as it can be a clear boundary around the scope that is being agreed to.

Final thoughts on this draft:

The reference implementation should come with perhaps two server modules to support this capability. One should be a caching LDAP implementation that is fully configurable. One should be based on operating system services (PAM or getent() for Unix?). Other implementations should be possible, but left outside of core.
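
For the operating-system flavour, a minimal Python sketch on Unix might read the password database through the standard pwd module. This is only an illustration: the os_author_fields function is a hypothetical name, it assumes svn:author holds a local login name, and it relies on the common (but not universal) convention that the first comma-separated GECOS field is the person's full name. An email address is not available from this source.

```python
import pwd  # Unix-only standard library module


def os_author_fields(author, getpwnam=pwd.getpwnam):
    """Look up extended author fields from the OS password database.

    Returns an empty dict when the account is unknown, matching the rule
    that the server may provide none of the extended fields.
    """
    try:
        entry = getpwnam(author)
    except KeyError:
        return {}
    fields = {"svn:author:authn-id": author}
    # Assumption: the first GECOS sub-field is the full name.
    full_name = entry.pw_gecos.split(",", 1)[0].strip()
    if full_name:
        fields["svn:author:display-name"] = full_name
    return fields
```

A function with this shape could serve as the backend of a caching layer, with an LDAP implementation offering the same interface.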

If the Subversion developers agree to some refinement of this proposal, I understand that developer resources are limited and that there is no guarantee that it would ever be implemented or, if implemented, that it would ever be completed and distributed in core. I'm thinking that this sort of project might be a good entry point for somebody such as myself to contribute. Not sure about time right now - but if you put in the effort to review and refine, then it would be only fair for me to at least try to contribute.

Thanks for the time you put into this, Julian.

--
Mark Mielke <m...@mielke.cc>
