On 01/04/2012 01:42 PM, Julian Foad wrote:
A PROPOSAL FOR EXTENDED AUTHOR IDENTIFICATION
USE CASES
1. [This one I am aware of.]
A large company has authenticated user ids that are numeric. That means the "log" and
"blame" information shown by most Subversion clients is not easy to understand.
Therefore they use a (post-commit?) hook to change
the svn:author property to a more friendly string, which (mostly) solves the
display issue. However, it causes other problems. [What problems?]
Problems:
1) The unique identifier is no longer a direct match against external
identity management systems. For example, if svn:author is "Mark Mielke
(1234567)" and LDAP stores employeeNumber="1234567" and cn="Mark Mielke",
very few tools support the ability to pattern match svn:author to pull
out character groups and to then lookup in an external identity
management system using the character group. I can't think of a single
tool that provides this capability out of box. In these tools, if I am
logged in as "1234567" it cannot know which commits are mine, because
"1234567" is not equal to "Mark Mielke (1234567)".
2) Users may end up with multiple unique identifiers over time due to
the unique identifier portion being combined with a more approximate
(and therefore inaccurate) humanly readable form. Display name or email
may change over time, and the ability to uniquely identify the author
becomes more complex as the mapping must include every instance
discovered at commit time. Some of this is subject to which identifier
is selected as the unique identifier - but let us say that a system such
as Forge is used and the identifier is some sort of username such as
"twoleftfeet". The email might start as "j...@doe.com", but end up as
"j...@acme.com". Any report around commits such as commits made per
user, or for a particular user - would either end up with split history
(treating the history as belonging to two or more users) or the reporting
algorithm would need to allow for each instance to be recognized as the
same user. Similarly - names can change. Perhaps the person gets married
or divorced. "Mary Clairmont (prettygirl99)" becomes "Mary Dupont
(prettygirl99)".
For both of these problems, one could argue that the reporting tool
could take the complex value into account. It could parse out the unique
identifier. This presumes that you have access to the source code and
the ability to make the changes, which is often not the case (license
restrictions, resource requirements, ...). This could be true of one or two tools - but
certainly not all tools that support Subversion as this is a fairly
massive list. This is particularly problematic if there is no standard
as it means that my work in my company against my convention is not
easily shareable with your work in your company against your convention.
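
To make the cost concrete, here is a minimal sketch of the parsing that every
reporting tool would have to carry. It assumes the site convention is
"Display Name (id)"; the names split_author and commits_per_user and the
regular expression are purely illustrative and are not something Subversion
or any existing tool provides.

```python
import re
from collections import Counter

# Assumed site convention: "Display Name (unique-id)". Other sites may differ,
# which is exactly why this logic is hard to share between companies.
_COMPOSITE = re.compile(r"^\s*(?P<name>.*?)\s*\((?P<uid>[^()]+)\)\s*$")


def split_author(svn_author):
    """Return (display_name, unique_id) for a composite svn:author value.

    A value that does not follow the assumed convention is treated as a
    bare unique id with no display name.
    """
    m = _COMPOSITE.match(svn_author)
    if m:
        return m.group("name"), m.group("uid")
    return None, svn_author.strip()


def commits_per_user(authors):
    """Count commits keyed by unique id, so a name change does not split history."""
    return Counter(split_author(a)[1] for a in authors)


log = [
    "Mary Clairmont (prettygirl99)",
    "Mary Dupont (prettygirl99)",
    "Mark Mielke (1234567)",
]
print(commits_per_user(log))
```

Counting on the raw svn:author strings would treat the two "Mary" values as
different users; counting on the parsed id does not, but only for tools that
know this particular convention.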
2. [This one is a guess.]
The leader of a small development team sharing a Subversion repository with
other teams wants to set up a build slave that will send an email to the users
who committed revisions leading to a build failure. The machine can see the
Subversion user id but how can it get the user's email address? The team
leader could ask the repository administrator to add a post-commit hook that
adds an email address to a revision property after every commit, but that
* requires involving the server admin;
* won't get updated when the user changes their email address;
* won't work for testing old revisions that were already committed before
that time;
* won't work if the build slave software needs to read a list of all user
id->email mappings at once.
Much of the above can be accomplished today as it is server side and
server side gives more flexibility as it can be customized in one place.
To extend the above to a situation that makes it more difficult -
There are a number of tools such as Crucible/FishEye that will monitor a
Subversion repository for changes, and then take action based on the
commit log. So the actions are being performed by "clients" and not by
the server itself. If the "client" sees a Subversion commit for
"1234567" or "jdoe", how does it know who is the authority on what email
is associated with this account? With svn:author being the unique
identifier - this is not that difficult in many cases as it is a simple
LDAP query away. However, if we mix 1) and 2) together, we get the same
problem. Subversion users need to see full name in "svn log" output, so
they update svn:author to include the full name like "Mark Mielke
(1234567)", and then Crucible/FishEye sees the commit as authored by
"Mark Mielke (1234567)" and how does it look up this value in LDAP to
find the email?
3. [This one is a guess.]
An administrator wants to integrate Subversion with an issue tracker. Users
have different user ids on the two tools. The admin wants to configure the
tracker so that it automatically annotates an already committed Subversion
revision with some status information. How can the tracker know with what user
id to contact the Subversion server?
We don't have this requirement, but I believe this requirement can be
seen in situations such as:
1) Issue tracker, such as JIRA, is externally visible. Users and
customers can sign up to the external site directly. Identity management
system is stored in JIRA as these are essentially "external users".
2) Source management system, such as Subversion, is internal only. Users
and customers may be able to access the content read-only. Identity
management system is stored in Microsoft Active Directory or OpenLDAP
and are assigned according to corporate policies.
In this scenario, there are a lot of requirements to be able to map back
and forth between the internal and external ID. The binding might be
stored as an LDAP attribute such as "jirauser".
I don't know if this particular problem is for Subversion to solve or
not - but if the Subversion solution was general enough to support
configuration that might allow this information to be exposed in a
general way, somebody someday would probably be thankful. I wouldn't go
out of my way to specifically solve this requirement, though. Just, if
it comes for free with a good solution to the other requirements, don't
block it. :-)
The rest of the proposal addresses UC1 and part of UC2 but not UC3. (UC3 looks
like it needs some totally separate solution, outside of Subversion.)
Agree.
REQUIREMENTS
A Subversion client (of any kind so designed) shall be able to read extended
information about the author of a revision. This information shall consist of
a (possibly empty) set of fields. The set of possible extended author fields
shall include at least:
* authenticated user id
* display name
* email address
It shall be possible to add other fields on the server side (by software
upgrade and/or by configuration), and for a client (of any kind so designed) to
discover and read these fields without any software upgrade on the client side.
The svn:author property shall continue to exist. When not using the
extended author fields, the svn:author property must continue to operate as
before. When using the extended author fields, the design may restrict the use
of the svn:author field. Example: the design could require that if extended
author fields are to be usable then the svn:author field always holds the
authenticated user id and must always be present and non-empty.
This is a smart compromise. Forwards and backwards compatibility.
Interface restrictions to guarantee extensibility.
In terms of some actual implementation of this, the documentation should
probably recommend that clients make use of the display name and email
address as standard fields, and only optionally be aware of
repository-specific additional attributes. Otherwise it gets pretty
messy in that you'd have to provide a means to make clients aware of
what is being published and how and where they should be displayed. I
would start with just the two and specific recommendations. For example,
annotated source code on a web page might show the display name, but
when one mouses over the display name or clicks on a gear icon to the
side, access to additional details might be displayed. The display name
might be linked such that a mouse click on the display name pulls up the
user profile, but the user profile would be identified by the unique
identifier. Enough information to recommend a consistent and useful
interface, but not enough to be restrictive.
You cover some of this below:
A client shall access the extended author fields through the Subversion
server, through the existing client-server protocols, possibly with protocol
extensions. Any protocol extensions shall be backward compatible in that an
old server with a new client or an old client with a new server shall (without
user intervention) use the old 'svn:author' property.
The fields that are available from a particular server or repository are
determined by the administrator. For any particular committed revision, the
server may provide any or all or none of the extended author fields. A client
cannot rely on any particular field being available except to the extent that
the administrator gives such an assurance. Example: if the client requests the
authenticated user id and email address for a revision whose author has no
email address recorded, the server shall provide the authenticated user id but
no email address. If the server is temporarily unable to look up any
information about a user, the server should respond with no extended author
fields instead of waiting.
The extended author fields are dynamic in the sense that the server need not always return the same values
for the same committed revision. For example, a client might repeat exactly the same request for information
about revision 1234 twice in quick succession, and the server might provide the email address as
"a@b.c" the first time and "d...@ee.ff" the second time. Even the "authenticated
user id" field could change.
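
A rough sketch of the "any, all, or none" behaviour described above, under
stated assumptions: lookup is a hypothetical callable that returns a dict of
fields for a user and raises when the user is unknown or the directory is
unreachable. Enforcing an actual time limit is left to that callable; the
sketch only shows what the server answers with.

```python
def extended_author_fields(author, lookup, requested):
    """Return whichever of the requested fields are available for `author`.

    `lookup` is a hypothetical callable: author -> dict of field values.
    It may raise LookupError (unknown user) or OSError (directory down).
    """
    try:
        record = lookup(author)
    except (LookupError, OSError):
        # Directory unavailable or user unknown: report no extended fields
        # rather than failing the whole request.
        return {}
    # Only fields that are both requested and actually recorded are returned.
    return {field: record[field] for field in requested if record.get(field)}


# Illustrative in-memory directory; this author has no email recorded.
directory = {"1234567": {"authn-id": "1234567", "display-name": "Mark Mielke"}}

print(extended_author_fields("1234567", directory.__getitem__,
                             ["authn-id", "email"]))
```

A client written against this contract has to treat every field as optional.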
DESIGN
The extended author fields are delivered through revision properties. The
values are UTF-8 text. These revision properties are readable but not writable
by clients.
Three property names are initially designated as "well known":
* prop name: "svn:author:authn-id"
purpose: authenticated user id
format: as used by Subversion's authentication (the default
value of svn:author)
* prop name: "svn:author:display-name"
purpose: display name
format: a single line (no line breaks), e.g. person's full
name or shortened name or nickname
* prop name: "svn:author:email"
purpose: email address
format: [TO BE SPECIFIED HERE]
Other property names in this name space beginning with "svn:author:" can be designated
as "well known" in the future, by an official announcement from the Subversion project.
An administrator can configure other extended author fields to use property names that are not
in the "svn:" name space. Example: an administrator could configure the property name
"author:pgp-sig" to hold the author's PGP signature.
Excellent.
SERVER DESIGN
Any time the server is about to send a set of revision properties to
the client, the server looks up the extended author fields and adds
corresponding properties to the set of revision properties that it
reports to the client. These property values override any stored values for
the same property names. The server looks up the extended author fields
through some mechanism not defined here, using the value of
the "svn:author" property as a key. The server may cache the results, provided
that there is a way for the administrator to make the server use updated information.
The cache can be a typical cache. The information that might be returned
should generally be semi-persistent and not changing from minute to
minute. As long as it takes effect within a reasonable time period
(configurable along with the configuration on how to obtain the extended
attribute information in the first place?) there is no problem.
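
Something along these lines is what I mean by a typical cache: a minimal
sketch, assuming a configurable time-to-live and an explicit invalidate hook
for the administrator. The class name AuthorFieldCache, the lookup callable,
and the default of 300 seconds are all assumptions for illustration.

```python
import time


class AuthorFieldCache:
    """Time-limited cache of extended author fields, keyed by svn:author."""

    def __init__(self, lookup, ttl_seconds=300.0, clock=time.monotonic):
        self._lookup = lookup      # hypothetical: author -> dict of fields
        self._ttl = ttl_seconds    # how long an entry stays fresh
        self._clock = clock        # injectable so the cache can be tested
        self._entries = {}         # author -> (expiry time, fields)

    def get(self, author):
        now = self._clock()
        hit = self._entries.get(author)
        if hit is not None and hit[0] > now:
            return hit[1]
        # Missing or expired: ask the directory again and remember the answer.
        fields = self._lookup(author)
        self._entries[author] = (now + self._ttl, fields)
        return fields

    def invalidate(self, author=None):
        """Let the administrator force fresh look-ups for one user or all."""
        if author is None:
            self._entries.clear()
        else:
            self._entries.pop(author, None)
```

With this shape, a changed email address shows up after at most ttl_seconds,
or immediately if the administrator calls invalidate().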
If the client attempts to set any revision property in the "svn:author:"
name space, the server shall report an error to the client. This applies even if the
property value matches the value that was last read from the server or is currently known
to the server, and even if the
specific property name is not known to the server. If the client attempts to set any
revision property that is not in the "svn:author:" name space but might be
configured as an extended author field, the server records that revision property in the
normal way. If a revision property (of any name) has a stored value and the extended
author field look-up also provides a value for the same property name, the latter takes
priority.
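
The write and read rules above can be sketched as follows. This is only an
illustration of the stated rules, not Subversion server code; the function
names and the ReadOnlyPropertyError exception are hypothetical.

```python
RESERVED_PREFIX = "svn:author:"  # name space that clients may read but not set


class ReadOnlyPropertyError(Exception):
    """Raised when a client tries to set a property in the reserved name space."""


def check_revprop_write(name):
    """Reject any client write under "svn:author:", known field or not."""
    if name.startswith(RESERVED_PREFIX):
        raise ReadOnlyPropertyError("%s is read-only for clients" % name)


def revprops_for_client(stored, extended):
    """Merge stored revision properties with looked-up extended author fields.

    Where both provide a value for the same name, the looked-up value wins.
    """
    merged = dict(stored)
    merged.update(extended)
    return merged


stored = {"svn:author": "1234567", "svn:log": "fix", "author:nickname": "old"}
extended = {"svn:author:display-name": "Mark Mielke", "author:nickname": "mm"}
print(revprops_for_client(stored, extended))
```

Note that "svn:author" itself does not start with "svn:author:", so it is not
caught by the write check and keeps its existing behaviour.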
The extended author fields [are | are not] available to the following hook
scripts: pre-commit, ...
Although not necessary for the fields to be available to the hook
scripts - it would be extremely convenient for them to be so. We have
hooks that perform LDAP lookups - but each hook has to have intimate
knowledge of the environment it is contained in, making them difficult to
be published - for example, as an open source component that others
could re-use. They may have hard coded LDAP bind passwords for example,
making them insecure to publish. It would be extremely nice if any open
source component writer could make use of these fields without having to
care where the values come from, and the configuration for where the
values come from could be centralized in one place - the Subversion server.
CLIENT DESIGN
Just an example. The "svn log" and "svn blame" commands could request the revision property named
"svn:author:display-name", and if that is returned then use it instead of "svn:author", otherwise use the value of
"svn:author". Further, a client-side configuration option could specify which property name should be used for these display purposes, so
for example some users in a particular team could choose to have the "author:nickname" revision property displayed instead of
"svn:author:display-name".
This would be great. I think many people like to see the format that Git
uses: Display Name <email@domain>. This should be an option.
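
A minimal sketch of that fallback and of the Git-style option, assuming the
client already has the revision properties as a dict. The function name
author_for_display and its parameters are hypothetical; only the property
names come from the proposal.

```python
def author_for_display(revprops, display_prop="svn:author:display-name",
                       git_style=False):
    """Pick the string to show in log/blame output for one revision.

    Falls back to plain svn:author when the preferred property is absent,
    which is what an old server would give a new client.
    """
    name = revprops.get(display_prop) or revprops.get("svn:author", "")
    email = revprops.get("svn:author:email")
    if git_style and email:
        # Git-like rendering: Display Name <email@domain>
        return "%s <%s>" % (name, email)
    return name


props = {
    "svn:author": "1234567",
    "svn:author:display-name": "Mark Mielke",
    "svn:author:email": "mark@example.com",  # made-up address for the example
}
print(author_for_display(props))
print(author_for_display(props, git_style=True))
print(author_for_display({"svn:author": "1234567"}, git_style=True))
```

Passing display_prop="author:nickname" would cover the per-team preference
mentioned in the proposal.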
FURTHER SCOPE
Does a client need to be able to look up the information in other ways, such
as starting from svn:author rather than a revision number, or starting from an
extended author field?
I'm not clear on how "svn blame" is implemented. Presuming that it knows
what commit each line belongs to and that these are already being
queried (i.e. the implementation won't have to significantly change as a
result of this proposal), it is satisfactory for it to access the
information from the revision properties. I don't at the moment see a
requirement to be able to query a list of known users, or information
for a particular user. Subversion is not a directory service. The main
capability being provided is to enable Subversion clients to be ignorant
about how the server has been configured to perform authentication and
identification of users, but still be able to provide extended
information about Subversion metadata back to the user. Staying within
domain is probably smart as it can be a clear boundary around the scope
that is being agreed to.
Final thoughts on this draft:
The reference implementation should come with perhaps two server modules
to support this capability. One should be a caching LDAP implementation
that is fully configurable. One should be based on operating system
services (PAM or getent() for Unix?). Other implementations should be
possible, but left outside of core.
If the Subversion developers agree to some refinement of this proposal,
I understand that developer resources are limited and that there is no
guarantee that it would ever be implemented or if implemented that it
would ever be completed and distributed in core. I'm thinking that this
sort of project might be a good entry point for somebody such as myself
to contribute. Not sure about time right now - but if you put in the
effort to review and refine, then it would be only fair for me to at
least try to contribute.
Thanks for the time you put into this Julian.
--
Mark Mielke <m...@mielke.cc>