tl;dr

* OpenStack should use the Server, User-Agent and perhaps Via headers for 
debugging implementation issues. We may want to consider requiring a User-Agent 
header from clients.
* "Versions" in API discussions should mean backwards-incompatible changes in 
the API, as distinct from versions of the implementations.
* OpenStack should register distinct media types for each of the formats it 
uses, to make identification easy and to counteract format proliferation. 
* We as a community need to decide, document and stick to a plan for how 
clients determine the URIs that they use for OpenStack APIs.


The long version
----------------

I've been thinking about versioning and discussing it with various folks, and 
have been encouraged to bring the discussion here.

My background is in IETF protocols, where there's sort of an unwritten (or, at 
least, poorly collected) approach to versioning and extensibility that's going 
to show through here. I won't say that we have to adopt that approach, but 
hopefully the advantages -- both on their own merits, as well as for the sake 
of aligning with the protocols we're using (HTTP, MIME; collectively "the Web") 
-- will be apparent.

Some will call it RESTful -- I tend to shy away from that term, because people 
often mean different things when they use it.

Rather than come up with a single, broad approach to versioning our APIs, I 
think it'd be useful to break it down into *what* we want to version -- in 
particular, what needs to change, and what utility we get out of surfacing that 
change in a way that's easy to see.

I can think of a few (this is by no means complete):

* Implementations

From a developer perspective, the most obvious changes are those to the 
codebase. The primary use case here is obvious -- if I change the software on 
one end, and something breaks, I need to be able to debug / trace the issue.

However, it's important to separate out this use of versioning from tracking 
change in the API. We already have several implementations of the various APIs 
(client and server-side), and bumping the URI, or a media type, or some other 
identifier every time somebody makes a commit / does a release doesn't make 
much sense, because changes can be compatible -- i.e., as long as certain rules 
are followed both in the APIs and the software that consumes them, they won't 
break anything. 

In HTTP, the right place to track this kind of change is with product tokens. 
These occur in the Server, User-Agent and Via headers, and contain multiple 
tokens if you have plugins (for example).

E.g.,
  Server: nova/1.4.2

This allows debugging, without tying the APIs and document formats we produce 
to specific releases. Note that this doesn't mean you should do API versioning 
(for example) using the User-Agent header; it's really only for debugging, 
tracing, gathering stats, and so forth.
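
To make that concrete, here's a rough sketch (in Python) of pulling product 
tokens out of a Server or User-Agent header for logging / stats; the 
"nova/1.4.2" value is just made up for the example, and real headers can nest 
comments in ways this doesn't handle:

```python
def parse_product_tokens(header):
    """Split a Server or User-Agent header value into (product, version)
    pairs, skipping parenthesised comments.  A minimal sketch only."""
    tokens = []
    depth = 0
    kept = []
    for ch in header:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(0, depth - 1)
        elif depth == 0:
            kept.append(ch)
    for part in "".join(kept).split():
        name, _, version = part.partition("/")
        tokens.append((name, version or None))
    return tokens

# A hypothetical nova response header:
print(parse_product_tokens("nova/1.4.2 (Ubuntu) eventlet/0.9.16"))
# → [('nova', '1.4.2'), ('eventlet', '0.9.16')]
```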

Of course, we can put this information into responses too, if people want to 
persist it easily; the important thing is that this information identifies the 
implementation that generated the response, rather than the response format 
itself; that's separate...


* Document formats

Since we're using HTTP, one of the fundamental things that can change is the 
document formats that we exchange. Right now there are on the order of 10 main 
document types, but they're all referred to by generic "application/xml" and 
"application/json" media types. 

Registering these would give us ways to talk about specific formats -- e.g., 
"application/vnd.openstack.servers+xml" -- by typing links to them, by 
negotiating for them, etc.

The technical benefits of doing so are debatable, but my real motivation here 
is social -- avoiding format proliferation. Having lots and lots of document 
formats pushes complexity onto people who consume our APIs. Surfacing each 
format as a media type makes it clear what formats we need to work with, and 
makes us think before we mint a new one. It also makes it crystal-clear when 
you talk about a format; they become distinct from the URIs that are used to 
fetch them (more about that later).

Back to versioning -- making backwards-compatible changes to both JSON and XML 
formats is well-understood (if somewhat painful in the latter case). No such 
change should cause a change to the document format identifier (whether it be 
in the URI or in the media type); the only thing that should change that is a 
backwards-incompatible change, which should be very, very rare. 
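
One discipline that makes compatible changes safe is the "tolerant reader": 
clients consume the fields they know about and ignore everything else, so the 
server can add fields without bumping the format identifier. A sketch (the 
field names here are invented for illustration, not the real API):

```python
def read_server(doc):
    """Pull out only the fields this client understands; unknown keys
    are ignored, so the server can add fields compatibly."""
    return {
        "id": doc["id"],
        "name": doc.get("name", ""),
        "status": doc.get("status", "UNKNOWN"),
    }

# An older document and a newer one with an extra field parse identically:
v_old = {"id": "42", "name": "web1", "status": "ACTIVE"}
v_new = dict(v_old, scheduler_hints={"group": "a"})  # field added later
assert read_server(v_old) == read_server(v_new)
```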

This is because when we introduce a backwards-incompatible change to a format, 
we're breaking all clients who understand the existing format, so we're 
effectively creating a new format. I.e., "v2" is no more semantically 
significant than "bob" or "essex" -- we might as well just call it a new thing, 
because old clients can't do anything with it.

Again, naming our formats and giving them standalone identity helps reinforce 
this practice. Actual negotiation for such versioned formats can happen 
through request headers, e.g.,

GET /servers
Accept: application/vnd.openstack.servers+json

or through URIs, if you need a stable bookmark to a specific format:

GET /servers.v1.json
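
Server-side, the negotiation can be as simple as walking the Accept header and 
picking the first type you can produce; a rough sketch (it ignores q-values, 
which a real implementation would honour, and the media type names follow the 
hypothetical vnd.openstack.* registrations above):

```python
def negotiate(accept_header, available):
    """Return the first media type in the Accept header that the server
    can produce, or None.  Minimal sketch: no q-value sorting."""
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip()
        if media_type in available:
            return media_type
    return None

offered = ["application/vnd.openstack.servers+json",
           "application/vnd.openstack.servers+xml"]
print(negotiate("application/vnd.openstack.servers+xml, */*;q=0.1", offered))
# → application/vnd.openstack.servers+xml
```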


* URIs

Finally, the client needs to know what URI to use for a particular request, and 
we need to manage change in them over time.

The way that many HTTP APIs do this is to introduce a version into the "root" 
of the URI tree which indicates the layout of all URIs under it; e.g.,

http://api.example.com/v1/foo

As I've explained in other threads, in this scheme a URI under "v1" has no 
relation to a URI under "v2", even if they have the same portion after the 
version prefix, because they're effectively two separate name spaces. The idea 
here is that a client "knows" the layout of URIs under "v1" and can make a 
request to, say /v1/servers/foo/interfaces with confidence, because the API is 
documented this way.

This is the direction OpenStack is currently going in; our API documentation as 
well as our WADL files have a number of URIs baked into them.

Let's call this the "classic" approach. Roy Fielding has complained that this 
approach isn't RESTful -- see 
<http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven> for a 
detailed explanation.  

Personally, I'm interested in RESTfulness only insofar as it can improve the 
utility that we get out of the API -- with the proviso that a number of the 
benefits of an "HTTP-Friendly" approach are latent, and only found later when 
you need them.

I think this *might* be one of those cases, and I'd like to open a discussion.

A hypertext-driven API would require that the client discover the URIs it's to 
use dynamically, at runtime. For example, it could open a session by making a 
request for a "catalogue" resource that links to interesting things:

GET / HTTP/1.1
Host: api.example.com
Accept: application/vnd.openstack.catalogue+json

getting back:

HTTP/1.1 200 OK
Content-Type: application/vnd.openstack.catalogue+json
{
  "links": {
    "openstack-server-list": "/servers/",
    "openstack-server-detail": "/servers/{server_id}",
    "openstack-images": "/images/{image_id}",
    "openstack-users": "/users/{user_id}"
  }
}

(please excuse my pidgin JSON)

The idea here is that each link is typed using a link relation (see RFC5988 - 
<http://tools.ietf.org/html/rfc5988>) that identifies its semantics -- 
including what it means to POST to it, what you GET back, etc. The URIs and URI 
templates (see <http://tools.ietf.org/html/draft-gregorio-uritemplate-07>) 
returned are entirely specific to the implementation, which is free to arrange 
its URI space as it sees fit. 

This means that there isn't really any need for a version in the URI; the 
client discovers the resources that it wants to interact with when it looks at 
the "catalogue" resource, which is conceptually similar to the "root" resource 
on a "normal" Web server.
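
Client-side, "knowing" a URI then reduces to looking up a link relation in 
the catalogue and expanding its template. A sketch using plain {name} 
substitution (a small subset of the URI Templates draft; the `resolve` helper 
and relation names are just illustrative):

```python
def resolve(catalogue, rel, **variables):
    """Look up a link relation in the catalogue and expand its URI
    template with the given variables.  Simple {name} substitution
    only -- a subset of full URI Templates."""
    template = catalogue["links"][rel]
    return template.format(**variables)

catalogue = {
    "links": {
        "openstack-server-list": "/servers/",
        "openstack-server-detail": "/servers/{server_id}",
    }
}
print(resolve(catalogue, "openstack-server-detail", server_id="42"))
# → /servers/42
```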

There are some downsides to this approach; it places a certain amount of 
responsibility on the client. Clients can encounter broken URIs if they make 
assumptions about the layout of the server, so they really need to pay 
attention to the catalogue, which they'll need to download. Practically 
speaking, they'll need client-side caching to keep the catalogue's overhead 
manageable when performing multiple actions. It'll also require some careful 
thinking about server-side APIs for managing extensions, and likely some new 
approaches to testing.

Until recently, I was convinced that this was too much cost without enough 
benefit. However, one thing has made me reconsider -- extensibility.

OpenStack has a pretty complex extensibility story, because people can deploy 
extensions for pretty much anything. Carving off sections of the URI name space 
in the "classic" approach is tricky; if we allow extensions *anywhere*, 
consider what happens when Rackspace has the "rax" prefix, and there's an 
interface that looks something like:

  http://api.example.com/v1/users/{user_id}

What if Rackspace wants to add an extension resource under "users"? All of a 
sudden another cloud deployment might have a conflict with any user ID that 
begins with "rax". Ew.

Using a hypertext-based approach, such conflicts aren't a concern, because the 
server is in complete control of its name space, and can rearrange things to 
make them work. Versioning *and* extension considerations are pushed out of 
URIs and into the link relations; if Rackspace wants to add an extension to the 
user resource (that doesn't fit into the existing format), the catalogue would 
look something like this:

HTTP/1.1 200 OK
Content-Type: application/vnd.openstack.catalogue+json
{
  "links": {
    "openstack-server-list": "/servers/",
    "openstack-server-detail": "/servers/{server_id}",
    "openstack-images": "/images/{image_id}",
    "openstack-users": "/users/{user_id}",
    "rax-user-addon": "/user-addon/{user_id}"
  }
}
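
Under this scheme, detecting an extension is just checking whether its link 
relation is present in the catalogue -- no URI guessing required. E.g. (a 
sketch; relation names are from the example catalogue above):

```python
def supports(catalogue, rel):
    """True when the catalogue advertises the given link relation.
    Clients probe for relations instead of guessing URI layouts,
    so extensions never collide with core paths."""
    return rel in catalogue.get("links", {})

catalogue = {"links": {"openstack-users": "/users/{user_id}",
                       "rax-user-addon": "/user-addon/{user_id}"}}

if supports(catalogue, "rax-user-addon"):
    # Only use the extension when the server advertises it.
    addon_url = catalogue["links"]["rax-user-addon"].format(user_id="7")
```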

I like this approach, because we're not overloading URIs (which already do a 
lot of heavy lifting) with the concerns of versioning and extensibility -- the 
link types take care of this really effectively. 

That's not to say I'm married to it; I just want to put it out there for 
consideration. The really important thing, to me, is that we very carefully 
document what our expectations for API clients are; if they can trust that a 
URI under "/v1/" will never change in a backwards-incompatible fashion, that's 
great; if we document that clients need to have a fresh copy of a catalogue to 
understand what URIs they can use, that's great too. That said, I do think we 
need to figure out where our current approach is taking us.

What do people think of the linked approach to versioning and extensibility?

Cheers,

P.S. If you want to read more along these lines (doubtful ;) or don't quite get 
it (very possible), see also 
<http://www.mnot.net/blog/2011/10/25/web_api_versioning_smackdown>.

--
Mark Nottingham   http://www.mnot.net/




_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp
