[AVRO-488] Invalid HTTP Content-Type - ASF JIRA 
(apache.org)<https://issues.apache.org/jira/browse/AVRO-488>

Hi everyone,

AVRO-488 is a long-standing issue that requires a standards-conformant 
resolution.

My team is going to expand our use of Avro and we really need a valid media 
type for it. (You might like Avrotize<https://github.com/clemensv/avrotize>)

"avro/binary" is a bug per 
RFC6838<https://www.rfc-editor.org/rfc/rfc6838#section-4.2> because "avro" is 
not a registered top-level type, and while I appreciate the endless patience in 
trying to sit that problem out, it's still a bug.
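
To make the objection concrete, here is a minimal sketch of the kind of check a
strict content-type parser might apply. The function name and the list of
registered top-level types are my own assumptions for illustration (the list
reflects the IANA registry as I understand it; please verify it before relying
on it). The name pattern follows the restricted-name rule in RFC6838 Section 4.2.

```python
import re

# Top-level types registered with IANA, to the best of my knowledge.
# Assumption: check the IANA registry before relying on this list.
REGISTERED_TOP_LEVEL = {
    "application", "audio", "example", "font", "haptics", "image",
    "message", "model", "multipart", "text", "video",
}

# restricted-name from RFC 6838 Section 4.2: one ALPHA/DIGIT followed by
# up to 126 characters from the restricted set.
RESTRICTED_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9!#$&\-^_.+]{0,126}$")


def check_media_type(value):
    """Return a list of problems found in a bare 'type/subtype' string."""
    parts = value.split("/")
    if len(parts) != 2:
        return ["expected exactly one '/' separating type and subtype"]
    top, sub = parts
    problems = []
    for label, name in (("type", top), ("subtype", sub)):
        if not RESTRICTED_NAME.match(name):
            problems.append(f"{label} '{name}' is not a valid restricted-name")
    if top.lower() not in REGISTERED_TOP_LEVEL:
        problems.append(f"top-level type '{top}' is not registered")
    return problems


print(check_media_type("avro/binary"))
print(check_media_type("application/vnd.apache.avro+avro"))
```

Under these assumptions, "avro/binary" is syntactically well-formed but fails
on the unregistered top-level type, while the proposed vendor-tree name passes.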

The hurdle to cross for an "application/avro" registration is to get past the 
review process laid out in RFC6838 Section 
3.1<https://www.rfc-editor.org/rfc/rfc6838.html#section-3.1>. I believe that 
this Media Type would indeed be in the best interest of the Internet Community 
since Avro is so widely used. The Avro PMC would have to work on that with 
IANA<https://www.iana.org/form/media-types> and I think it would be a great 
benefit to the ecosystem to have this formalized. Yet, since there is no formal 
specification of Avro in W3C or IETF or another standards body recognized by 
IESG, I think the chances of getting that past IANA are relatively slim.

In the "vendor tree<https://www.rfc-editor.org/rfc/rfc6838#section-3.2>", there 
are few hurdles except for announcing the registration. Given that there are 
already ASF registrations for Thrift and 
Parquet<https://www.iana.org/assignments/media-types/media-types.xhtml>, I 
propose that the PMC register the following three media types in the 
"application" registry:


  *   Apache Avro Binary encoding: vnd.apache.avro+avro --> 
application/vnd.apache.avro+avro
  *   Apache Avro object container file: vnd.apache.avro+avro-container --> 
application/vnd.apache.avro+avro-container
  *   Apache Avro JSON encoding: vnd.apache.avro+json --> 
application/vnd.apache.avro+json

For the JSON encoding, the "+json" structured syntax suffix leans on the 
definition in RFC6839 Section 
3.1<https://www.rfc-editor.org/rfc/rfc6839#section-3.1> and generally helps 
systems to determine whether some content is JSON if the media-type is not 
application/json outright. The special handling semantics of unions in the Avro 
JSON encoding compared to the naïve use of classes emitted by the Avro code 
generators with any other JSON serialization framework does justify the 
distinction of Avro's JSON flavor, in my view.
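
As a rough illustration of how a generic receiver can use the "+json" suffix,
the sketch below treats a payload as JSON either when the media type is
application/json or when its subtype ends in "+json". The function name is
hypothetical and the check is deliberately simplistic.

```python
def is_json_payload(content_type):
    """Return True if the Content-Type value denotes JSON-syntax content."""
    # Drop any parameters (e.g. "; charset=utf-8") and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/json" or media_type.endswith("+json")


print(is_json_payload("application/vnd.apache.avro+json"))
print(is_json_payload("application/vnd.apache.avro+avro"))
```

A tool that only knows generic JSON could still pretty-print or index such a
payload, even though decoding Avro's union representation correctly requires
knowledge of the Avro JSON encoding.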

With that, it is useful to clearly delineate the binary encodings from the JSON 
encoding to limit confusion. That is how I am landing on "avro+avro" for the 
schemaless wire format vs. just "avro". The "+avro" signals the use of 
schemaless Avro binary encoding. The "+avro-container" is the object container 
file format.

Now, how do we get to the "+avro" and "+avro-container" suffixes? There is 
precedent in the IANA "Structured Syntax Suffixes" 
registry<https://www.iana.org/assignments/media-type-structured-suffix/media-type-structured-suffix.xhtml>
 where +tlv and +sqlite3 and +wbxml have been registered for "vnd.xxx" media 
types. That means the suffixes would be registered along with the 
"vnd.apache.avro+x" media types and I don't see grounds for rejection given the 
precedent.

For the vnd.apache.avro+json and vnd.apache.avro+avro media types, I would then 
also define an optional parameter "schema" that carries a URI-reference (an 
identifier, not necessarily a locator) identifying the schema. Several schema 
registries help with sharing Avro schemas, and pointing to a shared schema 
right in the content type, without having to stash that (vital) information in 
some out-of-band negotiated place, is clearly beneficial.

Content-Type: application/vnd.apache.avro+avro; schema=https://fabrikam.com/schemas/a1b2c3d4

or if the registry scope is implied for all parties:

Content-Type: application/vnd.apache.avro+avro; schema=a1b2c3d4
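
Here is a minimal sketch of how a receiver might read the proposed "schema"
parameter and resolve it against an implied registry. The helper names and the
registry base URI are assumptions made up for this example; the parser is
intentionally naive and does not handle ";" inside quoted values. Note that
strict HTTP parsers may expect a URI value to be sent as a quoted string,
because ":" and "/" are not token characters, so the sketch accepts both forms.

```python
from urllib.parse import urljoin

# Hypothetical registry base, standing in for the "implied" registry scope.
REGISTRY_BASE = "https://fabrikam.com/schemas/"


def parse_content_type(header):
    """Split a Content-Type value into (media type, parameter dict)."""
    media_type, *params = [part.strip() for part in header.split(";")]
    parsed = {}
    for param in params:
        if not param:
            continue  # tolerate a trailing ';'
        name, _, value = param.partition("=")
        # Parameter names are case-insensitive; strip optional quotes.
        parsed[name.strip().lower()] = value.strip().strip('"')
    return media_type.lower(), parsed


def schema_uri(header, base=REGISTRY_BASE):
    """Resolve the 'schema' URI-reference against the registry base."""
    _, params = parse_content_type(header)
    ref = params.get("schema")
    return urljoin(base, ref) if ref else None


print(schema_uri("application/vnd.apache.avro+avro; schema=a1b2c3d4"))
print(schema_uri(
    'application/vnd.apache.avro+avro; '
    'schema="https://fabrikam.com/schemas/a1b2c3d4"'
))
```

Both calls yield the same absolute identifier under these assumptions: a
relative reference is resolved against the base, and an absolute one is
returned unchanged.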

The neat effect of having the suffix is that I could now clearly flag my own 
application's bespoke media types that leverage Avro binary: 
"application/vnd.contoso.crm+avro;schema=sales-lead"

For the (corner-)cases where you need a bespoke media type and want to use 
Avro's specific JSON encoding with a schema, I propose combining the available 
suffixes (which is legal): 
"application/vnd.contoso.crm+avro+json;schema=sales-lead"

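To show how the suffixes in the examples above could be picked apart, here is a
small sketch that splits a media type into its top-level type, base subtype,
and the list of "+" suffixes. The function name is hypothetical, and how a
receiver should interpret more than one suffix is an assumption on my part
rather than something this sketch settles.

```python
def split_suffixes(media_type):
    """Return (top-level type, base subtype, [suffixes]) for 'type/subtype'."""
    top, _, subtype = media_type.partition("/")
    # Everything before the first '+' is the base; the rest are suffixes.
    base, *suffixes = subtype.split("+")
    return top, base, suffixes


print(split_suffixes("application/vnd.contoso.crm+avro"))
print(split_suffixes("application/vnd.contoso.crm+avro+json"))
```

With "application/vnd.contoso.crm+avro+json", the result carries both "avro"
and "json", so a receiver could first recognize generic JSON syntax and then
apply the Avro-specific JSON rules.
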
Going the "vnd." route is not an insurmountable hurdle. The other path, which 
we picked in the CNCF CloudEvents 
project<https://github.com/cloudevents/spec/blob/v1.0.2/cloudevents/formats/json-format.md#3-envelope>,
 is to let IANA be IANA and just start using "application/avro". All the above, 
including suffixes, would still apply then, but with a shorter media type name.

If the PMC does not want to deal with this, I ask that you please resolve 
AVRO-488 by removing the mention of "avro/binary" in the documentation and 
elsewhere and leave the media-type choice to the implementers. I don't think 
that would be in everybody's best interest, but neither is a definition that 
violates the basics of how media types work and blows up people's content-type 
parsers.

Best Regards
Clemens

Clemens Vasters
Messaging Platform Architect
Microsoft Azure
+49 151 44063557
cleme...@microsoft.com<mailto:cleme...@microsoft.com>
European Microsoft Innovation Center GmbH | Gewürzmühlstrasse 11 | 80539 
Munich | Germany
Geschäftsführer/General Managers: Keith Dolliver, Benjamin O. Orndorff
Amtsgericht Aachen, HRB 12066

