[AVRO-488] Invalid HTTP Content-Type - ASF JIRA (apache.org)<https://issues.apache.org/jira/browse/AVRO-488>
Hi everyone,

AVRO-488 is a long-standing issue that requires a standards-conformant resolution. My team is going to expand our use of Avro and we really need a valid media type for it. (You might like Avrotize<https://github.com/clemensv/avrotize>.)

"avro/binary" is a bug per RFC6838<https://www.rfc-editor.org/rfc/rfc6838#section-4.2>, and while I appreciate the endless patience in trying to sit that problem out, it's still a bug.

The hurdle to cross for an "application/avro" registration is the review process laid out in RFC6838 Section 3.1<https://www.rfc-editor.org/rfc/rfc6838.html#section-3.1>. I believe that this media type would indeed be in the best interest of the Internet community since Avro is so widely used. The Avro PMC would have to work on that with IANA<https://www.iana.org/form/media-types>, and I think it would be a great benefit to the ecosystem to have this formalized. Yet, since there is no formal specification of Avro in W3C, IETF, or another standards body recognized by the IESG, I think the chances of getting that past IANA are relatively slim.
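To make the "avro/binary" problem concrete, here is a minimal Python sketch of the kind of check a strict consumer might apply. The function name check_media_type and the set of top-level types are my own assumptions for illustration (the authoritative list is the IANA registry); the restricted-name pattern follows my reading of RFC6838 Section 4.2.

```python
import re

# Assumption: a snapshot of registered top-level types; consult the IANA
# media types registry for the authoritative, current list.
REGISTERED_TOP_LEVEL_TYPES = {
    "application", "audio", "example", "font", "image",
    "message", "model", "multipart", "text", "video",
}

# restricted-name as I read RFC 6838 Section 4.2: one alphanumeric
# character followed by up to 126 restricted-name-chars.
RESTRICTED_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9!#$&\-^_.+]{0,126}$")


def check_media_type(media_type: str) -> list[str]:
    """Return a list of problems found; an empty list means no objections."""
    top, sep, sub = media_type.partition("/")
    if not sep:
        return ["missing '/' between type and subtype"]
    problems = []
    for label, name in (("type", top), ("subtype", sub)):
        if not RESTRICTED_NAME.match(name):
            problems.append(f"{label} {name!r} is not a valid restricted-name")
    if top.lower() not in REGISTERED_TOP_LEVEL_TYPES:
        problems.append(f"top-level type {top!r} is not registered")
    return problems


print(check_media_type("avro/binary"))
print(check_media_type("application/vnd.apache.avro+avro"))
```

With this sketch, "avro/binary" passes the character-level grammar but is flagged for its unregistered top-level type, while the proposed "application/vnd.apache.avro+avro" raises no objections.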
In the "vendor tree<https://www.rfc-editor.org/rfc/rfc6838#section-3.2>", there are few hurdles except for announcing the registration. Given that there are already ASF registrations for Thrift and Parquet<https://www.iana.org/assignments/media-types/media-types.xhtml>, I propose that the PMC register the following three media types in the "application" registry:

* Apache Avro binary encoding: vnd.apache.avro+avro --> application/vnd.apache.avro+avro
* Apache Avro object container file: vnd.apache.avro+avro-container --> application/vnd.apache.avro+avro-container
* Apache Avro JSON encoding: vnd.apache.avro+json --> application/vnd.apache.avro+json

For the JSON encoding, the "+json" structured syntax suffix leans on the definition in RFC6839 Section 3.1<https://www.rfc-editor.org/rfc/rfc6839#section-3.1> and generally helps systems determine whether some content is JSON when the media type is not application/json outright. The Avro JSON encoding handles unions in a special way, which differs from what you get when you naïvely feed the classes emitted by the Avro code generators to any other JSON serialization framework. In my view, that justifies distinguishing Avro's JSON flavor.

With that, it is useful to clearly delineate the binary encodings from the JSON encoding to limit confusion. That is how I am landing on "avro+avro" for the schemaless wire format rather than just "avro". The "+avro" signals the use of the schemaless Avro binary encoding. The "+avro-container" is the object container file format.

Now, how do we get to the "+avro" and "+avro-container" suffixes? There is precedent in the IANA "Structured Syntax Suffixes" registry<https://www.iana.org/assignments/media-type-structured-suffix/media-type-structured-suffix.xhtml>, where +tlv, +sqlite3, and +wbxml have been registered for "vnd.xxx" media types. That means the suffixes would be registered along with the "vnd.apache.avro+x" media types, and I don't see grounds for rejection given the precedent.
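As an illustration of how a consumer could use these suffixes, here is a minimal Python sketch that splits a media type at the last "+", which is where RFC6838 places the structured syntax suffix. The names MediaType, split_suffix, and looks_like_json are hypothetical, and "+avro" and "+avro-container" are the suffixes proposed above, not registered ones.

```python
from typing import NamedTuple, Optional


class MediaType(NamedTuple):
    type: str
    subtype: str           # subtype without the structured syntax suffix
    suffix: Optional[str]  # text after the last "+", or None if absent


def split_suffix(media_type: str) -> MediaType:
    """Split 'type/subtype+suffix' into its parts (parameters not handled)."""
    top, _, subtype = media_type.strip().lower().partition("/")
    base, plus, suffix = subtype.rpartition("+")
    if not plus:
        # No "+" in the subtype, so there is no structured syntax suffix.
        return MediaType(top, subtype, None)
    return MediaType(top, base, suffix)


def looks_like_json(media_type: str) -> bool:
    """True for application/json itself or any type with a '+json' suffix."""
    mt = split_suffix(media_type)
    return mt.suffix == "json" or (mt.type, mt.subtype) == ("application", "json")


print(split_suffix("application/vnd.apache.avro+avro-container"))
print(looks_like_json("application/vnd.apache.avro+json"))
```

A generic tool that knows nothing about Avro can still route "application/vnd.apache.avro+json" to a JSON viewer this way, while the proposed binary types would fall through to whatever handles "+avro".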
For the vnd.apache.avro+json and vnd.apache.avro+avro media types, I would then also define an optional parameter "schema" that carries a URI-reference (an identifier, not necessarily a locator) identifying the schema. There are several schema registries around that help with sharing Avro schemas, and pointing to a shared schema right in the content type, without having to stash that (vital) information in some out-of-band negotiated place, is clearly beneficial:

    Content-Type: application/vnd.apache.avro+avro; schema=https://fabrikam.com/schemas/a1b2c3d4

or, if the registry scope is implied for all parties:

    Content-Type: application/vnd.apache.avro+avro; schema=a1b2c3d4

The neat effect of having the suffix is that I could now clearly flag my own application's bespoke media types that leverage Avro binary:

    application/vnd.contoso.crm+avro;schema=sales-lead

For the (corner) cases where you need a bespoke media type and want to use Avro's specific JSON encoding with a schema, I propose combining the available suffixes (which is legal):

    application/vnd.contoso.crm+avro+json;schema=sales-lead

Going the "vnd." route is not an insurmountable hurdle. The other path, which we picked in the CNCF CloudEvents project<https://github.com/cloudevents/spec/blob/v1.0.2/cloudevents/formats/json-format.md#3-envelope>, is to let IANA be IANA and just start using "application/avro". All of the above, including the suffixes, would still apply then, but with a shorter media type name.

If the PMC does not want to deal with this, I ask that you please resolve AVRO-488 by removing the mention of "avro/binary" from the documentation and elsewhere, and leave the media type choice to the implementers. I don't think that would be in everybody's best interest, but neither is a definition that violates the basics of how media types work and blows up people's content-type parsers.
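To show how little a receiver would need in order to pick up the proposed "schema" parameter, here is a minimal Python sketch using the standard library's email.message.Message to parse the header value. The function name parse_avro_content_type is hypothetical, and the "schema" parameter is the proposal from this mail, not something registered. I quote the URI value in the example because, as far as I can tell, the HTTP token grammar does not allow ":" or "/" in an unquoted parameter value.

```python
from email.message import Message


def parse_avro_content_type(header_value):
    """Return (media_type, schema) from a Content-Type header value.

    schema is None when the (proposed, optional) parameter is absent.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() lower-cases the type/subtype and drops parameters;
    # get_param() returns the unquoted parameter value or None.
    return msg.get_content_type(), msg.get_param("schema")


print(parse_avro_content_type(
    'application/vnd.apache.avro+avro; schema="https://fabrikam.com/schemas/a1b2c3d4"'
))
print(parse_avro_content_type("application/vnd.contoso.crm+avro;schema=sales-lead"))
print(parse_avro_content_type("application/vnd.apache.avro+avro-container"))
```

The receiver can then resolve the schema value against whatever registry the parties have agreed on, or treat it as an opaque identifier.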
Best Regards
Clemens

Clemens Vasters
Messaging Platform Architect
Microsoft Azure