See comments in-line below:

> On Jan 15, 2020, at 3:42 AM, roger peppe <rogpe...@gmail.com> wrote:
>
> Oops, I left arrays out! Two other thoughts:
>
> I wonder if it might be worth hedging bets about logical types. It would be
> nice if (for example) a `timestamp-micros` value could be encoded as an
> RFC3339 string, so perhaps that should be allowed for, but maybe that's a
> step too far.

I think logical types should stay above the encoding/decoding…
With timestamp-micros we could extend it to make it applicable to string and implement the converters. In JSON you would then have something readable, but you would have the same string in binary and pay the readability cost there as well.
I implemented special handling for the decimal logical type in my encoder/decoder, but the best implementation I could do still feels like a hack...
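For the timestamp-micros case, a string converter could look roughly like the following. This is only a minimal Python sketch, not code from any Avro implementation: the function names `micros_to_rfc3339` and `rfc3339_to_micros` are hypothetical, and it assumes UTC timestamps written with a trailing "Z" and a fractional-seconds part.

```python
from datetime import datetime, timedelta, timezone

# Avro's timestamp-micros counts microseconds since the Unix epoch (UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_rfc3339(micros: int) -> str:
    """Render a timestamp-micros value as an RFC 3339 string (UTC only)."""
    dt = EPOCH + timedelta(microseconds=micros)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.%f") + "Z"


def rfc3339_to_micros(text: str) -> int:
    """Parse the string form produced above back into microseconds.

    Assumption: only the "Z" (UTC) form with fractional seconds is accepted;
    numeric offsets such as +01:00 are not handled in this sketch.
    """
    dt = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")
    dt = dt.replace(tzinfo=timezone.utc)
    return (dt - EPOCH) // timedelta(microseconds=1)
```

The conversion is lossless in both directions for this restricted format, which is the property a string-backed logical type would need.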

> I wonder if there should be some indication of version so that you know which
> JSON encoding version you're reading. Perhaps the Avro schema could include a
> version field (maybe as part of a definition) so you know which version of
> the spec to use when encoding/decoding. Then bet-hedging wouldn't be quite as
> important.

I think the schema needs to stay decoupled from the encoding. The same schema can be encoded in various ways (I have a CSV encoder/decoder, for example: https://demo.spf4j.org/example/records?_Accept=text/csv).

I think the right abstraction for what you are looking for is the media type (https://en.wikipedia.org/wiki/Media_type). It would be helpful to "standardize" the media types for the Avro encodings. Here is what I mean (with some examples where the same schema is served with different encodings):

1) Binary: "application/avro"
   https://demo.spf4j.org/example/records?_Accept=application/avro
2) Current JSON: "application/avro+json"
   https://demo.spf4j.org/example/records?_Accept=application/avro-x%2Bjson <https://demo.spf4j.org/example/records?_Accept=application/avro+json>
3) New JSON: "application/avro-x+json" ?
   https://demo.spf4j.org/example/records?_Accept=application/avro-x%2Bjson <https://demo.spf4j.org/example/records?_Accept=application/avro+json>

The media type including the Avro schema (as you can see in the Content-Type header of the responses above) can provide complete type information to be able to read an Avro object from a byte stream:
application/avro-x+json;avsc="{\"type\":\"array\",\"items\":{\"$ref\":\"org.spf4j.demo:jaxrs-spf4j-demo-schema:0.8:b\"}}"

In an HTTP context this fits well with content negotiation, and a client can ask for a previous version like:

https://demo.spf4j.org/example/records/1?_Accept=application/json;avsc=%22{\%22$ref\%22:\%22org.spf4j.demo:jaxrs-spf4j-demo-schema:0.4:b\%22}%22

A note on $ref: it is an extension to avsc that I use to reference schemas from Maven repos (see https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroReferences if interested in more detail).

The Google protobuf world does not seem to be in better shape on this front: https://stackoverflow.com/questions/30505408/what-is-the-correct-protobuf-content-type

Let me know if you have any questions...

> JSON Encoding
>
> Except for unions, the JSON encoding is the same as is used to encode field
> default values.
> The value of a union is encoded in JSON as follows:
>   - if all values of the union can be distinguished unambiguously (see below),
>     the JSON encoding is the same as is used to encode field default values for
>     the type
>   - otherwise it is encoded as a JSON object with one name/value pair whose name
>     is the type's name and whose value is the recursively encoded value. For
>     Avro's named types (record, fixed or enum) the user-specified name is used,
>     for other types the type name is used.
> Unambiguity is defined as follows:
>
> An Avro value can be encoded as one of a set of JSON types:
>   null encodes as {null}
>   boolean encodes as {boolean}
>   int encodes as {number}
>   long encodes as {number}
>   float encodes as {number, string}
>   double encodes as {number, string}
>   bytes encodes as {string}
>   string encodes as {string}
>   any enum type encodes as {string}
>   any array type encodes as {array}
>   any map type encodes as {object}
>   any record type encodes as {object}
> A union is considered unambiguous if the JSON type sets for all the members
> of the union form mutually disjoint sets.
>
> Note that float and double are considered ambiguous with respect to string
> because in the future, Avro might support encoding NaN and infinity values as
> strings.

LGTM, let's put this in a PR that covers the spec only.

> On Tue, 14 Jan 2020 at 21:57, roger peppe <rogpe...@gmail.com> wrote:
> On Tue, 14 Jan 2020 at 19:26, Zoltan Farkas <zolyfar...@yahoo.com> wrote:
> Makes sense,
>
> We have to agree on the scope of this implementation.
>
> Right now the implementation I have in Java handles only the:
>
> union {null, [some type]} situation.
>
> Are we ok with this for a start?
>
> I'm not sure that it's worth publishing a half-way solution, as if people
> start using it and a fuller solution is implemented, there will be three
> incompatible standards, which isn't ideal.
>
> What I see beyond that is to handle:
>
> 1) union {string, double} (although we have to specify behavior for NaN,
> positive and negative infinity); union {string, boolean}; ….
>
> My thought, as mentioned at the beginning of this thread, is to omit the
> wrapping when all the members of the union encode to distinct JSON token
> types (the JSON token types being: null, boolean, string, number, object and
> array).
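
The disjointness rule above can be sketched in a few lines of Python. This is an illustration only, not part of any Avro library: the type-set table is copied from the proposal quoted in this message, while the names `JSON_TYPES`, `kind` and `is_unambiguous` are hypothetical, and references to named types by name are assumed to be already resolved to their definitions.

```python
from itertools import combinations

# JSON type sets per Avro type, as listed in the proposal in this thread.
JSON_TYPES = {
    "null": {"null"},
    "boolean": {"boolean"},
    "int": {"number"},
    "long": {"number"},
    "float": {"number", "string"},
    "double": {"number", "string"},
    "bytes": {"string"},
    "string": {"string"},
    "enum": {"string"},
    "array": {"array"},
    "map": {"object"},
    "record": {"object"},
}


def kind(schema):
    """Return the Avro type name for a schema given as a string or a dict.

    Assumption: a named type appears as its full definition (a dict),
    not as a bare name reference.
    """
    return schema if isinstance(schema, str) else schema["type"]


def is_unambiguous(union):
    """True if the JSON type sets of all union members are mutually disjoint."""
    sets = [JSON_TYPES[kind(member)] for member in union]
    return all(a.isdisjoint(b) for a, b in combinations(sets, 2))
```

With this table, `["null", "string"]` needs no wrapping, while `["string", "float"]` and `["int", "long"]` still do, because their type sets overlap.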
>
> I think that we could probably leave out explicit mention of NaN and
> infinity, as that's an issue with schemas too, and there's no obviously good
> solution. That said, if we did want to solve the issue of NaN and infinity in
> the future, things might get awkward with respect to this thread's proposal,
> because it's likely that the only reasonable way to solve that issue is to
> encode NaN and infinity as "NaN" and "±Infinity", which means that the union
> ["string", "float"] becomes ambiguous if we leave out the type name for that
> case.
>
> It seems that it's not unheard-of to use a string representation for these
> float values (see https://issues.apache.org/jira/browse/AVRO-1290).
>
> So perhaps we could define the format something like this:
>
> JSON Encoding
>
> Except for unions, the JSON encoding is the same as is used to encode field
> default values.
> The value of a union is encoded in JSON as follows:
>   - if all values of the union can be distinguished unambiguously (see below),
>     the JSON encoding is the same as is used to encode field default values for
>     the type
>   - otherwise it is encoded as a JSON object with one name/value pair whose name
>     is the type's name and whose value is the recursively encoded value. For
>     Avro's named types (record, fixed or enum) the user-specified name is used,
>     for other types the type name is used.
> Unambiguity is defined as follows:
>
> An Avro value can be encoded as one of a set of JSON types:
>   null encodes as {null}
>   boolean encodes as {boolean}
>   int encodes as {number}
>   long encodes as {number}
>   float encodes as {number, string}
>   double encodes as {number, string}
>   bytes encodes as {string}
>   string encodes as {string}
>   any enum encodes as {string}
>   any map encodes as {object}
>   any record encodes as {object}
> A union is considered unambiguous if the JSON type sets for all the members
> of the union form mutually disjoint sets.
>
> Note that float and double are considered ambiguous with respect to string
> because in the future, Avro might support encoding NaN and infinity values as
> strings.
>
> WDYT?
>
> 2) Make decimal an avro first class type. Current logical type approach is
> not natural in JSON. (see https://issues.apache.org/jira/browse/AVRO-2164).
>
> For 1.9.x 2) is probably a non-starter
>
> Yes, this sounds a bit out of scope to me. It would be nice if decimal values
> were represented as a human-readable decimal number (possibly a JSON string
> to survive round-trips), but that should perhaps be part of a larger change
> to improve decimal support in general. Interestingly, if we were to be able
> to represent decimal values as JSON numbers (for example when they're
> unambiguously representable as such), that would fit fine with the above
> description, because bytes would be considered ambiguous with respect to
> float.
>
> cheers,
> rog.