I think there is consensus that this should be implemented; see [AVRO-1582]
"Json serialization of nullable fields and fields with default values
improvement" - ASF JIRA.


Here is a live example to get some sample data in Avro JSON:
https://demo.spf4j.org/example/records/1?_Accept=application/avro%2Bjson and the
"natural" encoding: https://demo.spf4j.org/example/records/1?_Accept=application/json
using the encoder suggested as the implementation in the JIRA.
Somebody needs to find the time to do the work to integrate this...
--Z



    On Monday, January 6, 2020, 12:36:44 PM EST, roger peppe 
<rogpe...@gmail.com> wrote:  
 
 Hi,

The JSON encoding in the specification includes an explicit type name for all
kinds of object other than null. This means that a JSON-encoded Avro value with
a union is very rarely directly compatible with normal JSON formats.

For example, it's very common for a JSON-encoded value to allow a value that's
either null or string. In Avro, that's trivially expressed as the union type
["null", "string"]. With conventional JSON, a string value "foo" would be
encoded just as "foo", which is easily distinguished from null when decoding.
However, when using the Avro JSON format it must be encoded as {"string": "foo"}.
This means that Avro JSON-encoded values don't interchange easily with other
JSON-encoded values.
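To make the contrast concrete, here is a small sketch (plain Python, no Avro
library involved) of the two shapes a ["null", "string"] union value takes:

```python
import json

# Standard Avro JSON encoding: a non-null union branch is wrapped in a
# single-key object naming the branch type.
avro_json = json.dumps({"string": "foo"})  # -> '{"string": "foo"}'
avro_null = json.dumps(None)               # null stays the bare literal

# Conventional JSON: the value stands alone; null vs. string is already
# distinguished by the JSON token type of the value itself.
plain_json = json.dumps("foo")             # -> '"foo"'

print(avro_json, avro_null, plain_json)
```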
AFAICS the main reason that the type name is always required in JSON-encoded
unions is to avoid ambiguity. This particularly applies to record and map
types, where it's not possible in general to tell which member of the union has
been specified by looking at the data itself.

However, that reasoning doesn't apply if all the members of the union can be
distinguished by their JSON token type.
I am considering using a JSON encoding that omits the type name when all the
members of the union encode to distinct JSON token types (the JSON token types
being: null, boolean, string, number, object and array).

For example, JSON-encoded values using the Avro schema ["null", "string",
"int"] would encode as the literal values themselves (e.g. null, "foo", 999),
but JSON-encoded values using the Avro schema ["int", "double"] would require
the type name because the JSON lexeme doesn't distinguish between different
kinds of number.
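The proposed rule can be sketched as follows (hypothetical helper names, not an
actual Avro API): map each Avro type to the JSON token type it encodes to, and
only wrap a union value in the {"type": value} form when two members collide:

```python
import json

# Hypothetical mapping from Avro type names to JSON token types.
# All four numeric types encode as JSON numbers, which is why
# ["int", "double"] is ambiguous; record and map both encode as objects.
JSON_TOKEN = {
    "null": "null", "boolean": "boolean",
    "string": "string", "enum": "string", "bytes": "string", "fixed": "string",
    "int": "number", "long": "number", "float": "number", "double": "number",
    "record": "object", "map": "object",
    "array": "array",
}

def union_is_unambiguous(member_types):
    """True when every union member has a distinct JSON token type."""
    tokens = [JSON_TOKEN[t] for t in member_types]
    return len(tokens) == len(set(tokens))

def encode_union_value(member_types, branch, value):
    """Encode a union value: bare when the union is unambiguous,
    wrapped in the standard {"type": value} form otherwise."""
    if branch == "null":
        return json.dumps(None)  # null is never wrapped, even in standard Avro
    if union_is_unambiguous(member_types):
        return json.dumps(value)
    return json.dumps({branch: value})

print(encode_union_value(["null", "string", "int"], "string", "foo"))  # "foo"
print(encode_union_value(["int", "double"], "int", 999))  # {"int": 999}
```

A real implementation would work from parsed Schema objects rather than bare
type-name strings, but the distinguishability check is the same.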
This would mean that it would be possible to represent a significant subset of
"normal" JSON schemas with Avro. It seems to me that would potentially be very
useful.

Thoughts? Is this a really bad idea to be contemplating? :)

cheers,
    rog.