[ 
https://issues.apache.org/jira/browse/AVRO-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240167#comment-14240167
 ] 

Aaron Kimball commented on AVRO-1618:
-------------------------------------

Yes, that's correct. 

I think it depends on what you mean by making the parsing process "real."

Converting a single JSON-encoded datum into a record is an (almost) trivial 
process; effectively:

{code}
import json

class JsonDecoder:
  def decode(self, json_string):
    # Parse one complete JSON document into plain Python objects.
    return json.loads(json_string)
{code}

... which is why I don't think such a method has been provided already :)

Converting a stream of concatenated JSON data strings into JSON objects via the 
{{DatumReader}} interface is a much bigger/harder patch to write, for three 
reasons:
* The trivial json decoder outlined above does not perform schema resolution; 
that is done (I think?) in the DatumReader layer.
* {{DatumReader}} and {{BinaryDecoder}} are specialized to one another; 
refactoring of the BinaryDecoder API and DatumReader implementation would be 
required. I don't know this code particularly well and would need some 
time to familiarize myself with it. DatumReader was not written to use a 
generic "Decoder" interface (e.g., the DatumReader specifically calls methods 
with names like {{decodeInt}} to establish the type of a union).
* Python's built-in json library and {{simplejson}} don't seem particularly 
well-inclined toward a token-stream-based approach to JSON parsing; they seem 
to want to munch whole strings into complete output objects. I think we'd have 
to learn and depend on a new library like ijson 
(https://pypi.python.org/pypi/ijson/) to make this happen...
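
To illustrate the third point: the standard library can split back-to-back JSON values using {{json.JSONDecoder.raw_decode}}, but only from a string that is already fully in memory, and it still builds complete objects rather than exposing tokens. A minimal sketch of that limited approach (the helper name {{iter_concatenated_json}} is made up for this example; it is not part of Avro):

```python
import json

def iter_concatenated_json(text):
    """Yield each JSON value from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not accept leading whitespace, so skip it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # Returns the decoded value and the index just past it.
        value, pos = decoder.raw_decode(text, pos)
        yield value

data = '{"a": 1} {"b": {"string": "x"}}\n[1, 2]'
values = list(iter_concatenated_json(data))
```

This does no schema resolution and needs the whole input as one string, so it does not remove the need for a real token-stream parser when reading from a file or socket.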

> Allow user to "clean up" unions into more conventional dicts in json encoding
> -----------------------------------------------------------------------------
>
>                 Key: AVRO-1618
>                 URL: https://issues.apache.org/jira/browse/AVRO-1618
>             Project: Avro
>          Issue Type: Improvement
>          Components: python
>    Affects Versions: 1.7.7
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: avro-1618.1.patch
>
>
> In Avro's JSON encoding, unions are implemented in a tagged fashion; walking 
> through this data structure is somewhat cumbersome. It would be good to have 
> a way of "decoding" this tagged-union data structure into a more conventional 
> dict where the union element is directly present without the tag.
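
For readers unfamiliar with the tagged form, here is a rough sketch of what such a "clean up" could look like. It is not the attached patch. It assumes the schema is available as plain parsed JSON (dicts, lists, strings), and the names {{untag}} and {{branch_name}} are hypothetical. It does not resolve named types referenced by name, nor does it handle namespaces.

```python
def branch_name(schema):
    """Return the tag a union branch would carry in Avro's JSON encoding."""
    if isinstance(schema, str):
        return schema
    if schema["type"] in ("record", "enum", "fixed"):
        return schema["name"]
    return schema["type"]  # e.g. "array" or "map"

def untag(datum, schema):
    """Strip union tags from a JSON-decoded datum, guided by its schema."""
    if isinstance(schema, list):  # a union is written as a list of branches
        if datum is None:  # the null branch is encoded as a bare null
            return None
        (tag, value), = datum.items()  # tagged value: exactly one key
        for branch in schema:
            if branch_name(branch) == tag:
                return untag(value, branch)
        raise ValueError("no union branch named %r" % tag)
    if isinstance(schema, dict):
        kind = schema["type"]
        if kind == "record":
            return {f["name"]: untag(datum[f["name"]], f["type"])
                    for f in schema["fields"]}
        if kind == "array":
            return [untag(item, schema["items"]) for item in datum]
        if kind == "map":
            return {k: untag(v, schema["values"]) for k, v in datum.items()}
    return datum  # primitives (and anything unrecognized) pass through

schema = {
    "type": "record", "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "email", "type": ["null", "string"]},
        {"name": "ids", "type": {"type": "array", "items": ["int", "string"]}},
    ],
}
tagged = {"name": "ann", "email": {"string": "a@example.com"},
          "ids": [{"int": 1}, {"string": "x"}]}
plain = untag(tagged, schema)
```

The schema is needed because a one-key dict is ambiguous on its own: it could be a tagged union value or an ordinary map with a single entry.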



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
