Zoltan Ivanfi created AVRO-2128:
-----------------------------------
Summary: Schema parsing in the Java library is more permissive
than the C implementation or the JSON specification
Key: AVRO-2128
URL: https://issues.apache.org/jira/browse/AVRO-2128
Project: Avro
Issue Type: Bug
Reporter: Zoltan Ivanfi
When parsing schemas, the Java library accepts C-style comments (which are
forbidden in JSON) and ignores trailing garbage (parsing stops as soon as it
reaches the end of the JSON structure).
In the C library, however, comments and trailing whitespace cause an error.
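As a rough illustration of the discrepancy, here is a minimal sketch of a
pre-check that a user could run on a schema before handing it to either
library. It uses Python's standard json module, which rejects comments and
trailing non-whitespace data. This is an assumption-laden stand-in: it is not
the parser used by the Avro C library, it accepts trailing whitespace, and it
checks JSON syntax only, not whether the text is a valid Avro schema. The
helper name is_strict_json and the sample schemas are hypothetical.

```python
import json


def is_strict_json(schema_text):
    """Return (True, None) if schema_text parses as JSON, else (False, reason).

    Hypothetical helper: checks JSON syntax only, not Avro schema validity.
    """
    try:
        json.loads(schema_text)
    except json.JSONDecodeError as exc:
        return False, exc.msg
    return True, None


# Sample inputs (made up for illustration).
clean = '{"type": "record", "name": "User", "fields": [{"name": "id", "type": "long"}]}'
commented = '{"type": "record", /* primary key */ "name": "User", "fields": []}'
trailing = clean + " garbage"

for label, text in [("clean", clean), ("commented", commented), ("trailing", trailing)]:
    ok, reason = is_strict_json(text)
    print(label, ok, reason)
```

Only the first sample passes; the other two are the kind of input that the
Java library accepts according to this report.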
If a schema is accepted by one language binding, it should be accepted by the
other as well. The schema should also be valid JSON. The Java library does not
enforce this because it is more permissive than it should be, so it seems that
the Java implementation should be changed. However, we must also consider
whether making the Java library stricter at this point would make any existing
data unreadable.
Fortunately, the schema that is written in the data files themselves is always
valid JSON, even if it is based on a non-JSON-conformant schema. The reason for
this is that the Java library parses the schema, builds an in-memory
representation, and then reserializes it, thereby removing comments and
trailing garbage. So
existing data files are not affected, only user-supplied schemas. These can be
manually updated (unlike existing data files).
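The parse-then-reserialize effect described above can be imitated to clean up a
user-supplied schema. The following is only a sketch under stated assumptions:
it uses Python's standard json module rather than the Avro Java library, the
functions strip_c_comments and normalize_schema are hypothetical, and the
output is plain reserialized JSON, not necessarily the exact text that Avro
would write into a data file.

```python
import json


def strip_c_comments(text):
    """Remove /* ... */ and // ... comments that appear outside JSON strings."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(text[i + 1])  # keep the escaped character verbatim
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2  # unterminated comment: drop the rest
        elif text.startswith("//", i):
            end = text.find("\n", i)
            i = n if end == -1 else end  # keep the newline itself
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def normalize_schema(text):
    """Parse the first JSON value, ignore anything after it, and reserialize."""
    cleaned = strip_c_comments(text).lstrip()
    value, _end = json.JSONDecoder().raw_decode(cleaned)
    return json.dumps(value)


# A lenient schema with comments and trailing garbage (made up for illustration).
lenient = '''
{
  /* user record */
  "type": "record",
  "name": "User",   // see http://example.com
  "fields": [{"name": "url", "type": "string", "doc": "e.g. http://x/*y*/"}]
} trailing garbage
'''
print(normalize_schema(lenient))
```

The scanner tracks whether it is inside a string so that comment-like text in
a string value (such as the "doc" field here) is left untouched.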
The real-world use case where this discrepancy causes problems is Hive-Impala
interaction. Users can create tables in Hive by supplying an Avro schema. That
schema is associated with the whole table by being saved in the Hive
metastore. Impala also consults this metadata when accessing the table, and a
schema that is not valid JSON then causes an error in the Avro C library that
Impala uses. This is detailed in IMPALA-1024. In particular, [this
comment|https://issues.apache.org/jira/browse/IMPALA-1024?focusedCommentId=16261702&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16261702]
contains a lot of relevant information.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)