[
https://issues.apache.org/jira/browse/HIVE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sushanth Sowmyan updated HIVE-6166:
-----------------------------------
Attachment: HIVE-6166.patch
Patch attached.
> JsonSerDe is too strict about table schema
> ------------------------------------------
>
> Key: HIVE-6166
> URL: https://issues.apache.org/jira/browse/HIVE-6166
> Project: Hive
> Issue Type: Bug
> Components: HCatalog, Serializers/Deserializers
> Affects Versions: 0.12.0
> Reporter: Sushanth Sowmyan
> Assignee: Sushanth Sowmyan
> Attachments: HIVE-6166.patch
>
>
> JsonSerDe is too strict about schema: it errors out when the data contains a
> field whose key name does not map to a column of the table schema, or a
> subfield whose key name does not map to a field of an inner-struct schema.
> For example, if a schema specifies "s:struct<a:int,b:string>,k:int" and we
> pass it data that looks like the following:
> {noformat}
> { "x" : "abc" , "s" : { "a" : 2 , "b" : "blah", "c": "woo" } }
> {noformat}
> This should still pass, and the record should be read as if it were
> {noformat}
> { "s" : { "a" : 2 , "b" : "blah" }, "k" : null }
> {noformat}
> This will allow the JsonSerDe to be used with a wider set of data, where the
> data does not map exactly to the declared table schema.
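The lenient behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not the code in HIVE-6166.patch: the nested-dict schema form and the function name project are assumptions made for the example.

```python
import json

# Hypothetical schema form for illustration only: a dict maps field names
# to type names, and a nested dict stands for a struct.
# It mirrors "s:struct<a:int,b:string>,k:int" from the description.
SCHEMA = {"s": {"a": "int", "b": "string"}, "k": "int"}


def project(record, schema):
    """Keep only declared fields; fields missing from the data become None."""
    out = {}
    for name, ftype in schema.items():
        value = record.get(name)
        # Recurse into structs so undeclared subfields are dropped as well.
        if isinstance(ftype, dict) and isinstance(value, dict):
            value = project(value, ftype)
        out[name] = value
    return out


raw = '{ "x" : "abc" , "s" : { "a" : 2 , "b" : "blah", "c": "woo" } }'
row = project(json.loads(raw), SCHEMA)
print(row)
```

Here the undeclared keys "x" and "c" are dropped and the missing column "k" is read as null, matching the second record in the description.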
> Note that we are still strict about a couple of things:
> a) If a column is declared in the schema, its type cannot vary; a mismatch
> is still considered an error. For example, if the Hive table schema says k1
> is a boolean, the data cannot supply an int or a struct for it.
> b) The JsonSerDe still attempts to map Hive-internal column names, i.e. if
> the data contains a column named "_col2", then, if "_col2" is not declared
> directly in the schema, it will map to column position 2 in that
> schema/subschema, rather than ignoring the field. This is so that tables
> created with CTAS will still work.
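A minimal sketch of the two strict rules (a) and (b), again in Python and again not taken from the patch. The helper names resolve_column and check_type are hypothetical, and two details are assumptions: positions in "_colN" are treated as zero-based, and an out-of-range position is ignored.

```python
import re

# Hive-internal column names look like "_col0", "_col1", ...
INTERNAL_NAME = re.compile(r"^_col(\d+)$")


def resolve_column(key, column_names):
    """Return the declared column a JSON key maps to, or None to ignore it."""
    if key in column_names:
        return key  # declared directly in the schema, so use it as-is
    match = INTERNAL_NAME.match(key)
    if match:
        position = int(match.group(1))  # assumption: zero-based position
        if position < len(column_names):
            return column_names[position]
    return None  # undeclared key (or, by assumption, out of range): skip it


def check_type(column, value, expected):
    """Rule (a): a declared column must keep its declared type."""
    # type() is compared exactly because bool is a subclass of int in Python.
    if value is not None and type(value) is not expected:
        raise TypeError(f"column {column!r}: expected {expected.__name__}, "
                        f"got {type(value).__name__}")


columns = ["id", "name", "flag"]
print(resolve_column("_col2", columns))  # maps to the column at position 2
print(resolve_column("extra", columns))  # not declared, so it is ignored
check_type("flag", True, bool)           # matches the declared type
```

Calling check_type("flag", 1, bool) raises TypeError in this sketch, which corresponds to the "type cannot vary" rule.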