Sushanth Sowmyan created HIVE-6166:
--------------------------------------

             Summary: JsonSerDe is too strict about table schema
                 Key: HIVE-6166
                 URL: https://issues.apache.org/jira/browse/HIVE-6166
             Project: Hive
          Issue Type: Bug
          Components: HCatalog, Serializers/Deserializers
    Affects Versions: 0.12.0
            Reporter: Sushanth Sowmyan
            Assignee: Sushanth Sowmyan


JsonSerDe is too strict about schema: it errors out if it finds a field whose 
key name does not map to a column in the table schema, or to a field in an 
inner-struct schema.

Thus, if a schema specifies "s:struct<a:int,b:string>,k:int" and we pass it 
data that looks like the following:

{noformat}
{ "x" : "abc" , "s" : { "a" : 2 , "b" : "blah", "c": "woo" } }
{noformat}

This should still pass, and the record should be read as if it were:

{noformat}
{ "s" : { "a" : 2 , "b" : "blah" } , "k" : null }
{noformat}

This will allow the JsonSerDe to be used with a wider set of data where the 
data does not map exactly to the declared table schema.
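
To make the intended behaviour concrete, here is a minimal sketch in Python. It is not the JsonSerDe implementation: the schema is modelled as a nested dict purely for illustration, and the helper name project is hypothetical. Type checking is deliberately left out of this sketch.

```python
import json

# Hypothetical schema model (an assumption for this sketch): a dict maps
# field names to a primitive type name, or to a nested dict for a struct.
SCHEMA = {"s": {"a": "int", "b": "string"}, "k": "int"}

def project(record, schema):
    """Keep only declared fields; fill missing ones with None (JSON null)."""
    out = {}
    for name, ftype in schema.items():
        value = record.get(name)  # None when the field is absent
        if isinstance(ftype, dict) and isinstance(value, dict):
            # Struct field: apply the same rule to the inner schema.
            out[name] = project(value, ftype)
        else:
            # Primitive or absent field: copied as-is, no type check here.
            out[name] = value
    return out

row = json.loads('{ "x" : "abc" , "s" : { "a" : 2 , "b" : "blah", "c": "woo" } }')
result = project(row, SCHEMA)
print(json.dumps(result))
```

The undeclared keys "x" and "c" are dropped, and the declared but absent column "k" comes back as null.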

Note, we are still strict about a couple of things:

a) If a column is declared in the schema, its type cannot vary; a mismatch is 
still considered an error. For example, if the hive table schema says k1 is a 
boolean, the data cannot supply an int or a struct for it.
b) The JsonSerDe still attempts to map hive internal column names - i.e. if the 
data contains a column named "_col2", then, if "_col2" is not declared directly 
in the schema, it will map to column position 2 in that schema/subschema, 
rather than ignoring the field. This is so that tables created with CTAS will 
still work. 
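
The two strict rules above can be sketched as follows. Again, this is only an illustration under stated assumptions, not the actual SerDe code: the schema is modelled as an ordered list of (name, type) pairs, the helper names resolve and read are hypothetical, and "_colN" is assumed to be a zero-based position.

```python
import re

# Hypothetical schema model (an assumption for this sketch): an ordered list
# of (name, type) pairs; a nested list stands for a struct.
SCHEMA = [("k1", bool), ("s", [("a", int), ("b", str)]), ("k", int)]

INTERNAL_NAME = re.compile(r"^_col(\d+)$")

def resolve(key, schema):
    """Return the position of the column that key refers to, or None."""
    names = [name for name, _ in schema]
    if key in names:
        return names.index(key)  # a directly declared name wins
    match = INTERNAL_NAME.match(key)
    if match and int(match.group(1)) < len(schema):
        return int(match.group(1))  # rule (b): assumed zero-based position
    return None

def read(record, schema):
    row = {name: None for name, _ in schema}
    for key, value in record.items():
        pos = resolve(key, schema)
        if pos is None:
            continue  # undeclared field: ignored rather than an error
        name, ftype = schema[pos]
        if isinstance(ftype, list):
            if not isinstance(value, dict):
                raise TypeError("%s: expected a struct" % name)
            row[name] = read(value, ftype)
        elif value is not None and type(value) is not ftype:
            # Rule (a): a declared column must keep its declared type.
            raise TypeError("%s: expected %s" % (name, ftype.__name__))
        else:
            row[name] = value
    return row

print(read({"_col2": 7, "k1": True, "zzz": 1}, SCHEMA))
```

Here "_col2" lands in column k, "zzz" is ignored, and a record such as {"k1": 1} raises a TypeError because k1 is declared as a boolean.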



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
