[ https://issues.apache.org/jira/browse/HIVE-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252649#comment-13252649 ]

Travis Crawford commented on HIVE-2941:
---------------------------------------

Here are some additional details about the issue. Consider the following create 
table statement. The table's columns are not specified explicitly; they are 
discovered by reflecting on the {{Person}} class.

{code}
hive> create external table travis_test.person_test
    >   partitioned by (part_dt string)
    >   row format serde "com.twitter.elephantbird.hive.serde.ThriftSerDe"
    >     with serdeproperties ("serialization.class"="com.twitter.elephantbird.examples.thrift.Person")
    >   stored as
    >     inputformat "com.twitter.elephantbird.mapred.input.HiveMultiInputFormat"
    >     outputformat "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
{code}

The current behavior does not expand nested structures; it lists the class name 
of each nested struct as the field type. As a result, users browsing the schema 
do not see the full table definition.

{code}
hive> describe extended person_test;
OK
name    com.twitter.elephantbird.examples.thrift.Name   from deserializer
id      int     from deserializer
email   string  from deserializer
phones  array<com.twitter.elephantbird.examples.thrift.PhoneNumber>     from deserializer
part_dt string
{code}

This patch expands nested structures, showing the full table schema. Here's an 
example of what the table looks like with the patch:

{code}
hive> describe extended person_test;
OK
name    struct<first_name:string,last_name:string>      from deserializer
id      int     from deserializer
email   string  from deserializer
phones  array<struct<number:string,type:struct<value:int>>>     from deserializer
part_dt string  
{code}
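
To make the expansion concrete, here is a minimal Python sketch of the recursion. It is an illustration only, not the code in the patch. The {{THRIFT_STRUCTS}} table and the {{expand_type}} helper are hypothetical names, and the {{PhoneType}} class name is an assumption (the output above only shows that the {{type}} field expands to {{struct<value:int>}}).

```python
# Hypothetical stand-in for Thrift class metadata: class name -> ordered
# (field name, field type) pairs. Field layouts follow the describe output
# above; the PhoneType class name is assumed for illustration.
PKG = "com.twitter.elephantbird.examples.thrift."
THRIFT_STRUCTS = {
    PKG + "Name": [("first_name", "string"), ("last_name", "string")],
    PKG + "PhoneType": [("value", "int")],
    PKG + "PhoneNumber": [("number", "string"), ("type", PKG + "PhoneType")],
}


def expand_type(type_name, structs):
    """Return type_name with every known struct class name expanded.

    Handles array<...> wrappers and nested structs. It does not guard
    against self-referential structs, which would recurse forever.
    """
    if type_name.startswith("array<") and type_name.endswith(">"):
        inner = type_name[len("array<"):-1]
        return "array<" + expand_type(inner, structs) + ">"
    if type_name in structs:
        fields = ",".join(
            name + ":" + expand_type(field_type, structs)
            for name, field_type in structs[type_name]
        )
        return "struct<" + fields + ">"
    # Primitive or unknown type: leave it untouched.
    return type_name


if __name__ == "__main__":
    for column, column_type in [
        ("name", PKG + "Name"),
        ("id", "int"),
        ("phones", "array<" + PKG + "PhoneNumber>"),
    ]:
        print(column, expand_type(column_type, THRIFT_STRUCTS))
```

Types that are not known struct classes pass through unchanged, so primitive columns such as {{id}} keep their existing type string.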

In both cases the table storage descriptor is unchanged: with or without the 
patch, it lists the columns as {{cols:[]}}.

I believe the reflected table schema should be copied into the partition 
storage descriptor when adding a new partition, but that could be a separate 
change.
                
> Hive should expand nested structs when setting the table schema from thrift 
> structs
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-2941
>                 URL: https://issues.apache.org/jira/browse/HIVE-2941
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Travis Crawford
>            Assignee: Travis Crawford
>         Attachments: HIVE-2941.D2721.1.patch
>
>
> When setting a table serde, the deserializer is queried for its schema, which 
> is used to set the metastore table schema. The current implementation uses 
> the class name stored in the field as the field type.
> By storing the class name as the field type, users cannot see the contents of 
> a struct with "describe tblname". Applications that query HiveMetaStore for 
> the table schema (specifically HCatalog in this case) see an unknown field 
> type, rather than a struct containing known field types.
> Hive should store the expanded schema in the metastore so users browsing the 
> schema see expanded fields, and applications querying metastore see familiar 
> types.
> DETAILS
> Set the table serde to something like this. This serde uses the built-in 
> {{ThriftStructObjectInspector}}.
> {code}
> alter table foo_test
>   set serde "com.twitter.elephantbird.hive.serde.ThriftSerDe"
>   with serdeproperties ("serialization.class"="com.foo.Foo");
> {code}
> This causes a call to {{MetaStoreUtils.getFieldsFromDeserializer}}, which 
> returns a list of fields and their schemas. Currently it does not handle 
> nested structs: if {{com.foo.Foo}} above contains a field of type 
> {{com.foo.Bar}}, the class name {{com.foo.Bar}} appears as the field type. 
> Instead, nested structs should be expanded.


        
