[ https://issues.apache.org/jira/browse/HIVE-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252649#comment-13252649 ]
Travis Crawford commented on HIVE-2941:
---------------------------------------

Here are some additional details about the issue.

Consider the following create table statement. Columns will be discovered for the table by reflecting on the {{Person}} object (instead of explicitly specifying them).

{code}
hive> create external table travis_test.person_test
    > partitioned by (part_dt string)
    > row format serde "com.twitter.elephantbird.hive.serde.ThriftSerDe"
    > with serdeproperties ("serialization.class"="com.twitter.elephantbird.examples.thrift.Person")
    > stored as
    > inputformat "com.twitter.elephantbird.mapred.input.HiveMultiInputFormat"
    > outputformat "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
{code}

Current behavior does not expand nested structures, listing the class name of nested structs as the field type. Users browsing the schema do not get a full definition of the table schema.

{code}
hive> describe extended person_test;
OK
name    com.twitter.elephantbird.examples.thrift.Name    from deserializer
id      int     from deserializer
email   string  from deserializer
phones  array<com.twitter.elephantbird.examples.thrift.PhoneNumber>    from deserializer
part_dt string
{code}

This patch expands nested structures, showing the full table schema. Here's an example of what the table looks like with the patch:

{code}
hive> describe extended person_test;
OK
name    struct<first_name:string,last_name:string>    from deserializer
id      int     from deserializer
email   string  from deserializer
phones  array<struct<number:string,type:struct<value:int>>>    from deserializer
part_dt string
{code}

In both cases, the table storage descriptor is unchanged - both list the columns as {{cols:[]}}. I believe the reflected table schema should be copied into the partition storage descriptor when adding a new partition, but that could be a separate change.
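
To make the before/after difference concrete, here is a minimal, language-neutral sketch of the recursive expansion, written in Python rather than Hive's Java. It is not the code in the attached patch and does not use any Hive API: the {{Primitive}}, {{Array}} and {{Struct}} classes, the {{type_name}} function and the {{PhoneType}} class name are all hypothetical, chosen only so the output lines up with the {{describe}} listings above.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Primitive:
    name: str  # e.g. "int" or "string"


@dataclass(frozen=True)
class Array:
    element: "FieldType"


@dataclass(frozen=True)
class Struct:
    class_name: str  # class name that the unexpanded schema would show
    fields: Tuple[Tuple[str, "FieldType"], ...]


FieldType = Union[Primitive, Array, Struct]


def type_name(field_type: FieldType, expand: bool = True) -> str:
    """Render a type string, optionally expanding nested structs."""
    if isinstance(field_type, Primitive):
        return field_type.name
    if isinstance(field_type, Array):
        return "array<" + type_name(field_type.element, expand) + ">"
    if isinstance(field_type, Struct):
        if not expand:
            # Behavior described as current: only the class name is reported.
            return field_type.class_name
        # Behavior described for the patch: recurse into every nested field.
        inner = ",".join(
            name + ":" + type_name(child, expand)
            for name, child in field_type.fields
        )
        return "struct<" + inner + ">"
    raise TypeError("unsupported field type: %r" % (field_type,))


# Hypothetical model of the Person example; the field layout is taken from
# the describe output above, and the PhoneType class name is an assumption.
PKG = "com.twitter.elephantbird.examples.thrift."
STRING, INT = Primitive("string"), Primitive("int")
NAME = Struct(PKG + "Name", (("first_name", STRING), ("last_name", STRING)))
PHONE_TYPE = Struct(PKG + "PhoneType", (("value", INT),))
PHONE_NUMBER = Struct(PKG + "PhoneNumber", (("number", STRING), ("type", PHONE_TYPE)))
PERSON_COLUMNS = (
    ("name", NAME),
    ("id", INT),
    ("email", STRING),
    ("phones", Array(PHONE_NUMBER)),
)


def describe(columns, expand):
    """Return (column, type string) pairs, similar to a describe listing."""
    return [(name, type_name(ftype, expand)) for name, ftype in columns]
```

Calling {{describe(PERSON_COLUMNS, expand=False)}} gives the class-name form shown in the first listing, while {{expand=True}} gives the nested {{struct<...>}} form shown in the second. The partition column {{part_dt}} is left out because it does not come from the deserializer.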

> Hive should expand nested structs when setting the table schema from thrift structs
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-2941
>                 URL: https://issues.apache.org/jira/browse/HIVE-2941
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Travis Crawford
>            Assignee: Travis Crawford
>         Attachments: HIVE-2941.D2721.1.patch
>
>
> When setting a table serde, the deserializer is queried for its schema, which
> is used to set the metastore table schema. The current implementation uses
> the class name stored in the field as the field type.
>
> By storing the class name as the field type, users cannot see the contents of
> a struct with "describe tblname". Applications that query HiveMetaStore for
> the table schema (specifically HCatalog in this case) see an unknown field
> type, rather than a struct containing known field types.
>
> Hive should store the expanded schema in the metastore so users browsing the
> schema see expanded fields, and applications querying metastore see familiar
> types.
>
> DETAILS
>
> Set the table serde to something like this. This serde uses the built-in
> {{ThriftStructObjectInspector}}.
>
> {code}
> alter table foo_test
>   set serde "com.twitter.elephantbird.hive.serde.ThriftSerDe"
>   with serdeproperties ("serialization.class"="com.foo.Foo");
> {code}
>
> This causes a call to {{MetaStoreUtils.getFieldsFromDeserializer}} which
> returns a list of fields and their schemas. However, currently it does not
> handle nested structs, and if {{com.foo.Foo}} above contains a field
> {{com.foo.Bar}}, the class name {{com.foo.Bar}} would appear as the field
> type. Instead, nested structs should be expanded.