[ 
https://issues.apache.org/jira/browse/HIVE-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HIVE-2171:
------------------------------

    Attachment: HIVE-2171.patch

Patch:
* Adds a comment field to the StructField interface and a reasonable 
implementation of it to each of the interface's implementations.
* Adds overloaded versions of each of the struct-based ObjectInspector 
factories to allow the comments to be set.
* Adjusts MetastoreUtils to check whether the field's comment is null: if so, 
it maintains the previous behavior; otherwise it uses the comment.
* Adds a new unit test for MetastoreUtils.  For this, mockito was added as a 
dependency.  Right now it looks like Hive's Ivy conf isn't set up to include 
only some jars in the package.  If this patch goes in, I'll open another 
jira to make sure the mockito and other test-related jars aren't bundled into 
jars that don't need them.
* Refactors the TestStandardObjectInspectors test to test both with and without 
comments.
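
As a rough illustration of the null check described in the MetastoreUtils 
bullet above, here is a minimal Java sketch.  It does not use Hive's real 
classes: Field, SimpleField, resolveComment and describe are hypothetical 
names made up for this example, and the default text "from deserializer" is 
an assumption about the existing boilerplate, not a copy of the patched code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; not Hive's actual classes.
final class FieldCommentSketch {

    // Assumed boilerplate used when a serde supplies no comment.
    static final String DEFAULT_COMMENT = "from deserializer";

    /** Simplified analogue of a struct field that carries an optional comment. */
    interface Field {
        String getFieldName();

        String getFieldComment(); // may be null
    }

    static final class SimpleField implements Field {
        private final String name;
        private final String comment;

        // Overload without a comment, mirroring the pre-patch behavior.
        SimpleField(String name) {
            this(name, null);
        }

        // Overload that lets the caller set a comment.
        SimpleField(String name, String comment) {
            this.name = name;
            this.comment = comment;
        }

        @Override
        public String getFieldName() {
            return name;
        }

        @Override
        public String getFieldComment() {
            return comment;
        }
    }

    /** Null comment: keep the old default text. Otherwise use the serde's comment. */
    static String resolveComment(Field field) {
        String comment = field.getFieldComment();
        return comment == null ? DEFAULT_COMMENT : comment;
    }

    /** Builds one "name<TAB>comment" row per field. */
    static List<String> describe(List<? extends Field> fields) {
        List<String> rows = new ArrayList<>();
        for (Field f : fields) {
            rows.add(f.getFieldName() + "\t" + resolveComment(f));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Field> fields = Arrays.asList(
                new SimpleField("string1", "this field is string1"),
                new SimpleField("no_comment_field"));
        for (String row : describe(fields)) {
            System.out.println(row);
        }
    }
}
```

Keeping the comment nullable means existing serdes that never set one see no 
change in their table descriptions.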

After this patch, a serde that wants to specify field comments can do so and 
have them show up in the table description. For example, take a table kst 
created by an implementation of SerDe that has an example field of each type. 
Each field has its own comment (all equally boring: "this field is BLAH"), 
and those comments now appear in the description:
{noformat}hive> describe kst;
OK
string1 string  this field is string1
string2 string  this field is string2
int1    int     this field is int1
boolean1        boolean this field is boolean1
long1   bigint  this field is long1
float1  float   this field is float1
double1 double  this field is double1
inner_record1   struct<int_in_inner_record1:int,string_in_inner_record1:string> this field is inner_record1
enum1   string  this field is enum1
array1  array<string>   this field is array1
map1    map<string,string>      this field is map1
union1  uniontype<float,boolean,string> this field is union1
fixed1  array<tinyint>  this field is fixed1
null1   void    this field is null1
unionnullint    int     this field is UnionNullInt
bytes1  array<tinyint>  this field is bytes1
ds      string
Time taken: 0.286 seconds{noformat}

One thing I noticed is that these field comments on structs should extend to 
substructures, and with this new patch they do for custom serdes:
{noformat}hive> describe kst.inner_record1;
OK
int_in_inner_record1    int     this field is int_in_inner_record1
string_in_inner_record1 string  this field is string_in_inner_record1
Time taken: 0.113 seconds{noformat}

However, this doesn't work correctly with built-in serdes:

{noformat}hive> create table test_table(a STRUCT<z:string COMMENT 'comment for z',x:int> COMMENT 'comment for a');
OK
Time taken: 2.565 seconds
hive> describe test_table;
OK
a       struct<z:string,x:int>  comment for a
Time taken: 0.139 seconds
hive> describe test_table.a;
OK
z       string  from deserializer
x       int     from deserializer
Time taken: 0.096 seconds
hive> describe test_table.a.z;
OK
z       string  from deserializer
Time taken: 0.089 seconds
hive>{noformat}

The comment for field z is lost, replaced by the boilerplate text "from 
deserializer" and can't be retrieved from the CLI.  I'll open a JIRA for this.

This is my first Hive patch, so please check to see if I missed anything.

> Allow custom serdes to set field comments
> -----------------------------------------
>
>                 Key: HIVE-2171
>                 URL: https://issues.apache.org/jira/browse/HIVE-2171
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>             Fix For: 0.7.1
>
>         Attachments: HIVE-2171.patch
>
>
> Currently, while serde implementations can set a field's name, they can't set 
> its comment.  These are set in the metastore utils to {{(from 
> deserializer)}}.  For those serdes that can provide meaningful comments for a 
> field, they should be propagated to the table description.  These 
> serde-provided comments could be prepended to "(from deserializer)" if others 
> feel that's a meaningful distinction.  This change involves updating 
> {{StructField}} to support a (possibly null) comment field and then 
> propagating this change out to the myriad places {{StructField}} is thrown 
> around.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
