Joseph Yen created HIVE-18176:
---------------------------------

             Summary: The response of GetResultSetMetadata is inconsistent with 
TCLIService.thrift for complex types
                 Key: HIVE-18176
                 URL: https://issues.apache.org/jira/browse/HIVE-18176
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2
         Environment: HDP Hive 1.2.1
PyHive master(commit b68e1a8dcc9917feb10281af70ff6bd29c764cdd)
            Reporter: Joseph Yen


I was trying to add support for the decimal, timestamp, date, array, and map 
types to the PyHive DBAPI. To parse a result set correctly, I need to know the 
result set schema for each SELECT. For simple types (integer, string, 
timestamp, decimal, …) this is not a problem: I can get all the information by 
calling HiveServer2.GetResultSetMetadata. For complex types (array, map, 
struct), however, the nested type information is missing. The response of 
GetResultSetMetadata gives me no way to tell an integer array from a string 
array.

According to 
[TCLIService.thrift|https://github.com/apache/hive/blob/release-1.2.1/service/if/TCLIService.thrift#L147-L188]
, recursively defined types such as {{array<int>}} and {{map<int, string>}} 
should be described by {{TTypeEntry.arrayEntry}} and {{TTypeEntry.mapEntry}} 
rather than {{TTypeEntry.primitiveEntry}} in the first element of 
{{TTypeDesc.types}}. The nested types should reside in {{TTypeDesc.types}} as 
subsequent elements, and the first element should point to them.
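
To make the pointer layout concrete, here is a minimal Python sketch of how a client could resolve such a type list back into a type string. The dataclasses are hypothetical stand-ins for the Thrift-generated structs (only the fields needed here), {{describe}} is an illustrative helper and not part of PyHive, and the two type ids are the ones that appear in the responses in this report (3 for int, 7 for string).

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-ins for the Thrift-generated structs (subset of fields).
@dataclass
class TPrimitiveTypeEntry:
    type: int


@dataclass
class TArrayTypeEntry:
    objectTypePtr: int


@dataclass
class TMapTypeEntry:
    keyTypePtr: int
    valueTypePtr: int


@dataclass
class TTypeEntry:
    primitiveEntry: Optional[TPrimitiveTypeEntry] = None
    arrayEntry: Optional[TArrayTypeEntry] = None
    mapEntry: Optional[TMapTypeEntry] = None


# Only the two type ids seen in this report; anything else is printed as a number.
TYPE_NAMES = {3: "int", 7: "string"}


def describe(types: List[TTypeEntry], ptr: int = 0) -> str:
    """Follow the integer pointers in `types`, starting at index `ptr`."""
    entry = types[ptr]
    if entry.arrayEntry is not None:
        return "array<%s>" % describe(types, entry.arrayEntry.objectTypePtr)
    if entry.mapEntry is not None:
        key = describe(types, entry.mapEntry.keyTypePtr)
        value = describe(types, entry.mapEntry.valueTypePtr)
        return "map<%s,%s>" % (key, value)
    if entry.primitiveEntry is not None:
        type_id = entry.primitiveEntry.type
        return TYPE_NAMES.get(type_id, "type#%d" % type_id)
    raise ValueError("entry %d has no recognised member set" % ptr)


# array<int>: the array entry at index 0 points at the int entry at index 1.
array_of_int = [
    TTypeEntry(arrayEntry=TArrayTypeEntry(objectTypePtr=1)),
    TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=3)),
]
print(describe(array_of_int))
```

With a single primitive entry in the list, as HiveServer2 currently returns for complex columns, there is nothing for such a resolver to follow.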

However, when I actually called {{GetResultSetMetadata}} for the query {{SELECT 
array(1, 2, 3)}}, I got just a single {{TTypeEntry.primitiveEntry}} element in 
{{TTypeDesc.types}}, with {{TPrimitiveTypeEntry.type = ARRAY_TYPE}}.

This response violates both of the following descriptions in TCLIService.thrift:
bq. [“TTypeDesc employs a type list that maps integer “pointers” to TTypeEntry 
objects”|https://github.com/apache/hive/blob/release-1.2.1/service/if/TCLIService.thrift#L147-L188]
 
and 
bq. [“The primitive type token. This must satisfy the condition that type is in 
the PRIMITIVE_TYPES 
set.”|https://github.com/apache/hive/blob/release-1.2.1/service/if/TCLIService.thrift#L210-L215]
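
The second constraint is easy to check on the client side. The following is a minimal sketch, not PyHive code: {{misused_primitive_entries}} is a hypothetical helper, {{MAP_TYPE = 11}} matches the response shown in this report, and the other three numeric values are assumed from the {{TTypeId}} enum and should be checked against the generated Thrift code.

```python
# TTypeId values for the complex types. MAP_TYPE = 11 is taken from the
# response in this report; the other three are assumptions.
ARRAY_TYPE, MAP_TYPE, STRUCT_TYPE, UNION_TYPE = 10, 11, 12, 13
COMPLEX_TYPE_IDS = frozenset({ARRAY_TYPE, MAP_TYPE, STRUCT_TYPE, UNION_TYPE})


def misused_primitive_entries(primitive_type_ids):
    """Return the positions whose primitive entry carries a complex type id."""
    return [i for i, type_id in enumerate(primitive_type_ids)
            if type_id in COMPLEX_TYPE_IDS]


# TPrimitiveTypeEntry.type values of an (int, string, map) result set.
observed = [3, 7, 11]
print(misused_primitive_entries(observed))  # prints [2]
```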

----
I tried the following script.

{code:sql}
create temporary table dummy(a int);
insert into table dummy values (1), (2), (3);
create temporary table tt(a int,  b string, c map<INT, ARRAY<string>>);
insert into table tt select 1, 'a', map(3, array('a','b','c')) from dummy limit 1;
select * from tt;
{code}

And called {{GetResultSetMetadata}} right after executing the SELECT query.
The value of {{response.schema.columns}} was

{code:javascript}
[TColumnDesc(columnName='tt.a', typeDesc=TTypeDesc(
  types=[
    TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=3, typeQualifiers=None), 
arrayEntry=None, mapEntry=None, structEntry=None, unionEntry=None, 
userDefinedTypeEntry=None)]), position=1, comment=None),
 TColumnDesc(columnName='tt.b', typeDesc=TTypeDesc(types=[
    TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=7, typeQualifiers=None), 
arrayEntry=None, mapEntry=None, structEntry=None, unionEntry=None, 
userDefinedTypeEntry=None)]), position=2, comment=None),
 TColumnDesc(columnName='tt.c', typeDesc=TTypeDesc(types=[
    TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=11, 
typeQualifiers=None), arrayEntry=None, mapEntry=None, structEntry=None, 
unionEntry=None, userDefinedTypeEntry=None)]), position=3, comment=None)]
{code}

However, according to the thrift file, it should be
{code:javascript}
[TColumnDesc(columnName='tt.a', typeDesc=TTypeDesc(types=[
  TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=3, typeQualifiers=None), 
arrayEntry=None, mapEntry=None, structEntry=None, unionEntry=None, 
userDefinedTypeEntry=None)]), position=1, comment=None),
 TColumnDesc(columnName='tt.b', typeDesc=TTypeDesc(types=[
  TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=7, typeQualifiers=None), 
arrayEntry=None, mapEntry=None, structEntry=None, unionEntry=None, 
userDefinedTypeEntry=None)]), position=2, comment=None),
 TColumnDesc(columnName='tt.c', typeDesc=TTypeDesc(types=[
  TTypeEntry(primitiveEntry=None, arrayEntry=None, 
mapEntry=TMapTypeEntry(keyTypePtr=1, valueTypePtr=2), structEntry=None, 
unionEntry=None, userDefinedTypeEntry=None),
  TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=3, typeQualifiers=None), 
arrayEntry=None, mapEntry=None, structEntry=None, unionEntry=None, 
userDefinedTypeEntry=None),
  TTypeEntry(primitiveEntry=None, arrayEntry=TArrayTypeEntry(objectTypePtr=3), 
mapEntry=None, structEntry=None, unionEntry=None, userDefinedTypeEntry=None),
  TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=7, typeQualifiers=None), 
arrayEntry=None, mapEntry=None, structEntry=None, unionEntry=None, 
userDefinedTypeEntry=None)
]), position=3, comment=None)]
{code}
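
For reference, the pointer values in the expected {{tt.c}} entry can be derived by flattening the nested type depth-first. This is a minimal sketch with a made-up tuple notation for nested types and a hypothetical {{flatten}} helper; it produces one possible layout, which happens to match the pointers shown above, and is not a description of how Hive should implement it.

```python
from typing import List, Union

# Made-up notation: a primitive is a string, a complex type is a tuple such as
# ("array", element) or ("map", key, value).
Nested = Union[str, tuple]


def flatten(node: Nested) -> List[tuple]:
    """Flatten a nested type into a list whose entries refer to each other by index."""
    entries: List[tuple] = []

    def visit(n: Nested) -> int:
        index = len(entries)
        entries.append(())  # reserve the slot so children get later indexes
        if isinstance(n, str):
            entries[index] = ("primitive", n)
        elif n[0] == "array":
            entries[index] = ("array", visit(n[1]))
        elif n[0] == "map":
            key_ptr = visit(n[1])
            value_ptr = visit(n[2])
            entries[index] = ("map", key_ptr, value_ptr)
        else:
            raise ValueError("unsupported type: %r" % (n,))
        return index

    visit(node)
    return entries


# map<INT, ARRAY<string>>, the type of column tt.c
for i, entry in enumerate(flatten(("map", "int", ("array", "string")))):
    print(i, entry)
```

The map entry lands at index 0 with key pointer 1 and value pointer 2, and the array entry at index 2 points at the string entry at index 3.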

----
I found the related function in the Hive codebase:
https://github.com/apache/hive/blob/release-1.2.1/service/src/java/org/apache/hive/service/cli/TypeDescriptor.java#L66-L76
It seems that this function always puts a {{TPrimitiveTypeEntry}} into 
{{TTypeDesc.types}}, even for complex types (such as array and map), which is 
inconsistent with the Thrift file.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
