Joseph Yen created HIVE-18176:
---------------------------------

             Summary: The response of GetResultSetMetadata is inconsistent with TCLIService.thrift for complex types
                 Key: HIVE-18176
                 URL: https://issues.apache.org/jira/browse/HIVE-18176
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2
         Environment: HDP Hive 1.2.1
                      PyHive master (commit b68e1a8dcc9917feb10281af70ff6bd29c764cdd)
            Reporter: Joseph Yen

I was trying to add decimal, timestamp, date, array, and map type support to the PyHive DBAPI. To parse a result set correctly, I have to know the result set schema for each SELECT. For simple types (integer, string, timestamp, decimal, …) this is not a problem: I can get all the information by calling HiveServer2.GetResultSetMetadata. For complex types (array, map, struct), however, the nested type information is missing. I can't find a way to tell an integer array from a string array in the response of GetResultSetMetadata.

According to [TCLIService.thrift|https://github.com/apache/hive/blob/release-1.2.1/service/if/TCLIService.thrift#L147-L188], recursively defined types such as {{array<int>}} and {{map<int, string>}} should be described by {{TTypeEntry.arrayEntry}} or {{TTypeEntry.mapEntry}}, rather than {{TTypeEntry.primitiveEntry}}, in the first element of {{TTypeDesc.types}}. The nested types should reside in {{TTypeDesc.types}} as the following elements and be pointed to from the first element.

However, when I actually called {{GetResultSetMetadata}} for the query {{SELECT array(1, 2, 3)}}, I got just a single {{TTypeEntry.primitiveEntry}} element in {{TTypeDesc.types}}, with {{TPrimitiveTypeEntry.type = ARRAY_TYPE}}.

This response violates both of the following statements in TCLIService.thrift:

bq. [“TTypeDesc employs a type list that maps integer “pointers” to TTypeEntry objects”|https://github.com/apache/hive/blob/release-1.2.1/service/if/TCLIService.thrift#L147-L188]

and

bq. [“The primitive type token. This must satisfy the condition that type is in the PRIMITIVE_TYPES set.”|https://github.com/apache/hive/blob/release-1.2.1/service/if/TCLIService.thrift#L210-L215]

----

I tried the following script.
{code:sql}
create temporary table dummy(a int);
insert into table dummy values (1), (2), (3);

create temporary table tt(a int, b string, c map<INT, ARRAY<string>>);
insert into table tt select 1, 'a', map(3, array('a','b','c')) from dummy limit 1;

select * from tt;
{code}

I then called {{GetResultSetMetadata}} right after executing the SELECT query. The value of {{response.schema.columns}} was

{code:javascript}
[TColumnDesc(columnName='tt.a', typeDesc=TTypeDesc(types=[
     TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=3, typeQualifiers=None),
                arrayEntry=None, mapEntry=None, structEntry=None,
                unionEntry=None, userDefinedTypeEntry=None)]),
   position=1, comment=None),
 TColumnDesc(columnName='tt.b', typeDesc=TTypeDesc(types=[
     TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=7, typeQualifiers=None),
                arrayEntry=None, mapEntry=None, structEntry=None,
                unionEntry=None, userDefinedTypeEntry=None)]),
   position=2, comment=None),
 TColumnDesc(columnName='tt.c', typeDesc=TTypeDesc(types=[
     TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=11, typeQualifiers=None),
                arrayEntry=None, mapEntry=None, structEntry=None,
                unionEntry=None, userDefinedTypeEntry=None)]),
   position=3, comment=None)]
{code}

However, according to the thrift file, it should be

{code:javascript}
[TColumnDesc(columnName='tt.a', typeDesc=TTypeDesc(types=[
     TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=3, typeQualifiers=None),
                arrayEntry=None, mapEntry=None, structEntry=None,
                unionEntry=None, userDefinedTypeEntry=None)]),
   position=1, comment=None),
 TColumnDesc(columnName='tt.b', typeDesc=TTypeDesc(types=[
     TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=7, typeQualifiers=None),
                arrayEntry=None, mapEntry=None, structEntry=None,
                unionEntry=None, userDefinedTypeEntry=None)]),
   position=2, comment=None),
 TColumnDesc(columnName='tt.c', typeDesc=TTypeDesc(types=[
     TTypeEntry(primitiveEntry=None, arrayEntry=None,
                mapEntry=TMapTypeEntry(keyTypePtr=1, valueTypePtr=2),
                structEntry=None, unionEntry=None, userDefinedTypeEntry=None),
     TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=3, typeQualifiers=None),
                arrayEntry=None, mapEntry=None, structEntry=None,
                unionEntry=None, userDefinedTypeEntry=None),
     TTypeEntry(primitiveEntry=None,
                arrayEntry=TArrayTypeEntry(objectTypePtr=3),
                mapEntry=None, structEntry=None, unionEntry=None,
                userDefinedTypeEntry=None),
     TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=7, typeQualifiers=None),
                arrayEntry=None, mapEntry=None, structEntry=None,
                unionEntry=None, userDefinedTypeEntry=None)
   ]),
   position=3, comment=None)]
{code}

----

I found the related function in the Hive codebase:
https://github.com/apache/hive/blob/release-1.2.1/service/src/java/org/apache/hive/service/cli/TypeDescriptor.java#L66-L76

It seems that this function always puts a {{TPrimitiveTypeEntry}} into {{TTypeDesc.types}}, even for complex types (such as array and map), which is inconsistent with the thrift file.
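
To make the pointer layout concrete, here is a minimal sketch of how a client could resolve a column type from {{TTypeDesc.types}} if the server returned the list as the thrift file describes. The dataclasses are hypothetical stand-ins for the Thrift-generated classes and model only the fields used here, {{describe}} is an illustrative helper rather than anything in PyHive or Hive, and {{TYPE_NAMES}} covers only the three type ids that appear in the responses quoted in this report.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-ins for the Thrift-generated classes; only the fields
# needed for this illustration are modelled.
@dataclass
class TPrimitiveTypeEntry:
    type: int


@dataclass
class TArrayTypeEntry:
    objectTypePtr: int


@dataclass
class TMapTypeEntry:
    keyTypePtr: int
    valueTypePtr: int


@dataclass
class TTypeEntry:
    primitiveEntry: Optional[TPrimitiveTypeEntry] = None
    arrayEntry: Optional[TArrayTypeEntry] = None
    mapEntry: Optional[TMapTypeEntry] = None


# Partial id-to-name table: only the ids seen in the quoted responses.
TYPE_NAMES = {3: "int", 7: "string", 11: "map"}


def describe(types: List[TTypeEntry], ptr: int = 0) -> str:
    """Follow the integer pointers in `types`, starting at index `ptr`."""
    entry = types[ptr]
    if entry.arrayEntry is not None:
        return "array<%s>" % describe(types, entry.arrayEntry.objectTypePtr)
    if entry.mapEntry is not None:
        key = describe(types, entry.mapEntry.keyTypePtr)
        value = describe(types, entry.mapEntry.valueTypePtr)
        return "map<%s,%s>" % (key, value)
    if entry.primitiveEntry is not None:
        type_id = entry.primitiveEntry.type
        return TYPE_NAMES.get(type_id, "type#%d" % type_id)
    raise ValueError("unsupported TTypeEntry at index %d" % ptr)


# The type list for tt.c as the thrift file says it should look.
expected = [
    TTypeEntry(mapEntry=TMapTypeEntry(keyTypePtr=1, valueTypePtr=2)),
    TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=3)),
    TTypeEntry(arrayEntry=TArrayTypeEntry(objectTypePtr=3)),
    TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=7)),
]

# The type list for tt.c as HiveServer2 actually returned it.
actual = [TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=11))]

print(describe(expected))  # map<int,array<string>>
print(describe(actual))    # map
```

With the expected layout the full nested type can be recovered; with the actual response only the outer {{map}} is visible, and the key and value types are lost.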