xsys created HIVE-26531:
---------------------------

             Summary: UnsupportedOperationException while creating table in 
Avro format if column schema contains MAP with INTEGER key
                 Key: HIVE-26531
                 URL: https://issues.apache.org/jira/browse/HIVE-26531
             Project: Hive
          Issue Type: Bug
          Components: Serializers/Deserializers
    Affects Versions: 3.1.2
            Reporter: xsys
         Attachments: Avro_Map_StackTrace.txt

h3. Describe the bug

We are trying to create a table in the {{Avro}} data format through 
{{spark-sql}}. The table's schema contains a {{MAP}} column whose key type is 
{{INT}}: {{MAP<INT, STRING>}}. The {{CREATE TABLE}} query fails with the 
following exception:

 
{noformat}
22/08/29 12:03:38 ERROR Table: Unable to get field from serde: org.apache.hadoop.hive.serde2.avro.AvroSerDe
java.lang.UnsupportedOperationException: Key of Map can only be a String
{noformat}

_Here is the full stack trace, for reference:_ [^Avro_Map_StackTrace.txt]

The exception is raised by the following [Hive 
code|https://github.com/apache/hive/blob/rel/release-3.1.2/serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java#L216-L221]:
{noformat}
  private Schema createAvroMap(TypeInfo typeInfo) {
    TypeInfo keyTypeInfo = ((MapTypeInfo) typeInfo).getMapKeyTypeInfo();
    if (((PrimitiveTypeInfo) keyTypeInfo).getPrimitiveCategory()
        != PrimitiveObjectInspector.PrimitiveCategory.STRING) {
      throw new UnsupportedOperationException("Key of Map can only be a String");
    }

    TypeInfo valueTypeInfo = ((MapTypeInfo) typeInfo).getMapValueTypeInfo();
    Schema valueSchema = createAvroSchema(valueTypeInfo);

    return Schema.createMap(valueSchema);
  }{noformat}
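To make the failing condition concrete, here is a minimal sketch that reproduces the key-type check in isolation, without any Hive dependency. The {{MapKeyCheckSketch}} class and the {{KeyCategory}} enum are hypothetical stand-ins for Hive's {{TypeInfoToSchema}} and {{PrimitiveCategory}}; only the comparison against {{STRING}} and the exception message are taken from the code above.

```java
final class MapKeyCheckSketch {
    // Hypothetical stand-in for a few of Hive's PrimitiveCategory values.
    enum KeyCategory { STRING, INT, BIGINT, BOOLEAN }

    // Mirrors the guard in createAvroMap: every key category other than
    // STRING is rejected before any Avro schema is built.
    static void checkMapKey(KeyCategory keyCategory) {
        if (keyCategory != KeyCategory.STRING) {
            throw new UnsupportedOperationException("Key of Map can only be a String");
        }
    }

    public static void main(String[] args) {
        checkMapKey(KeyCategory.STRING); // corresponds to MAP<STRING, ...>: accepted
        try {
            checkMapKey(KeyCategory.INT); // corresponds to MAP<INT, STRING>: rejected
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The check depends only on the key type, so the value type of the map plays no role in this failure.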
 
h3. To Reproduce

On Spark 3.2.1 (commit {{4f25b3f712}}), start {{spark-sql}} with the Avro 
package:
{noformat}
$SPARK_HOME/bin/spark-sql --packages org.apache.spark:spark-avro_2.12:3.2.1
{noformat}
Execute the following:
{noformat}
create table avro_map(c1 MAP<INT, STRING>)
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.avro.AvroSerDe"
STORED AS INPUTFORMAT "org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat";
{noformat}
h3. Expected behavior

We expect table creation in Avro format to succeed when the column schema 
contains a {{MAP}} with an {{INT}} key. With other formats, such as Parquet and 
ORC, the same column definition is accepted.

Here is a simplified example showing the expected behavior with the ORC and 
Parquet file formats:
{noformat}
spark-sql> create table orc_map(c1 MAP<INT, STRING>) STORED AS ORC;
Time taken: 0.196 seconds
spark-sql> create table parquet_map(c1 MAP<INT, STRING>) STORED AS PARQUET;
Time taken: 0.113 seconds
spark-sql> desc orc_map;
c1                      map<int,string>
Time taken: 0.387 seconds, Fetched 1 row(s)
spark-sql> desc parquet_map;
c1                      map<int,string>
Time taken: 0.077 seconds, Fetched 1 row(s){noformat}
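A possible application-side workaround, not verified as part of this report, is to declare the column as {{MAP<STRING, STRING>}} and convert the integer keys to strings before writing and back after reading. The sketch below illustrates only that conversion; the {{StringKeyWorkaroundSketch}} class and its method names are hypothetical, and it assumes non-null keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StringKeyWorkaroundSketch {
    // Converts integer keys to their decimal string form so the map fits a
    // MAP<STRING, STRING> column. Assumes no null keys.
    static Map<String, String> toStringKeys(Map<Integer, String> source) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> entry : source.entrySet()) {
            result.put(String.valueOf(entry.getKey()), entry.getValue());
        }
        return result;
    }

    // Reverses toStringKeys after reading; throws NumberFormatException if a
    // stored key is not a valid integer.
    static Map<Integer, String> toIntKeys(Map<String, String> stored) {
        Map<Integer, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : stored.entrySet()) {
            result.put(Integer.parseInt(entry.getKey()), entry.getValue());
        }
        return result;
    }
}
```

This changes the declared column type, so it does not replace the expected behavior described above.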



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
