[ https://issues.apache.org/jira/browse/HIVE-26531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
xsys updated HIVE-26531:
------------------------
    Description: 
h3. Describe the bug

We are trying to create a table in the {{Avro}} data format through {{spark-sql}}. The table schema contains a {{MAP}} whose key is an {{INT}}: {{MAP<INT, STRING>}}. The {{CREATE TABLE}} query fails with the following exception:
{noformat}
22/08/29 12:03:38 ERROR Table: Unable to get field from serde: org.apache.hadoop.hive.serde2.avro.AvroSerDe
java.lang.UnsupportedOperationException: Key of Map can only be a String{noformat}
_Here is the full stack trace, for reference:_ [^Avro_Map_StackTrace.txt]

The exception is raised by the following [Hive code|https://github.com/apache/hive/blob/rel/release-3.1.2/serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java#L216-L221]:
{noformat}
  private Schema createAvroMap(TypeInfo typeInfo) {
    TypeInfo keyTypeInfo = ((MapTypeInfo) typeInfo).getMapKeyTypeInfo();
    if (((PrimitiveTypeInfo) keyTypeInfo).getPrimitiveCategory()
        != PrimitiveObjectInspector.PrimitiveCategory.STRING) {
      throw new UnsupportedOperationException("Key of Map can only be a String");
    }

    TypeInfo valueTypeInfo = ((MapTypeInfo) typeInfo).getMapValueTypeInfo();
    Schema valueSchema = createAvroSchema(valueTypeInfo);

    return Schema.createMap(valueSchema);
  }{noformat}

h3. To Reproduce

On Spark 3.2.1 (commit {{4f25b3f712}}), launch {{spark-sql}} with the Avro package:
{noformat}
$SPARK_HOME/bin/spark-sql --packages org.apache.spark:spark-avro_2.12:3.2.1
{noformat}
Then execute the following:
{noformat}
create table avro_map(c1 MAP<INT, STRING>)
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.avro.AvroSerDe"
STORED AS INPUTFORMAT "org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat";{noformat}

h3. Expected behavior

We expect the table to be created successfully in Avro format when the column schema contains a {{MAP}} with an {{INT}} key.
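The gap between this expectation and the current behavior comes down to the single category comparison in {{createAvroMap}} quoted above. Avro's own map type only supports string keys, and the Hive check enforces this by accepting only the {{STRING}} primitive category. Below is a minimal, dependency-free Java model of that check, not Hive code: the {{KeyCategory}} enum and the {{checkMapKey}} / {{isAccepted}} helpers are hypothetical stand-ins for a subset of {{PrimitiveObjectInspector.PrimitiveCategory}} and the real method.

```java
/**
 * Hypothetical, dependency-free model of the map-key check in Hive's
 * TypeInfoToSchema.createAvroMap (release-3.1.2). It uses no Hive or Avro
 * classes; it only mirrors the quoted condition for illustration.
 */
public class AvroMapKeyCheck {

    // Illustrative subset of Hive primitive categories (assumption: names
    // match PrimitiveCategory constants, but this is not the full enum).
    enum KeyCategory { STRING, VARCHAR, CHAR, INT, LONG }

    // Mirrors the quoted condition: any key category other than STRING
    // is rejected with the same exception type and message.
    static void checkMapKey(KeyCategory key) {
        if (key != KeyCategory.STRING) {
            throw new UnsupportedOperationException("Key of Map can only be a String");
        }
    }

    // Returns true if the model accepts the key category, false if it throws.
    static boolean isAccepted(KeyCategory key) {
        try {
            checkMapKey(key);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Print the model's verdict for MAP<key, STRING> with each key category.
        for (KeyCategory k : KeyCategory.values()) {
            System.out.println("MAP<" + k + ", STRING> -> "
                    + (isAccepted(k) ? "accepted" : "rejected"));
        }
    }
}
```

Under this model only {{STRING}} keys pass, so {{MAP<INT, STRING>}} is rejected exactly as in the stack trace. Because the condition compares against {{STRING}} alone, the model also rejects {{VARCHAR}} and {{CHAR}} keys.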
We tried the same schema with the Parquet and ORC formats, and in both cases the table is created as expected. Here is a simplified example showing the expected behavior with the ORC and Parquet file formats:
{noformat}
spark-sql> create table orc_map(c1 MAP<INT, STRING>) STORED AS ORC;
Time taken: 0.196 seconds
spark-sql> create table parquet_map(c1 MAP<INT, STRING>) STORED AS PARQUET;
Time taken: 0.113 seconds
spark-sql> desc orc_map;
c1 map<int,string>
Time taken: 0.387 seconds, Fetched 1 row(s)
spark-sql> desc parquet_map;
c1 map<int,string>
Time taken: 0.077 seconds, Fetched 1 row(s){noformat}

> UnsupportedOperationException while creating table in Avro format if column schema contains MAP with INTEGER key
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26531
>                 URL: https://issues.apache.org/jira/browse/HIVE-26531
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 3.1.2
>            Reporter: xsys
>            Priority: Major
>         Attachments: Avro_Map_StackTrace.txt

--
This message was sent by Atlassian Jira
(v8.20.10#820010)