[ https://issues.apache.org/jira/browse/HIVE-13632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15263386#comment-15263386 ]
Yongzhi Chen commented on HIVE-13632:
-------------------------------------

Each serde is different. Take Avro, for example: an Avro record holds the row values by column name. A debugger view of such a record:

{noformat}
record GenericData$Record (id=8454) {"key": "abcd", "arrayvalues": [], "mapvalues": {}}
{noformat}

Each column name maps to a value object; an empty array is simply an empty List object. Serializing the value just encodes the record directly:

{noformat}
BinaryEncoder be = EncoderFactory.get().directBinaryEncoder((DataOutputStream)out, null);
{noformat}

So it is very easy to translate. Our problem is that although Hive knows the value is an empty list before serializing to Parquet, it does not know how to tell Parquet that the field is empty.

> Hive failing on insert empty array into parquet table
> -----------------------------------------------------
>
>                 Key: HIVE-13632
>                 URL: https://issues.apache.org/jira/browse/HIVE-13632
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 1.1.0
>            Reporter: Yongzhi Chen
>            Assignee: Yongzhi Chen
>         Attachments: HIVE-13632.1.patch
>
>
> The insert will fail with the following stack:
> {noformat}
> by: parquet.io.ParquetEncodingException: empty fields are illegal, the field should be ommited completely instead
>     at parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endField(MessageColumnIO.java:271)
>     at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$ListDataWriter.write(DataWritableWriter.java:271)
>     at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:199)
>     at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:215)
>     at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:88)
>     at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59)
>     at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31)
>     at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:116)
>     at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
>     at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
>     at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:111)
>     at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:124)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:697)
> {noformat}
> Reproduce:
> {noformat}
> create table test_small (
>   key string,
>   arrayValues array<string>)
> stored as parquet;
> insert into table test_small select 'abcd', array() from src limit 1;
> {noformat}
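For reference, here is a minimal, self-contained sketch of the Avro side, assuming the standard Avro generic API. The class name EmptyArrayAvroDemo and the inline schema are made up for illustration; the encoder call is the one quoted above.

{noformat}
import java.io.ByteArrayOutputStream;
import java.util.Collections;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

// Hypothetical demo class, for illustration only.
public class EmptyArrayAvroDemo {
  public static void main(String[] args) throws Exception {
    // Schema mirroring the test_small table: a string key plus an array<string> column.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"r\",\"fields\":["
      + "{\"name\":\"key\",\"type\":\"string\"},"
      + "{\"name\":\"arrayvalues\",\"type\":{\"type\":\"array\",\"items\":\"string\"}}]}");

    GenericData.Record rec = new GenericData.Record(schema);
    rec.put("key", "abcd");
    rec.put("arrayvalues", Collections.emptyList()); // the empty array is just an empty List

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder be = EncoderFactory.get().directBinaryEncoder(out, null);
    new GenericDatumWriter<GenericData.Record>(schema).write(rec, be);
    be.flush();
    // Avro encodes the empty array as a single zero block count; no special casing needed.
    System.out.println("encoded " + out.size() + " bytes");
  }
}
{noformat}

Avro's array encoding is a sequence of counted blocks terminated by a zero count, so an empty list falls out of the wire format for free. Parquet has no such direct encoding, which is why the writer has to represent emptiness structurally.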
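On the Parquet side, the stack shows ListDataWriter calling startField/endField around a list that produced no values, which is exactly what MessageColumnIO rejects. The usual legal encoding of an empty (non-null) list is to emit the outer group but omit the inner repeated field entirely. Below is a hypothetical sketch of that convention, assuming a list schema with a repeated binary array_element field inside the LIST group; it is only an illustration, not necessarily what HIVE-13632.1.patch does.

{noformat}
import java.util.List;

import parquet.io.api.Binary;
import parquet.io.api.RecordConsumer;

// Hypothetical sketch, for illustration only.
public class EmptyListWriteSketch {
  // Writes an array<string> column so that an empty (non-null) list is legal.
  static void writeStringList(RecordConsumer rc, String fieldName, int index,
                              List<String> values) {
    rc.startField(fieldName, index);
    rc.startGroup();
    if (!values.isEmpty()) {
      // Only start the inner repeated field when there is at least one element;
      // a startField/endField pair with no values in between is what triggers
      // "empty fields are illegal, the field should be ommited completely instead".
      rc.startField("array_element", 0);
      for (String v : values) {
        rc.addBinary(Binary.fromString(v));
      }
      rc.endField("array_element", 0);
    }
    rc.endGroup();
    rc.endField(fieldName, index);
  }
}
{noformat}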