[ https://issues.apache.org/jira/browse/HIVE-26271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sindhu Subhas updated HIVE-26271:
---------------------------------
Description:

Steps to replicate:
# Create a Parquet file with a decimal column using an ADF sink from any source.
# Move the file to the Hive external table's ABFS location.
# Create an external table on top of the file.
# Create an ORC table with a string column via CTAS on the Parquet external table.

Error stack:
Caused by: java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file wasb://vvv-2022-05-23t07-57-56-3...@xx.blob.core.windows.net/hive/xx/part-00000-c94f8032-a16b-4314-8868-9fc63a47422e-c000.snappy.parquet
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:422)
	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
	... 27 more
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file wasb://vvv-2022-05-23t07-57-56-3...@xx.blob.core.windows.net/hive/xx/part-00000-c94f8032-a16b-4314-8868-9fc63a47422e-c000.snappy.parquet
	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
	at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98)
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
	at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:93)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:419)
	... 28 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo cannot be cast to org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo
	at org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$8$7.convert(ETypeConverter.java:587)
	at org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$8$7.convert(ETypeConverter.java:583)
	at org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$BinaryConverter.addBinary(ETypeConverter.java:792)
	at org.apache.parquet.column.impl.ColumnReaderImpl$2$6.writeValue(ColumnReaderImpl.java:317)
	at org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:367)
	at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406)
	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:226)
	... 33 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1653293742194_0008_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 (state=08S01,code=2)

The issue is with the new Parquet reader: Hive decimal support in the new reader doesn't properly implement the Parquet [spec|https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md] -- Hive only handles the {{fixed_len_byte_array}} case in this spec. Some work needs to be done to add support for the rest.
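For reference, the Parquet LogicalTypes spec linked above allows a DECIMAL column to annotate an int32, int64, binary, or fixed_len_byte_array primitive; the stored primitive holds the unscaled value, and the scale/precision come from the column's decimal annotation rather than from the Hive column's TypeInfo (which in this CTAS is a string). The following is only a rough, hypothetical Java sketch of decoding each physical representation into a BigDecimal -- it is not the actual {{ETypeConverter}} fix, and the class and method names are illustrative:

{code:java}
// Illustrative sketch only -- not Hive's ETypeConverter. Per the Parquet
// LogicalTypes spec, a DECIMAL value may be stored as int32, int64, binary,
// or fixed_len_byte_array; the primitive holds the unscaled value and the
// scale comes from the file's decimal annotation.
import java.math.BigDecimal;
import java.math.BigInteger;

import org.apache.parquet.io.api.Binary;

public class ParquetDecimalSketch {

  // int32-backed DECIMAL: the int is the unscaled value.
  static BigDecimal fromInt32(int unscaled, int scale) {
    return BigDecimal.valueOf(unscaled, scale);
  }

  // int64-backed DECIMAL: the long is the unscaled value.
  static BigDecimal fromInt64(long unscaled, int scale) {
    return BigDecimal.valueOf(unscaled, scale);
  }

  // binary / fixed_len_byte_array-backed DECIMAL: bytes are the big-endian
  // two's-complement unscaled value.
  static BigDecimal fromBinary(Binary value, int scale) {
    return new BigDecimal(new BigInteger(value.getBytes()), scale);
  }
}
{code}

Whatever shape the eventual fix takes, the converter presumably needs to take the scale from the file's decimal annotation instead of casting the target column's PrimitiveTypeInfo to DecimalTypeInfo, which is the cast that fails in the stack trace above.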
> ParquetDecodingException: Can not read value at 1 in block 0 when reading
> Parquet file generated from ADF sink from Hive
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26271
>                 URL: https://issues.apache.org/jira/browse/HIVE-26271
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 3.1.1
>         Environment: ADF pipeline to create parquet table.
>                      HDInsight 4.1
>            Reporter: Sindhu Subhas
>            Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)