[ https://issues.apache.org/jira/browse/FLINK-21397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Flink Jira Bot updated FLINK-21397:
-----------------------------------
    Labels: stale-critical  (was: )

> BufferUnderflowException when read parquet
> -------------------------------------------
>
>                 Key: FLINK-21397
>                 URL: https://issues.apache.org/jira/browse/FLINK-21397
>             Project: Flink
>          Issue Type: Bug
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>    Affects Versions: 1.12.1
>            Reporter: lihe ma
>            Priority: Critical
>              Labels: stale-critical
>         Attachments: part-f33924c5-99c3-4177-9a9a-e2d5c71a799a-1-2324.snappy.parquet
>
>
> Reading a Parquet file fails with an error.
> When every page in the Parquet file is encoded as PLAIN_DICTIONARY, reading
> works well. But if the file contains 3 pages, where page0 and page1 are
> encoded as PLAIN_DICTIONARY and page2 is encoded as PLAIN, Flink throws an
> exception after page0 and page1 have been read.
> The source Parquet file was written by Flink 1.11.
>
> Parquet file info:
> {{row group 0}}
> {{--------------------------------------------------------------------------------}}
> {{oid: BINARY SNAPPY DO:0 FPO:4 SZ:625876/1748820/2.79 VC:95192 ENC:BIT [more]...}}
> {{oid TV=95192 RL=0 DL=1 DS: 36972 DE:PLAIN_DICTIONARY}}
> {{----------------------------------------------------------------------------}}
> {{ page 0: DLE:RLE RLE:BIT_PACKED VLE:PLAIN_DICTIONARY [more]... SZ:70314}}
> {{ page 1: DLE:RLE RLE:BIT_PACKED VLE:PLAIN_DICTIONARY [more]... SZ:74850}}
> {{ page 2: DLE:RLE RLE:BIT_PACKED VLE:PLAIN ST:[m [more]... SZ:568184}}
> {{BINARY oid}}
>
> Exception message:
> {code:java}
> Caused by: java.nio.BufferUnderflowException
>     at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:151)
>     at java.nio.ByteBuffer.get(ByteBuffer.java:715)
>     at org.apache.parquet.io.api.Binary$ByteBufferBackedBinary.getBytes(Binary.java:422)
>     at org.apache.flink.formats.parquet.vector.reader.BytesColumnReader.readBatchFromDictionaryIds(BytesColumnReader.java:77)
>     at org.apache.flink.formats.parquet.vector.reader.BytesColumnReader.readBatchFromDictionaryIds(BytesColumnReader.java:31)
>     at org.apache.flink.formats.parquet.vector.reader.AbstractColumnReader.readToVector(AbstractColumnReader.java:186)
>     at org.apache.flink.formats.parquet.ParquetVectorizedInputFormat$ParquetReader.nextBatch(ParquetVectorizedInputFormat.java:363)
>     at org.apache.flink.formats.parquet.ParquetVectorizedInputFormat$ParquetReader.readBatch(ParquetVectorizedInputFormat.java:334)
>     at org.apache.flink.connector.file.src.impl.FileSourceSplitReader.fetch(FileSourceSplitReader.java:71)
>     at org.apache.flink.connector.base.source.reader.fetcher.FetchTask.run(FetchTask.java:56)
>     at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:138)
>     ... 6 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)