SourabhBadhya commented on issue #3043:
URL: https://github.com/apache/parquet-java/issues/3043#issuecomment-2869377055
This issue is still reproducible in the current release with the same
example file. I investigated it, and here are some of my observations.
The pages are written in the DataPageV1 format, which seems to be the
precursor to DataPageV2.
The failure seems to happen when the pages for the first column are read.
The column is:
`{"PathInSchema":["Rowname"],"Type":"BYTE_ARRAY","Encodings":["PLAIN"],"CompressedSize":448,"UncompressedSize":528,"NumValues":32,"CompressionCodec":"SNAPPY"}`
While reading through the pages, it creates `repetitionLevelColumn` and
`definitionLevelColumn` as `ValuesReaderIntIterator`. However, when it actually
tries to get the next integer through `ValuesReaderIntIterator`, it calls
`BinaryPlainValuesReader#readInteger()`, which is not defined for
`BinaryPlainValuesReader`, and that is why it's failing. Currently it's only
defined for `IntegerPlainValuesReader`.
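To make the failure mode above concrete, here is a minimal Java sketch. The classes are simplified, hypothetical stand-ins (none of the `Sketch*` names exist in parquet-java), and it assumes the base reader's `readInteger()` throws unless a subclass overrides it, which is what the observation above suggests. It is not the actual library code.

```java
// Hypothetical, simplified stand-ins for illustration only.
// None of these Sketch* types are real parquet-java classes.
public class LevelReaderSketch {

  /** Base reader: typed read methods throw unless a subclass overrides them (assumption). */
  static abstract class SketchValuesReader {
    int readInteger() {
      throw new UnsupportedOperationException(
          getClass().getSimpleName() + " does not support readInteger()");
    }

    String readBinary() {
      throw new UnsupportedOperationException(
          getClass().getSimpleName() + " does not support readBinary()");
    }
  }

  /** Stand-in for a plain integer reader: overrides readInteger(). */
  static final class SketchIntegerReader extends SketchValuesReader {
    private final int[] values;
    private int pos = 0;

    SketchIntegerReader(int[] values) {
      this.values = values;
    }

    @Override
    int readInteger() {
      return values[pos++];
    }
  }

  /** Stand-in for a plain binary reader: only readBinary() is overridden. */
  static final class SketchBinaryReader extends SketchValuesReader {
    private final String[] values;
    private int pos = 0;

    SketchBinaryReader(String[] values) {
      this.values = values;
    }

    @Override
    String readBinary() {
      return values[pos++];
    }
  }

  /** Stand-in for an int iterator over levels: it always delegates to readInteger(). */
  static final class SketchLevelIterator {
    private final SketchValuesReader delegate;

    SketchLevelIterator(SketchValuesReader delegate) {
      this.delegate = delegate;
    }

    int nextInt() {
      return delegate.readInteger();
    }
  }

  /** Reads one level through the iterator and reports success or the failure message. */
  static String tryNextLevel(SketchValuesReader reader) {
    try {
      return "level=" + new SketchLevelIterator(reader).nextInt();
    } catch (UnsupportedOperationException e) {
      return "unsupported: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    // Wrapping an integer-capable reader works.
    System.out.println(tryNextLevel(new SketchIntegerReader(new int[] {0, 1})));
    // Wrapping a binary-only reader fails inside nextInt().
    System.out.println(tryNextLevel(new SketchBinaryReader(new String[] {"row-a"})));
  }
}
```

In this sketch the iterator itself is fine; the problem is which reader it wraps. That would point toward the level iterators being handed the column's value reader rather than a reader meant for levels, though that is only a guess from the behaviour described here.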
This is just one instance where it's failing. I assume similar errors occur
for other data types apart from Integer with the Plain encoding.
@wgtmac I am wondering how we should calculate nextInt() for the
different data types, since there does not seem to be a way to do that right
now. Should we maintain a variable to keep track of the index when it
reads/skips from ByteBufferInputStream and then provide the same index to the
`ValuesReaderIntIterator`?
I am also trying to understand what `repetitionLevelColumn` and
`definitionLevelColumn` exactly mean and how they are used to read / skip the
incoming data.