SourabhBadhya commented on issue #3043:
URL: https://github.com/apache/parquet-java/issues/3043#issuecomment-2869377055

   This issue is still reproducible in the current release with the same 
example file. I investigated it, and here are my observations.
   The pages are written in the DataPageV1 format, which appears to be the 
precursor to DataPageV2.
   The failure seems to happen when the pages for the first column are read. 
   Column:
`{"PathInSchema":["Rowname"],"Type":"BYTE_ARRAY","Encodings":["PLAIN"],"CompressedSize":448,"UncompressedSize":528,"NumValues":32,"CompressionCodec":"SNAPPY"}`
   While reading through the pages, the reader creates `repetitionLevelColumn` 
and `definitionLevelColumn` as `ValuesReaderIntIterator` instances. However, 
when it tries to get the next integer from a `ValuesReaderIntIterator`, it 
calls `BinaryPlainValuesReader#readInteger()`, which is not defined for 
`BinaryPlainValuesReader`, and that is why it fails. Currently it is only 
defined for `IntegerPlainValuesReader`.
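
   To make the failure mode above concrete, here is a minimal, self-contained 
sketch. It is not the parquet-java code: every class name in it 
(`SketchValuesReader`, `SketchIntegerReader`, `SketchBinaryReader`, 
`SketchLevelIterator`) is a hypothetical stand-in, and it assumes the base 
reader rejects any typed read that a subclass does not override.

```java
// Simplified stand-ins for illustration only; these are NOT the parquet-java classes.
class LevelReaderSketch {

    // Base reader: every typed read is rejected unless a subclass overrides it.
    static abstract class SketchValuesReader {
        int readInteger() {
            throw new UnsupportedOperationException(
                getClass().getSimpleName() + " does not define readInteger()");
        }

        String readBinary() {
            throw new UnsupportedOperationException(
                getClass().getSimpleName() + " does not define readBinary()");
        }
    }

    // Plays the role of an integer reader: it overrides readInteger().
    static final class SketchIntegerReader extends SketchValuesReader {
        private final int[] values;
        private int pos;

        SketchIntegerReader(int[] values) {
            this.values = values;
        }

        @Override
        int readInteger() {
            return values[pos++];
        }
    }

    // Plays the role of a binary reader: it overrides readBinary() only.
    static final class SketchBinaryReader extends SketchValuesReader {
        private final String[] values;
        private int pos;

        SketchBinaryReader(String[] values) {
            this.values = values;
        }

        @Override
        String readBinary() {
            return values[pos++];
        }
    }

    // Plays the role of the level iterator: it always asks its delegate for an integer.
    static final class SketchLevelIterator {
        private final SketchValuesReader delegate;

        SketchLevelIterator(SketchValuesReader delegate) {
            this.delegate = delegate;
        }

        int nextInt() {
            return delegate.readInteger();
        }
    }

    // Returns true when asking the iterator for the next level throws.
    static boolean failsOnNextInt(SketchValuesReader reader) {
        try {
            new SketchLevelIterator(reader).nextInt();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsOnNextInt(new SketchIntegerReader(new int[] {0, 1})));
        System.out.println(failsOnNextInt(new SketchBinaryReader(new String[] {"a"})));
    }
}
```

   In this sketch the iterator works only when it wraps a reader that 
overrides `readInteger()`. Wrapping the binary reader fails on the first 
`nextInt()` call, which matches the behaviour described above.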
   
   This is just one instance where it fails. I assume the same error occurs 
for the other non-integer data types with the Plain encoding.
   
   @wgtmac I am wondering how we should compute nextInt() for the different 
data types, since there does not seem to be a way to do that right now. 
Should we maintain a variable that tracks the index as it reads/skips from 
ByteBufferInputStream, and then provide that index to the 
`ValuesReaderIntIterator`?
   
   I am also trying to understand what `repetitionLevelColumn` and 
`definitionLevelColumn` mean exactly, and how they are used to read / skip the 
incoming data.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

