friendlymatthew opened a new issue, #19839: URL: https://github.com/apache/datafusion/issues/19839
### Describe the bug

The `ParquetOpener` errors when reading parquet files that lack page index metadata during sparse column chunk reads with row selection masks :neckbeard:

This manifests as "Invalid offset in sparse column chunk data: [offset], no matching page found" errors.

I think the main problem here is the use of `ArrowReaderOptions::with_page_index(true)`, which internally sets `PageIndexPolicy::Required` and strictly requires page index metadata to be present. This API was replaced in arrow land with the more flexible `PageIndexPolicy` enum, which expands the behavior from 2 boolean states to 3 policy options (Required, Optional, Never).

Related issues
- https://github.com/apache/arrow-rs/issues/9197

### To Reproduce

_No response_

### Expected behavior

We should set the page index policy to `PageIndexPolicy::Optional`. This way it gracefully handles files both with and without page index metadata.

https://github.com/apache/datafusion/blob/6f92ea6005c24441c2462b2fbe6aaefee3af478d/datafusion/datasource-parquet/src/opener.rs#L434-L435

### Additional context

_No response_
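
To make the difference between the three policies concrete, here is a minimal, standard-library-only sketch of the semantics described in this issue. It is a local model, not the parquet crate's code: the enum below only mirrors the option names used in the issue text, and `PageIndex` and `resolve_page_index` are hypothetical names introduced for illustration. It also does not reproduce the sparse column chunk error path itself, only the decision about whether a missing page index is an error.

```rust
/// Local model of a three-state page index policy. Variant names follow the
/// issue text; this is NOT the parquet crate's enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PageIndexPolicy {
    /// The page index must be present; a file without one is an error.
    Required,
    /// Use the page index if the file has one, otherwise carry on without it.
    Optional,
    /// Never load the page index, even if the file has one.
    Never,
}

/// Hypothetical stand-in for decoded page index metadata.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PageIndex {
    pub page_offsets: Vec<u64>,
}

/// Decide what a reader should do with the page index found (or not found)
/// in a file, given the configured policy.
pub fn resolve_page_index(
    policy: PageIndexPolicy,
    found: Option<PageIndex>,
) -> Result<Option<PageIndex>, String> {
    match (policy, found) {
        // Never: ignore whatever the file contains.
        (PageIndexPolicy::Never, _) => Ok(None),
        // Optional: pass through what the file has, present or absent.
        (PageIndexPolicy::Optional, found) => Ok(found),
        // Required: fine when present...
        (PageIndexPolicy::Required, Some(index)) => Ok(Some(index)),
        // ...but a missing index is a hard error.
        (PageIndexPolicy::Required, None) => {
            Err("page index required but not present in file".to_string())
        }
    }
}
```

In this model, a file without a page index fails only under `Required`; under `Optional` the call succeeds with `None`, which is the graceful behavior requested above. A boolean flag can express only two of these three outcomes, which is why the flag that maps `true` to `Required` is too strict for files written without a page index.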
