GaneshPatil7517 opened a new pull request, #19848:
URL: https://github.com/apache/datafusion/pull/19848

   ## Summary
   Adds the Phase 1 infrastructure for reverse page ordering in the Parquet sort 
pushdown optimization (issue #19486). This PR only introduces and propagates the 
`reverse_pages` flag; it does not reverse pages yet. Page-level reversal itself 
is left to Phase 2.
   
   ## What's Changed
   - Added `reverse_pages` flag to `ParquetSource` struct with getter/setter 
methods
   - Added `reverse_pages` field to `ParquetOpener` struct via builder pattern
   - Extended `try_reverse_output()` to set both `reverse_row_groups` and 
`reverse_pages` flags when optimizing descending sorts
   - Wired flag propagation through the existing FileSource → ParquetOpener → 
ParquetSource call chain
   - Updated display formatting to show `reverse_pages` when enabled
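
To make the flag changes above concrete, here is a minimal, self-contained sketch of the pattern. It is not the code from this PR: `SourceSketch` is a hypothetical stand-in for `ParquetSource`, the method names `with_reverse_pages` / `reverse_pages()` are assumptions (the description only says "getter/setter"), and the display format is illustrative.

```rust
use std::fmt;

/// Hypothetical stand-in for `ParquetSource`; only the two flags are modelled.
#[derive(Debug, Clone, Default)]
pub struct SourceSketch {
    reverse_row_groups: bool,
    reverse_pages: bool,
}

impl SourceSketch {
    /// Builder-style setter (assumed name), consuming and returning `self`.
    pub fn with_reverse_pages(mut self, reverse_pages: bool) -> Self {
        self.reverse_pages = reverse_pages;
        self
    }

    /// Builder-style setter for the pre-existing flag (assumed name).
    pub fn with_reverse_row_groups(mut self, reverse_row_groups: bool) -> Self {
        self.reverse_row_groups = reverse_row_groups;
        self
    }

    /// Getter (assumed name); the flag defaults to `false`.
    pub fn reverse_pages(&self) -> bool {
        self.reverse_pages
    }

    /// Getter for the pre-existing flag (assumed name).
    pub fn reverse_row_groups(&self) -> bool {
        self.reverse_row_groups
    }
}

impl fmt::Display for SourceSketch {
    /// Prints each flag only when it is enabled (illustrative format).
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "parquet")?;
        if self.reverse_row_groups {
            write!(f, ", reverse_row_groups=true")?;
        }
        if self.reverse_pages {
            write!(f, ", reverse_pages=true")?;
        }
        Ok(())
    }
}
```

With this shape, `SourceSketch::default().with_reverse_pages(true)` enables only the page flag and leaves `reverse_row_groups` untouched, which is the independence the tests in this PR check.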
   
   ## Architecture
   This implementation follows the established pattern of `reverse_row_groups`:
   - Infrastructure flag is added to both source and opener structs
   - Flag is set via builder pattern for clean API design
   - Propagation through `try_reverse_output()` ensures coordination with row 
group reversal
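
The coordination described above can be sketched roughly as follows. This is a heavily simplified illustration, not the DataFusion implementation: `SketchSource`, `SketchOpener`, and `create_opener` are hypothetical names, and the real `try_reverse_output()` has a different signature and return type.

```rust
/// Hypothetical stand-in for the source struct holding both flags.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct SketchSource {
    reverse_row_groups: bool,
    reverse_pages: bool,
}

/// Hypothetical stand-in for the opener that receives the flags.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct SketchOpener {
    pub reverse_row_groups: bool,
    pub reverse_pages: bool,
}

impl SketchSource {
    /// Simplified analogue of `try_reverse_output()`: when a descending
    /// sort is requested, return a copy of the source with BOTH flags set.
    /// Returns `None` when no reversal is requested.
    pub fn try_reverse_output(&self, descending: bool) -> Option<Self> {
        if !descending {
            return None;
        }
        Some(Self {
            reverse_row_groups: true,
            reverse_pages: true,
        })
    }

    /// Copy the flags from the source into the opener (assumed method name).
    pub fn create_opener(&self) -> SketchOpener {
        SketchOpener {
            reverse_row_groups: self.reverse_row_groups,
            reverse_pages: self.reverse_pages,
        }
    }
}
```

Setting both flags in one place keeps them from drifting apart when the optimizer requests a reversed scan, while the separate fields still allow each flag to be toggled on its own.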
   
   ## Testing
   - ✅ All 27 existing reverse-related tests pass
   - ✅ Added 4 new comprehensive tests for `reverse_pages` functionality:
     - `test_reverse_pages_default_value` - Verifies default is false
     - `test_reverse_pages_with_setter` - Verifies setter works correctly
     - `test_reverse_pages_clone_preserves_value` - Ensures cloning preserves 
state
     - `test_reverse_pages_independent_of_reverse_row_groups` - Confirms 
independent flag operation
   - ✅ No regressions
   - ✅ Code quality verified:
     - `cargo fmt` - properly formatted
     - `cargo clippy -- -D warnings` - no warnings
   
   ## Phase 1 Design Rationale
   This Phase 1 implementation establishes infrastructure for future page-level 
reversal. Actual page reversal implementation is deferred to Phase 2 because:
   - Arrow-rs `ParquetRecordBatchStreamBuilder` currently lacks public APIs for 
page-level reversal
   - Materializing all pages in memory for reversal would have significant 
performance implications
   - Separating infrastructure (Phase 1) from implementation (Phase 2) enables 
parallel development
   
   Phase 2 can implement the actual page reversal once arrow-rs provides the 
necessary page-level APIs or an alternative approach becomes available.
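
As a rough illustration of what the two flags are meant to control, the toy model below treats a file as nested vectors (row groups containing pages containing values). It is only a sketch on plain in-memory data under that assumption; it does not use Parquet or arrow-rs and says nothing about how Phase 2 will be implemented.

```rust
/// Toy model: a page is a run of values, a row group is a list of pages.
type Page = Vec<i32>;
type RowGroup = Vec<Page>;

/// Emit values with only the row-group order reversed.
pub fn reverse_row_groups(file: &[RowGroup]) -> Vec<i32> {
    file.iter().rev().flatten().flatten().copied().collect()
}

/// Emit values with row-group order AND page order reversed.
/// Values inside each page are still emitted in their stored order.
pub fn reverse_row_groups_and_pages(file: &[RowGroup]) -> Vec<i32> {
    file.iter()
        .rev()
        .flat_map(|row_group| row_group.iter().rev())
        .flatten()
        .copied()
        .collect()
}
```

For a file stored in ascending order as `[[1, 2], [3, 4]]` and `[[5, 6], [7, 8]]`, reversing row groups alone yields `5, 6, 7, 8, 1, 2, 3, 4`, while also reversing pages yields `7, 8, 5, 6, 3, 4, 1, 2`. In this model the second result is closer to descending order, but values within a page remain in stored order.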
   
   ## Files Modified
   - `datafusion/datasource-parquet/src/source.rs` - Added reverse_pages field 
and methods
   - `datafusion/datasource-parquet/src/opener.rs` - Added reverse_pages field 
to builder
   - Added tests for the `reverse_pages` flag (listed under Testing)
   
   ## Related Issues
   Fix #19486


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

