adriangb opened a new pull request, #19595:
URL: https://github.com/apache/datafusion/pull/19595

   ## Which issue does this PR close?
   
   Part of #19433
   
   ## Rationale for this change
   
   When writing data to a table created with `CREATE EXTERNAL TABLE ... WITH 
ORDER`, the sorting columns should be recorded in the Parquet file's row group 
metadata. This allows downstream readers to know the data is sorted and 
potentially skip sorting operations.
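
As a rough illustration of how a reader might use this metadata, the sketch below checks whether a row group's declared sort order already covers a required ordering. The `SortingColumn` struct here is a local stand-in that mirrors the three fields of the Parquet struct, and `row_group_satisfies_ordering` is a hypothetical helper, not part of this PR or of DataFusion. It only reasons about ordering within a single row group; ordering across row groups or files needs separate guarantees.

```rust
/// Local stand-in mirroring the fields of Parquet's `SortingColumn`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SortingColumn {
    /// Index of the column in the file schema.
    pub column_idx: i32,
    /// True if the column is sorted in descending order.
    pub descending: bool,
    /// True if nulls are placed before non-null values.
    pub nulls_first: bool,
}

/// Hypothetical helper: returns true when rows sorted by `declared` are
/// also sorted by `required`, i.e. `required` is a prefix of `declared`
/// (same columns, same direction, same null placement).
pub fn row_group_satisfies_ordering(
    declared: &[SortingColumn],
    required: &[SortingColumn],
) -> bool {
    declared.starts_with(required)
}
```

A prefix match is the conservative choice: data sorted by `(a, b)` is also sorted by `(a)`, but not necessarily by `(b)` alone.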
   
   ## What changes are included in this PR?
   
   - Add `sort_expr_to_sorting_column()` and 
`lex_ordering_to_sorting_columns()` functions in `metadata.rs` to convert 
DataFusion ordering to Parquet `SortingColumn`
   - Add `sorting_columns` field to `ParquetSink` with `with_sorting_columns()` 
builder method
   - Update `create_writer_physical_plan()` to pass order requirements to 
`ParquetSink`
   - Update `create_writer_props()` to set sorting columns on `WriterProperties`
   - Add test verifying `sorting_columns` metadata is written correctly
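
The conversion described in the first bullet can be sketched roughly as follows. This is a simplified, self-contained approximation and not the code from this PR: `SortExpr` is a hypothetical stand-in for a DataFusion sort expression on a plain column, `SortingColumn` is a local mirror of the Parquet struct's three fields, the schema is reduced to a list of column names, and the function signatures are assumptions. Only the two function names are taken from the description above.

```rust
/// Local stand-in mirroring the fields of Parquet's `SortingColumn`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SortingColumn {
    pub column_idx: i32,
    pub descending: bool,
    pub nulls_first: bool,
}

/// Hypothetical, simplified sort expression: a column name plus sort options.
#[derive(Debug, Clone)]
pub struct SortExpr {
    pub column: String,
    pub descending: bool,
    pub nulls_first: bool,
}

/// Convert one sort expression by resolving the column name to its
/// position in the schema (assumed signature).
pub fn sort_expr_to_sorting_column(
    expr: &SortExpr,
    schema: &[&str],
) -> Result<SortingColumn, String> {
    let idx = schema
        .iter()
        .position(|name| *name == expr.column)
        .ok_or_else(|| format!("column '{}' not found in schema", expr.column))?;
    let column_idx = i32::try_from(idx)
        .map_err(|_| format!("column index {} does not fit in i32", idx))?;
    Ok(SortingColumn {
        column_idx,
        descending: expr.descending,
        nulls_first: expr.nulls_first,
    })
}

/// Convert a whole lexicographic ordering, failing on the first
/// expression that cannot be mapped (assumed signature).
pub fn lex_ordering_to_sorting_columns(
    ordering: &[SortExpr],
    schema: &[&str],
) -> Result<Vec<SortingColumn>, String> {
    ordering
        .iter()
        .map(|expr| sort_expr_to_sorting_column(expr, schema))
        .collect()
}
```

With the real `parquet` crate, the resulting vector would be handed to the writer through `WriterProperties::builder().set_sorting_columns(Some(columns))`, which is presumably what the `create_writer_props()` change does.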
   
   ## Are these changes tested?
   
   Yes, added `test_create_table_with_order_writes_sorting_columns` that:
   1. Creates an external table with `WITH ORDER (a ASC NULLS FIRST, b DESC 
NULLS LAST)`
   2. Inserts data
   3. Reads the Parquet file and verifies the `sorting_columns` metadata 
matches the expected order
   
   ## Are there any user-facing changes?
   
   No user-facing API changes. Parquet files written via `INSERT INTO` or 
`COPY` to tables declared `WITH ORDER` will now contain `sorting_columns` 
metadata in each row group.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

