adriangb opened a new pull request, #19595: URL: https://github.com/apache/datafusion/pull/19595
## Which issue does this PR close?

Part of #19433

## Rationale for this change

When writing data to a table created with `CREATE EXTERNAL TABLE ... WITH ORDER`, the sorting columns should be recorded in the Parquet file's row group metadata. This allows downstream readers to know the data is sorted and potentially skip sorting operations.

## What changes are included in this PR?

- Add `sort_expr_to_sorting_column()` and `lex_ordering_to_sorting_columns()` functions in `metadata.rs` to convert DataFusion ordering to Parquet `SortingColumn`
- Add `sorting_columns` field to `ParquetSink` with `with_sorting_columns()` builder method
- Update `create_writer_physical_plan()` to pass order requirements to `ParquetSink`
- Update `create_writer_props()` to set sorting columns on `WriterProperties`
- Add test verifying `sorting_columns` metadata is written correctly

## Are these changes tested?

Yes, added `test_create_table_with_order_writes_sorting_columns` that:

1. Creates an external table with `WITH ORDER (a ASC NULLS FIRST, b DESC NULLS LAST)`
2. Inserts data
3. Reads the Parquet file and verifies the `sorting_columns` metadata matches the expected order

## Are there any user-facing changes?

No user-facing API changes. Parquet files written via `INSERT INTO` or `COPY` for tables with `WITH ORDER` will now contain `sorting_columns` metadata in the row group.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
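For readers unfamiliar with the conversion described under "What changes are included in this PR?", here is a minimal, dependency-free sketch of the idea. It is not the code from this PR: `SortExpr` is a hypothetical simplified stand-in for a DataFusion sort expression, the local `SortingColumn` struct only mirrors the three fields of Parquet's `SortingColumn` (`column_idx`, `descending`, `nulls_first`), and the function signatures, the slice-of-names schema, and the `String` error type are assumptions made for illustration.

```rust
/// Local stand-in mirroring the fields of Parquet's `SortingColumn`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SortingColumn {
    pub column_idx: i32,
    pub descending: bool,
    pub nulls_first: bool,
}

/// Hypothetical simplified sort expression: a column name plus sort options.
#[derive(Debug, Clone)]
pub struct SortExpr {
    pub column: String,
    pub descending: bool,
    pub nulls_first: bool,
}

/// Convert one sort expression into a sorting column by resolving the
/// column name to its position in the schema (assumed signature).
pub fn sort_expr_to_sorting_column(
    expr: &SortExpr,
    schema: &[&str],
) -> Result<SortingColumn, String> {
    // Pair every schema column with an i32 index and look up the name.
    let column_idx = (0i32..)
        .zip(schema.iter())
        .find(|(_, name)| **name == expr.column.as_str())
        .map(|(idx, _)| idx)
        .ok_or_else(|| format!("column '{}' not found in schema", expr.column))?;

    Ok(SortingColumn {
        column_idx,
        descending: expr.descending,
        nulls_first: expr.nulls_first,
    })
}

/// Convert a whole lexicographic ordering, preserving its order and
/// failing on the first expression that cannot be resolved.
pub fn lex_ordering_to_sorting_columns(
    ordering: &[SortExpr],
    schema: &[&str],
) -> Result<Vec<SortingColumn>, String> {
    ordering
        .iter()
        .map(|expr| sort_expr_to_sorting_column(expr, schema))
        .collect()
}
```

Under these assumptions, an ordering of `a ASC NULLS FIRST, b DESC NULLS LAST` over a schema `["a", "b", "c"]` maps to two sorting columns with indices 0 and 1, which is the shape of metadata the new test checks for in the written row group.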
