It doesn't seem worth it to me, since this is a weird use of table properties -- you're not actually setting a property. I understand that it's a bit easier since Iceberg does the work of making the partitioning and sort order work together. But I prefer the solution that uses standard SQL clauses rather than passing something unsupported by Spark through table properties, even though it is a bit more difficult for the caller.
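[Editor's note: for concreteness, a minimal sketch of the standard-SQL approach discussed above -- sorting in the SELECT itself, with partition columns listed first so the default hash distribution does not break the sort. The table name, partitioning, and columns (db.sample, category, id) are assumptions carried over from the example later in this thread.]

```sql
-- Sketch: sort during CTAS with plain SQL, no table properties.
-- Assumes db.sample is partitioned by category, as in the thread's example.
CREATE TABLE db.sample
USING iceberg
PARTITIONED BY (category)
AS SELECT * FROM source
ORDER BY category, id;

-- The sort order can still be recorded in table metadata afterwards for
-- future writes; the table is never empty in between.
ALTER TABLE db.sample WRITE ORDERED BY category, id;
```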
On Sat, Mar 14, 2026 at 5:33 AM hemanth boyina <[email protected]> wrote:

> Thanks Ryan,
>
> ORDER BY in the SELECT can work for the physical sort, especially if
> partition columns are included first so the sort survives the default hash
> distribution.
>
> The main difference I see is that with sort order metadata, the write
> path coordinates everything automatically: range distribution and
> within-partition ordering. Whereas with ORDER BY, users need to manually
> ensure the column order aligns with the partition spec to avoid hash
> distribution breaking the sort.
>
> Also, the bigger ask is about atomicity and tooling. ORDER BY plus a
> follow-up ALTER TABLE ... WRITE ORDERED BY works as a two-step process.
>
> Would it make sense to support this via TBLPROPERTIES or a WRITE ORDERED
> BY clause in CTAS?
>
> Thanks
> Hemanth Boyina
>
> On Sat, 14 Mar 2026 at 2:27 AM, Ryan Blue <[email protected]> wrote:
>
>> Why not include an ORDER BY in your SELECT query? If you want to specify
>> the sort order in RTAS, then you don't need to be setting the table
>> properties. Just pass it to the SELECT instead.
>>
>> On Fri, Mar 13, 2026 at 11:10 AM hemanth boyina <
>> [email protected]> wrote:
>>
>>> Hi Everyone,
>>>
>>> I'd like to revisit the feature request for specifying sort order during
>>> CREATE TABLE AS SELECT (CTAS) and REPLACE TABLE AS SELECT (RTAS)
>>> operations in Spark SQL. This has been previously raised in:
>>>
>>> - https://github.com/apache/iceberg/issues/4889 (Support sort order in
>>> DDL)
>>> - https://github.com/apache/iceberg/issues/14612
>>>
>>> THE PROBLEM
>>> -----------
>>>
>>> Currently, the only way to set a sort order is through a separate
>>> ALTER TABLE ... WRITE ORDERED BY command after table creation. This
>>> creates significant operational challenges for full table refresh
>>> workflows:
>>>
>>> Option A - Empty table window:
>>>
>>>     REPLACE TABLE db.sample AS SELECT * FROM source WHERE 1=0;
>>>     ALTER TABLE db.sample WRITE ORDERED BY category, id;
>>>     INSERT INTO db.sample SELECT * FROM source;
>>>
>>> Problem: The table is temporarily empty between steps 1 and 3, making
>>> it unavailable to consumers.
>>>
>>> Option B - Double write:
>>>
>>>     REPLACE TABLE db.sample AS SELECT * FROM source;
>>>     CALL system.rewrite_data_files(
>>>       table => 'db.sample',
>>>       strategy => 'sort',
>>>       sort_order => 'category ASC, id ASC',
>>>       options => map('rewrite-all', 'true')
>>>     );
>>>
>>> Problem: The data is written twice - once unsorted, then again sorted.
>>> This doubles compute cost and storage churn.
>>>
>>> Implementation plan:
>>>
>>> Spark's DSv2 StagingTableCatalog interface doesn't have a sort order
>>> parameter, so TBLPROPERTIES is the natural channel to pass this
>>> information. The change would go in the Spark catalog: parse the
>>> property with parseSortOrder, following the pattern already used by
>>> rewrite_data_files.
>>>
>>> Please let me know if I am missing something here.
>>>
>>> Thanks
>>> Hemanth Boyina
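[Editor's note: for concreteness, the TBLPROPERTIES variant being proposed would presumably look something like the sketch below. This is hypothetical syntax illustrating the proposal, not something Spark or Iceberg supports today, and the property name 'sort-order' is an assumed spelling.]

```sql
-- HYPOTHETICAL: the proposal under discussion; not valid syntax today.
-- The 'sort-order' property name is an assumption for illustration.
REPLACE TABLE db.sample
USING iceberg
TBLPROPERTIES ('sort-order' = 'category ASC, id ASC')
AS SELECT * FROM source;
```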
