Hello Iceberg devs,

I'd like to reopen the discussion on https://github.com/apache/iceberg/pull/14281 ("Core: Group binpack fileGroup by output partitionSpec"), which was marked as stale last week.
This patch enhances the rewrite_data_files action: instead of grouping files by the current table partition spec, it groups them by the output partition spec provided in the rewrite parameters. This enables more efficient bin-packing of small files when rolling data up into a coarser or alternate partition layout.

The current concern with the implementation is the introduction of a new API, `default S toSourceTypeValue(Type sourceType, T transformedValue)`, which is used to convert a partition value back to its source type. For example, it maps an hour transform value of `489118` to the timestamp `2025-10-18 22:00:00`, so that a different partition transform (e.g. the day transform) can then be applied to it.

What is your opinion on whether this is the right abstraction, or is there a better alternative?

@pvary, please share your thoughts, following up on our discussion over Slack.

Much appreciated. Thanks.

Regards,
Dyno

--
reality, with all its ambiguities, does the job just fine.
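
P.S. For anyone who has not opened the PR, here is a rough, standalone sketch of the round trip described above: an hour-transform value is mapped back to a timestamp, and a day transform is then applied to that timestamp. It uses only java.time and is not the code from the PR. The class and method names (`SourceValueSketch`, `hourToSourceValue`, `dayTransform`) are hypothetical, and the sketch assumes UTC timestamps and covers only the hour-to-day case.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration only; these names are not part of Iceberg or the PR.
class SourceValueSketch {

    // Assumption: an hour-transform value is the number of whole hours since
    // 1970-01-01T00:00:00Z, so the start of that hour is epoch + N hours.
    static Instant hourToSourceValue(int hourOrdinal) {
        return Instant.EPOCH.plus(hourOrdinal, ChronoUnit.HOURS);
    }

    // Assumption: a day-transform value is the number of whole days since the
    // epoch. floorDiv keeps pre-1970 timestamps in the correct (earlier) day.
    static int dayTransform(Instant timestamp) {
        return (int) Math.floorDiv(timestamp.getEpochSecond(), 86_400L);
    }

    public static void main(String[] args) {
        Instant source = hourToSourceValue(489118);
        System.out.println(source);               // 2025-10-18T22:00:00Z
        System.out.println(dayTransform(source)); // 20379
    }
}
```

The point of the intermediate timestamp is that the day transform never needs to know the value came from an hour partition; it only sees a value of the source type.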
