Hello Iceberg devs,

I’d like to reopen the discussion on
https://github.com/apache/iceberg/pull/14281 (“Core: Group binpack
fileGroup by output partitionSpec”), which was marked as stale last week.

This patch introduces an enhancement to the rewrite_data_files action:
instead of grouping files by the current table partition spec, it groups
them by the output partition spec provided in the rewrite parameters. This
behavior enables more efficient bin-packing of small files when rolling
data up into a coarser or alternate partition layout.

The current concern with the implementation is the introduction of a new
API:

default S toSourceTypeValue(Type sourceType, T transformedValue)

which is used to normalize a partition value back to the source type. For
example, it converts an hour transform value of `489118` to the timestamp
`2025-10-18 22:00:00` so that a different partition transform (e.g. a day
transform) can be applied to it.

What is your opinion on whether this is the right abstraction, or is there
a better alternative?
@pvary, please share your thoughts, following up on our discussion on
Slack. Much appreciated. Thanks.

regards,
Dyno

-- 
reality, with all its ambiguities, does the job just fine.
