Hi Dyno,

You’re right that we need a matrix for *bindByTransformedValue*. My concern
with *toSourceTypeValue* is that the conversion is not one-to-one, and
exposing an API with such loosely defined return semantics is generally not
a good design choice.
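
To make the one-to-one concern concrete, here is a minimal sketch. It is not Iceberg code: the class and the helpers `toHour`, `hourToRepresentative`, and `toDay` are hypothetical stand-ins, and it assumes an hour transform that counts whole hours since 1970-01-01 00:00 UTC. Two different timestamps collapse to the same hour value, so the reverse direction can only return one representative of many (here, the start of the hour).

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Illustrative only: hypothetical helpers, not the Iceberg Transform API.
class HourRoundTripSketch {

  // Assumed forward transform: whole hours since the epoch (floor division).
  static long toHour(LocalDateTime ts) {
    return Math.floorDiv(ts.toEpochSecond(ZoneOffset.UTC), 3600L);
  }

  // Assumed reverse mapping: picks the start of the hour as the representative.
  // Any other timestamp inside the same hour would be an equally valid answer.
  static LocalDateTime hourToRepresentative(long hour) {
    return LocalDateTime.ofEpochSecond(hour * 3600L, 0, ZoneOffset.UTC);
  }

  // Assumed coarser transform: whole days since the epoch.
  static long toDay(LocalDateTime ts) {
    return ts.toLocalDate().toEpochDay();
  }

  public static void main(String[] args) {
    LocalDateTime a = LocalDateTime.of(2025, 10, 18, 22, 5, 0);
    LocalDateTime b = LocalDateTime.of(2025, 10, 18, 22, 59, 59);

    // Both source values map to the same hour value: the transform is many-to-one.
    System.out.println(toHour(a) == toHour(b));

    // The reverse mapping cannot recover a or b, only a representative.
    LocalDateTime representative = hourToRepresentative(toHour(a));
    System.out.println(representative);

    // The representative is still enough to derive the coarser day value.
    System.out.println(toDay(representative) == toDay(a));
  }
}
```

The derived day value is well defined even though the intermediate source value is not, which is why an API that returns the coarser partition value directly may be easier to specify than one that returns "some" source value.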

Given that this is a low-level API change, I would prefer to involve more
people in the discussion before moving forward.

Thanks,
Peter

On Tue, Jan 27, 2026 at 4:08 AM Dyno Fu <[email protected]> wrote:

> Thanks Peter,
>   Judging from its signature, `bindByTransformedValue` would need to
> encode a conversion matrix between the different transformations. We can
> also use `toSourceTypeValue` to implement `bindByTransformedValue` if you
> think `bindByTransformedValue` is semantically more meaningful as an API.
> -Dyno
>
> On Tue, Dec 16, 2025 at 2:57 AM Péter Váry <[email protected]>
> wrote:
>
>> Hi Team,
>>
>> Thanks Dyno for bringing this up on the dev list!
>>
>> For the others, the original goal is that if we have two transformations
>> where *T1.satisfiesOrderOf(T2)*, then given a partition value P1 for T1,
>> we should be able to derive the corresponding partition value P2 for T2
>> (for example, the day 2025-10-18 exactly determines the month 2025-10). One
>> possible approach is the API Dyno proposed, which would be part of the
>> Transform interface. I’ve included Dyno's suggested Javadoc at the end of
>> this message for reference.
>>
>> The alternative we discussed was something like:
>>
>> *<P> SerializableFunction<S, T> bindByTransformedValue(Transform<?, P>
>> otherTransform, P otherOutput)*
>>
>>
>> This is a very low-level API, and I’d prefer to extend it only if no
>> better alternative exists. If you have other ideas or suggestions, we’d be
>> happy to hear them.
>>
>> Thanks,
>> Peter
>>
>> The javadoc for the API proposed by Dyno:
>>
>>
>>   /**
>>    * Converts a transformed partition value back to a representative source type value.
>>    *
>>    * <p>This method returns a source value that would produce the given transformed value when this
>>    * transform is applied. For temporal transforms, this returns the start of the period (e.g.,
>>    * start of hour, day, month, or year). For truncate transforms, this returns the truncated value
>>    * as-is since it preserves the source type.
>>    *
>>    * <p>This is useful for chaining transforms when {@link #satisfiesOrderOf(Transform)} is true,
>>    * allowing conversion from a finer granularity to a coarser one by converting back to source type
>>    * and reapplying the coarser transform.
>>    *
>>    * @param sourceType the source type for this transform
>>    * @param transformedValue the transformed partition value
>>    * @return a source value that would produce this transformed value, or null if the input is null
>>    * @throws UnsupportedOperationException if this transform does not support conversion back to
>>    *     source type
>>    */
>> default S toSourceTypeValue(Type sourceType, T transformedValue) {
>>
>>
>>
>> On Mon, Dec 15, 2025 at 8:53 PM Dyno Fu <[email protected]> wrote:
>>
>>> Hello Iceberg devs,
>>>
>>> I’d like to reopen the discussion on
>>> https://github.com/apache/iceberg/pull/14281 (“Core: Group binpack
>>> fileGroup by output partitionSpec”) that was marked as stale last week.
>>>
>>> This patch introduces an enhancement to the rewrite_data_files action:
>>> instead of grouping files by the current table partition spec, it groups
>>> them by the output partition spec provided in the rewrite parameters. This
>>> behavior enables more efficient bin-packing of small files when rolling
>>> data up into a coarser or alternate partition layout.
>>>
>>> The current concern with the implementation is the introduction of the
>>> new API
>>>
>>> default S toSourceTypeValue(Type sourceType, T transformedValue)
>>>
>>> which is used to normalize a partition value back to the source type,
>>> for example converting an hour transform value of `489118` to the
>>> timestamp `2025-10-18 22:00:00`, so that a different partition transform
>>> (e.g. the day transform) can be applied to it.
>>>
>>> What's your opinion: is this the right abstraction, or is there a
>>> better alternative?
>>> @pvary, please share your thoughts from our discussion on Slack.
>>> Much appreciated, thanks.
>>>
>>> regards,
>>> Dyno
>>>
>>> --
>>> reality, with all its ambiguities, does the job just fine.
>>>
>>
>
> --
> reality, with all its ambiguities, does the job just fine.
>
