Hi Lu,

Thank you for the proposal. I believe it will be very helpful for improving 
join performance. After reading the FLIP, I have the following questions and 
look forward to your reply.

1. I would like to confirm what happens when the source implements 
SupportsPartitioning but the Iceberg table itself is not partitioned. Will 
SupportsPartitioning#outputPartitioning return empty?

2. In KeyGroupedPartitioning, could you explain why partition values need to be 
ordered?

3. Can you outline how the parallelism of the join relates to the data sizes 
and the number of partitions of the left and right tables?

4. Nit: Since this is a Flink FLIP, it is not necessary to specifically list 
the changes that Iceberg needs to make. If needed, these can be added as an 
appendix at the end.

5. In the "Partition Mismatch Problem" section, how do we inform table B that 
it needs to create an empty partition?

6. In the "Join Keys and Partition Keys Relationship" section, I noticed that 
when the join keys are a superset of the partition keys, c1 is still used as 
the partition key. Do you mean we will treat “A.c2 = B.c2” as a normal 
condition and apply this filter after the join?

7. Considering the "Correlated Partition Number", could we have something 
similar to `resolveCommonPartitioning` that allows the source to assist in 
merging two partitionings when their partition numbers are inconsistent or 
their partition values differ?

8. In the "Compatibility, Deprecation, and Migration Plan" section, you 
mention that “The feature will be opt-in through configuration, ensuring 
no impact on existing workloads.” However, I did not find the relevant 
configuration option in the public interfaces. Could you please elaborate on 
this?

9. Is it possible to eliminate the exchange between the source and the join 
using FlinkExpandConversionRule? This seems more general and could also benefit 
other operations such as rank and group aggregation.
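
To make questions 5 and 6 concrete, here is a minimal Java sketch of the 
behavior I am picturing. It reflects only my assumption about the intended 
semantics and is not taken from the FLIP. All names in it 
(PartitionAlignmentSketch, Row, commonPartitionValues, innerJoin) are 
hypothetical and are not Flink or Iceberg APIs. Both sides are partitioned by 
c1, a side that lacks a partition value contributes an empty partition, and 
the extra join key c2 is checked as a residual condition inside each 
partition pair.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch only: none of these names are Flink or Iceberg APIs.
final class PartitionAlignmentSketch {

    // A row with the two columns from the FLIP example: c1 (partition key) and c2.
    static final class Row {
        final int c1;
        final int c2;

        Row(int c1, int c2) {
            this.c1 = c1;
            this.c2 = c2;
        }
    }

    // Sorted union of the partition values reported by both sides, so that
    // position i refers to the same partition value on the left and the right.
    static List<Integer> commonPartitionValues(
            Map<Integer, List<Row>> left, Map<Integer, List<Row>> right) {
        TreeSet<Integer> values = new TreeSet<>(left.keySet());
        values.addAll(right.keySet());
        return new ArrayList<>(values);
    }

    // Inner join on (c1, c2), evaluated one partition pair at a time with no
    // shuffle. A missing partition on either side is treated as empty.
    static List<String> innerJoin(
            Map<Integer, List<Row>> left, Map<Integer, List<Row>> right) {
        List<String> out = new ArrayList<>();
        for (Integer p : commonPartitionValues(left, right)) {
            List<Row> l = left.getOrDefault(p, List.of());
            List<Row> r = right.getOrDefault(p, List.of());
            for (Row a : l) {
                for (Row b : r) {
                    // c1 is the partition key; c2 is the residual join condition.
                    if (a.c1 == b.c1 && a.c2 == b.c2) {
                        out.add("c1=" + a.c1 + ",c2=" + a.c2);
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, List<Row>> a = Map.of(
                1, List.of(new Row(1, 10), new Row(1, 11)),
                2, List.of(new Row(2, 20)));
        Map<Integer, List<Row>> b = Map.of(
                1, List.of(new Row(1, 10)),
                3, List.of(new Row(3, 30)));
        System.out.println(commonPartitionValues(a, b));
        System.out.println(innerJoin(a, b));
    }
}
```

In this sketch table B has no partition for c1 = 2 and table A has none for 
c1 = 3, so both are joined against an empty list. My question 5 is about which 
component is responsible for producing that empty side in the real design.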




--

    Best!
    Xuyang





At 2025-06-26 06:36:38, "Lu Niu" <qqib...@gmail.com> wrote:
>Hi devs,
>
>I’d like to start a discussion on FLIP-535: Support Storage Partition Join
>in Flink:
>
>https://docs.google.com/document/d/1bX37wpwivO3fS4WrR9tZ0pDF92Mpux6pZE54Tw8czzU/edit?tab=t.0#heading=h.goqvqbz0cyel
>
>
>The goal is to support storage partition join in Flink batch mode for
>iceberg tables to avoid unnecessary shuffles.
>
>It would be great if someone could help to copy the contents to a FLIP
>page. I don’t have the permission.
>
>Looking forward to your feedback.
>
>Best
>
>Lu
