Here is the link to the Paimon LocalMerge operator [1].

[1]
https://paimon.apache.org/docs/master/maintenance/write-performance/#local-merging

xiangyu feng <xiangyu...@gmail.com> wrote on Tue, Feb 11, 2025 at 14:03:

> Following up on the above,
>
> "And for SinkWriter, the data structure to be processed should be fixed."
>
> I'm not sure why the data structure of the SinkWriter should be fixed.
> Could you elaborate on the scenario here?
>
> "Is there a node or an operator to fill in the inconsistent fields of the
> RowData passed from different Sources?"
>
> By `filling in the inconsistent field from different sources`, do you
> mean implementations like the Paimon LocalMerge operator [1]? IMHO,
> this should not be part of sink reuse. The merging behavior for
> multiple sources should be handled inside the sink.
>
> Regards,
> Xiangyu Feng
>
> xiangyu feng <xiangyu...@gmail.com> wrote on Tue, Feb 11, 2025 at 13:46:
>
>> Hi Yanquan,
>>
>> Thanks for the reply. IIUC, the schema of the CatalogTable should contain
>> all target columns for the sources. If it does not, a SQL validation
>> exception should be raised by the planner.
>>
>> Regards,
>> Xiangyu Feng
>>
>>
>>
>> Yanquan Lv <decq12y...@gmail.com> wrote on Mon, Feb 10, 2025 at 16:25:
>>
>>> Hi, Xiangyu. Thanks for driving this.
>>>
>>> I have a question to confirm:
>>> Consider the case where different Sources use different columns [1]:
>>> will the schema of the CatalogTable [2] contain all target columns for the Sources?
>>> And for SinkWriter, the data structure to be processed should be fixed.
>>> Is there a node or an operator to fill in the inconsistent fields of the
>>> RowData passed from different Sources?
>>>
>>> [1]
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner
>>> [2]
>>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sourcessinks/#planning
>>>
>>>
>>>
>>> > On Feb 6, 2025, at 17:06, xiangyu feng <xiangyu...@gmail.com> wrote:
>>> >
>>> > Hi devs,
>>> >
>>> > I'm opening this thread to discuss FLIP-506: Support Reuse Multiple
>>> > Table Sinks in Planner [1].
>>> >
>>> > Currently, if users want to partial-update a downstream table from
>>> > multiple source tables in one datastream, they have to manually union
>>> > all source tables and add lots of "cast(null as string) as xxx"
>>> > expressions in Flink SQL. This makes the SQL hard to use and maintain.
>>> >
>>> > After discussing with Weijie Guo, we think that supporting sink node
>>> > reuse in the planner can greatly improve usability in this case.
>>> >
>>> > Therefore, we propose adding a new option,
>>> > *`table.optimizer.reuse-sink-enabled`*, to support this feature.
>>> > More details can be found in the FLIP.
>>> >
>>> > Looking forward to your feedback, thanks.
>>> >
>>> > [1]
>>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner
>>> >
>>> > Best regards,
>>> > Xiangyu Feng
>>>
>>>
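
To make the padding and merging discussed in this thread concrete, here is a minimal sketch in plain Python. It is not Flink or Paimon code: `SINK_COLUMNS`, `pad_row`, and `partial_update` are hypothetical names, and the merge rule (a non-null value overwrites, a null leaves the previous value) is an assumption about how a partial-update sink might behave. It only illustrates what the manual "cast(null as string) as xxx" union does to each row, and why the sink schema needs to contain every column that any source writes.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical sink schema: the union of all columns written by any source.
SINK_COLUMNS = ["id", "name", "address"]


def pad_row(row: Dict[str, Any], columns: List[str] = SINK_COLUMNS) -> Dict[str, Any]:
    """Fill columns a source does not provide with None (the CAST(NULL AS ...) step)."""
    unknown = set(row) - set(columns)
    if unknown:
        # Stands in for the validation error expected when a source targets
        # a column that the sink table does not declare.
        raise ValueError(f"columns not in sink schema: {sorted(unknown)}")
    return {col: row.get(col) for col in columns}


def partial_update(rows: Iterable[Dict[str, Any]], key: str = "id") -> Dict[Any, Dict[str, Any]]:
    """Merge padded rows per key (assumed rule: non-null overwrites, null is ignored)."""
    merged: Dict[Any, Dict[str, Any]] = {}
    for row in rows:
        # Start each key with an all-None row, then overlay non-null values.
        current = merged.setdefault(row[key], dict.fromkeys(row))
        for col, value in row.items():
            if value is not None:
                current[col] = value
    return merged


# Two hypothetical sources, each writing a different subset of the sink columns.
source_a = [{"id": 1, "name": "alice"}]
source_b = [{"id": 1, "address": "Berlin"}]

# Manual "union all" with null padding, followed by the sink-side merge.
unioned = [pad_row(r) for r in source_a + source_b]
result = partial_update(unioned)
```

In this sketch the padding happens before the rows reach the sink, and the merge happens inside it, which matches the split suggested in the replies above.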
