zhangdingxin created FLINK-35237:
------------------------------------
Summary: Allow Custom HashFunction in PrePartitionOperator for
Flink Sink Customization
Key: FLINK-35237
URL: https://issues.apache.org/jira/browse/FLINK-35237
Project: Flink
Issue Type: Improvement
Components: Flink CDC
Reporter: zhangdingxin
The {{PrePartitionOperator}} in its current implementation only supports a
fixed {{HashFunction}}
({{org.apache.flink.cdc.runtime.partitioning.PrePartitionOperator.HashFunction}}).
This prevents Sink implementations from customizing the partitioning logic
for {{DataChangeEvent}}s. For example, with partitioned tables it would be
useful to hash by partition key, to hash by table name, or to reuse the
database engine's internal primary key hash function (as with the
MaxCompute DataSink).
Users who need such custom partitioning logic are currently forced to
implement their own partition operator, which undermines the utility of
{{PrePartitionOperator}}.
To address this limitation, {{PrePartitionOperator}} should support
user-specified custom {{HashFunction}}s
({{Function<DataChangeEvent, Integer>}}). One possible solution is a
mechanism analogous to the {{DataSink}} interface, in which the class path of
a {{HashFunctionFactory}} is specified in the configuration file. This would
let users tailor the partitioning strategy to the needs of their application.
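
A minimal sketch of what such a mechanism might look like, not a proposed patch. Everything in it is an assumption made for illustration: {{DataChangeEventStub}} stands in for the real {{DataChangeEvent}}, the {{HashFunctionFactory}} interface and its {{create()}} method are hypothetical, and "class path" is read as a fully qualified class name that is loaded reflectively.

```java
import java.util.Objects;
import java.util.function.Function;

// Hypothetical stand-in for Flink CDC's DataChangeEvent; it carries only
// the two fields this sketch needs.
final class DataChangeEventStub {
    final String tableName;
    final String partitionKey;

    DataChangeEventStub(String tableName, String partitionKey) {
        this.tableName = tableName;
        this.partitionKey = partitionKey;
    }
}

// Hypothetical factory interface: a sink would ship an implementation and
// name it in the pipeline configuration.
interface HashFunctionFactory {
    Function<DataChangeEventStub, Integer> create();
}

// Example strategy: all events of one table go to the same subtask.
final class TableNameHashFunctionFactory implements HashFunctionFactory {
    @Override
    public Function<DataChangeEventStub, Integer> create() {
        return event -> Objects.hashCode(event.tableName);
    }
}

// Example strategy: events are distributed by partition key.
final class PartitionKeyHashFunctionFactory implements HashFunctionFactory {
    @Override
    public Function<DataChangeEventStub, Integer> create() {
        return event -> Objects.hashCode(event.partitionKey);
    }
}

final class HashFunctionLoader {
    private HashFunctionLoader() {}

    // Instantiates the factory named in the configuration. The class must
    // have a no-argument constructor (an assumption of this sketch).
    static HashFunctionFactory load(String className)
            throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        return (HashFunctionFactory) clazz.getDeclaredConstructor().newInstance();
    }

    // Maps a hash value to a downstream subtask index in [0, parallelism).
    // floorMod keeps the result non-negative for negative hash values.
    static int partitionFor(
            Function<DataChangeEventStub, Integer> hashFunction,
            DataChangeEventStub event,
            int parallelism) {
        int hash = hashFunction.apply(event);
        return Math.floorMod(hash, parallelism);
    }
}
```

With something like this in place, the operator would call {{HashFunctionLoader.load(...)}} once at startup with the configured class name and then use {{partitionFor}} for every event, falling back to the existing fixed {{HashFunction}} when nothing is configured.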
--
This message was sent by Atlassian Jira
(v8.20.10#820010)