EMERSON WANG created FLINK-35521:
------------------------------------
Summary: Flink FileSystem SQL Connector Generating SUCCESS File
Multiple Times
Key: FLINK-35521
URL: https://issues.apache.org/jira/browse/FLINK-35521
Project: Flink
Issue Type: Improvement
Components: Connectors / FileSystem
Affects Versions: 1.18.1
Environment: Our PyFlink SQL jobs are running in AWS EKS environment.
Reporter: EMERSON WANG
Our Flink Table SQL job consumed data from Kafka streams and wrote all
partitioned data into the associated Parquet files under the same S3 folder
through the filesystem SQL connector.
For the S3 filesystem SQL connector, sink.partition-commit.policy.kind was set
to 'success-file' and sink.partition-commit.trigger was set
to 'partition-time'. We found that the _SUCCESS file in the S3 folder was
generated multiple times, once after each partition commit.
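For reference, a sketch of the sink configuration that reproduces this behavior. The table name, schema, and S3 path below are illustrative placeholders, not our actual job; only the two partition-commit options are the settings described above:

```sql
-- Illustrative filesystem sink; names and path are placeholders.
CREATE TABLE s3_sink (
  user_id STRING,
  event_time TIMESTAMP(3),
  dt STRING,
  hr STRING
) PARTITIONED BY (dt, hr) WITH (
  'connector' = 'filesystem',
  'path' = 's3://my-bucket/output',  -- placeholder path
  'format' = 'parquet',
  -- commit a partition based on watermark/partition time:
  'sink.partition-commit.trigger' = 'partition-time',
  -- write a _SUCCESS file on each partition commit; with multiple
  -- partitions in one folder, this file is rewritten per commit:
  'sink.partition-commit.policy.kind' = 'success-file'
)
```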
Because all partitioned Parquet files and the _SUCCESS file are in the same S3
folder, and the _SUCCESS file is used to trigger the downstream application, we
would like the _SUCCESS file to be generated only once, after all partitions
are committed and all Parquet files are ready to be processed. That way, the
single _SUCCESS file triggers the downstream application exactly once instead
of multiple times.
We know we could set sink.partition-commit.trigger to 'process-time' to
generate the _SUCCESS file only once in the S3 folder; however, 'process-time'
would not meet our business requirements.
We request that the FileSystem SQL connector support the following new use
case:
Even if sink.partition-commit.trigger is set to 'partition-time', the _SUCCESS
file should be generated only once, after all partitions are committed and all
output files are ready to be processed, so that it triggers the downstream
application only once instead of multiple times.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)