Re: [PR] [cdc-cli][cdc-composer] Applying 'flink-config' for pipeline yaml [flink-cdc]

via GitHub Wed, 03 Apr 2024 01:08:57 -0700


PatrickRen commented on code in PR #3187:
URL: https://github.com/apache/flink-cdc/pull/3187#discussion_r1549180625



##########
flink-cdc-cli/src/test/resources/definitions/pipeline-definition-full.yaml:
##########
@@ -53,3 +53,8 @@ pipeline:
   name: source-database-sync-pipe
   parallelism: 4
   enable-schema-evolution: false
+  execution.checkpointing.interval: 10000
+  execution.checkpointing.mode: EXACTLY_ONCE
+  # yarn config
+  yarn.staging-directory: /tmp/flink-cdc
+  yarn.application.queue: flink-cdc

Review Comment:
   Flink 1.19 introduces a significant change to the configuration format (see 
[FLIP-366](https://cwiki.apache.org/confluence/display/FLINK/FLIP-366%3A+Support+standard+YAML+for+FLINK+configuration)),
so I now have some concerns about this functionality. If we keep using the 
legacy flattened format, there will be a gap between us and Flink. 
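   To make the format gap concrete, here is a minimal Python sketch. It assumes the nested, standard-YAML form allowed by FLIP-366 has already been parsed into a plain dict. The `flatten_config` helper is hypothetical and is not Flink's actual parser. It only illustrates that the keys added in this diff correspond to nested sections when written in the standard YAML style, so the same settings can be spelled two ways.

```python
from typing import Any, Dict, Mapping


def flatten_config(nested: Mapping[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested (standard-YAML style) mapping into dotted legacy keys.

    Hypothetical helper for illustration only; not Flink's implementation.
    """
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        full_key = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, Mapping):
            # Recurse into nested sections, joining key segments with "."
            flat.update(flatten_config(value, full_key))
        else:
            flat[full_key] = value
    return flat


# Nested form, as standard YAML (FLIP-366) allows
nested_style = {
    "execution": {
        "checkpointing": {"interval": 10000, "mode": "EXACTLY_ONCE"},
    },
    "yarn": {
        "staging-directory": "/tmp/flink-cdc",
        "application": {"queue": "flink-cdc"},
    },
}

# Flattened form, as written in the pipeline definition under review
flat_style = {
    "execution.checkpointing.interval": 10000,
    "execution.checkpointing.mode": "EXACTLY_ONCE",
    "yarn.staging-directory": "/tmp/flink-cdc",
    "yarn.application.queue": "flink-cdc",
}

print(flatten_config(nested_style) == flat_style)  # True
```

If the pipeline definition only ever accepts the flattened spelling, users who copy nested settings from a Flink 1.19 config would need a translation step like this. That is the gap described above.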
   
   Another thing on my mind is how to resolve conflicts between pipeline 
configs and Flink configs. There are two levels of conflict:
   
   1. Naming conflict: what if a pipeline config has the same key as a Flink 
config but a totally different meaning?
   2. Semantic conflict: `parallelism` in the pipeline config vs. 
`parallelism.default` in the Flink config. Which one takes priority, and which 
one should users choose?
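   The two levels of conflict can be sketched as a small Python check over a mixed `pipeline` section. This is a hypothetical illustration: `find_conflicts`, `ConflictReport`, and the key registries below are made-up names. Real key sets would have to come from the CDC pipeline options and Flink's own options, and `parallelism.default: 8` is an invented user setting.

```python
from dataclasses import dataclass, field
from typing import Any, List, Mapping, Set


@dataclass
class ConflictReport:
    naming: List[str] = field(default_factory=list)    # same key on both sides
    semantic: List[str] = field(default_factory=list)  # different keys, same meaning


def find_conflicts(
    section: Mapping[str, Any],
    pipeline_keys: Set[str],
    flink_keys: Set[str],
    semantic_pairs: Mapping[str, str],
) -> ConflictReport:
    """Classify keys in a mixed `pipeline` section into the two conflict kinds."""
    report = ConflictReport()
    for key in section:
        # 1. Naming conflict: the key is registered by both CDC and Flink,
        #    so its meaning in a mixed section is ambiguous.
        if key in pipeline_keys and key in flink_keys:
            report.naming.append(key)
    for cdc_key, flink_key in semantic_pairs.items():
        # 2. Semantic conflict: both spellings of one concept are set,
        #    so some precedence rule has to pick a winner.
        if cdc_key in section and flink_key in section:
            report.semantic.append(f"{cdc_key} vs {flink_key}")
    return report


# Hypothetical registries, for illustration only
PIPELINE_KEYS = {"name", "parallelism", "enable-schema-evolution"}
FLINK_KEYS = {"parallelism.default", "execution.checkpointing.interval"}
SEMANTIC_PAIRS = {"parallelism": "parallelism.default"}

section = {
    "name": "source-database-sync-pipe",
    "parallelism": 4,
    "parallelism.default": 8,  # hypothetical: a user also sets Flink's key
    "execution.checkpointing.interval": 10000,
}
report = find_conflicts(section, PIPELINE_KEYS, FLINK_KEYS, SEMANTIC_PAIRS)
print(report.naming)    # []
print(report.semantic)  # ['parallelism vs parallelism.default']
```

The sketch only detects conflicts. It deliberately leaves the precedence rule open, because choosing that rule is exactly the question raised above.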
   
   If we mix them together in the `pipeline` section, I'm afraid there will be 
a lot of explaining and conflict-resolution work to do in the future.



-- 
This is an automated message from the Apache Git Service.
