Could you try setting 'execution.batch-shuffle-mode'='ALL_EXCHANGES_PIPELINED'? It looks like the ExecutionMode in ExecutionConfig does not take effect for the DataStream API.
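For reference, a minimal sketch of how that option might look in flink-conf.yaml. The shuffle-mode key and value are the ones named above; pairing it with execution.runtime-mode is an assumption about how the job is configured, and whether the shuffle-mode key is available depends on the Flink version in use, so please check both against your release:

```yaml
# flink-conf.yaml (sketch only; verify both keys against your Flink version)

# Assumption: the DataStream job is meant to run with batch semantics on bounded input.
execution.runtime-mode: BATCH

# Use pipelined exchanges so downstream tasks can start while upstream tasks are still running.
execution.batch-shuffle-mode: ALL_EXCHANGES_PIPELINED
```

The same key can likely also be passed per job at submission time as a -D option instead of being set cluster-wide. Note that with pipelined exchanges the connected tasks generally need to be scheduled at the same time, so the job may need more free slots than with blocking exchanges.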
The default shuffle behavior for the DataStream API in batch mode is 'ALL_EXCHANGES_BLOCKING', where upstream and downstream tasks run one after the other. In pipelined mode, by contrast, upstream and downstream tasks run simultaneously.

Best,
Zhanghao Chen
________________________________
From: Hailu, Andreas <andreas.ha...@gs.com>
Sent: Wednesday, September 14, 2022 21:37
To: zhanghao.c...@outlook.com <zhanghao.c...@outlook.com>; user@flink.apache.org <user@flink.apache.org>
Subject: RE: ExecutionMode in ExecutionConfig

Hi Zhanghao,

That seems different from what I'm referencing, and it is one of my points of confusion: the documentation refers to the execution mode as BATCH and STREAMING, whereas the code calls this the runtime mode, e.g. env.setRuntimeMode(RuntimeExecutionMode.BATCH);

I'm referring to the ExecutionMode in the ExecutionConfig, e.g. env.getConfig().setExecutionMode(ExecutionMode.BATCH) / env.getConfig().setExecutionMode(ExecutionMode.PIPELINED). I'm not able to find documentation on this anywhere.
ah

From: zhanghao.c...@outlook.com <zhanghao.c...@outlook.com>
Sent: Wednesday, September 14, 2022 1:10 AM
To: Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
Subject: Re: ExecutionMode in ExecutionConfig

https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/datastream/execution_mode/ gives a comprehensive description of it.

Execution Mode (Batch/Streaming) | Apache Flink
The DataStream API supports different runtime execution modes from which you can choose depending on the requirements of your use case and the characteristics of your job. There is the "classic" execution behavior of the DataStream API, which we call STREAMING execution mode. This should be used for unbounded jobs that require continuous incremental ...

Best,
Zhanghao Chen
________________________________
From: Hailu, Andreas <andreas.ha...@gs.com>
Sent: Wednesday, September 14, 2022 7:13
To: user@flink.apache.org <user@flink.apache.org>
Subject: ExecutionMode in ExecutionConfig

Hello,

Is there somewhere I can learn more about the details of the effect of ExecutionMode in ExecutionConfig on a job? I am trying to sort out some of the details, as it seems to work differently between the DataStream API and the deprecated DataSet API.

I've attached a picture of this job graph. I'm reading from a total of 3 data sources: the results of 2 are sent to a CoGroup (orange rectangle), and the other has its records forwarded to a sink after some basic filter + map operations (red rectangle).

The DataSet API's job graph has all of the operators RUNNING immediately, as we desire. However, the DataStream API's job graph only has the DataSource operators that feed into the CoGroup online, and the remaining operators wake up only when the 2 sources have completed. This winds up introducing a lot of latency in processing the batch. Both of these are running in the same environment on the same data with identical ExecutionMode configs, just different APIs. I'm attempting to get the same behavior from both.

I ask about ExecutionMode as I am able to replicate this behavior in DataSet by setting the ExecutionMode from the default of PIPELINED to BATCH.

Thanks!

best,
ah
________________________________
Your Personal Data: We may collect and process information about you that may be subject to data protection laws.
For more information about how we use and disclose your personal data, how we protect your information, our legal basis to use your information, your rights and who you can contact, please refer to: www.gs.com/privacy-notices