[ 
https://issues.apache.org/jira/browse/FLINK-23402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Walther closed FLINK-23402.
--------------------------------
    Release Note: The default DataStream API shuffle mode for batch executions 
has been changed to blocking exchanges for all edges of the stream graph. A new 
option `execution.shuffle-mode` allows changing it to pipelined behavior if 
necessary.
      Resolution: Fixed
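
A minimal sketch of how the option from the release note might be set in 
`flink-conf.yaml` to switch back to pipelined exchanges. The key is copied 
verbatim from the release note. The value name `ALL_EXCHANGES_PIPELINED` is an 
assumption that is not stated in this issue, and the released 1.14 
documentation may list the option under a different key (possibly 
`execution.batch-shuffle-mode`), so check the configuration reference for the 
Flink version in use:

```yaml
# The changed default described in the release note applies to batch executions.
execution.runtime-mode: BATCH
# Assumed value name; without this line, batch jobs use blocking exchanges on all edges.
execution.shuffle-mode: ALL_EXCHANGES_PIPELINED
```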

Fixed in 1.14.0:

commit a78f34a735c4619cfef882f9b9a2057c507a4bca
[streaming-java][table-planner] Add ShuffleMode option

commit 0139222030d5e3dac2b9ffe7200c758ab6153fff
[streaming-java] Default to GlobalStreamExchangeMode.ALL_EDGES_BLOCKING in 
batch mode

commit 313718466d15b473bd5bf1dcf0d9d988e0fd5979
[streaming-java] Mark GlobalStreamExchangeMode as @Internal

commit 156f517d387202ac292bde5bfac423a23908b7a2
[streaming-java] Refactor GlobalDataExchangeMode to GlobalStreamExchangeMode

commit 86f54c89c7866647e50d3957026bd0d28869ea8d
[streaming-java] Fix minor code issues around 'shuffle mode'

commit 4e65322dc1b5f80a7f3a42f0f205f978357daa40
[streaming-java] Refactor ShuffleMode to StreamExchangeMode

> Expose a consistent GlobalDataExchangeMode
> ------------------------------------------
>
>                 Key: FLINK-23402
>                 URL: https://issues.apache.org/jira/browse/FLINK-23402
>             Project: Flink
>          Issue Type: Sub-task
>          Components: API / DataStream
>            Reporter: Timo Walther
>            Assignee: Timo Walther
>            Priority: Major
>              Labels: pull-request-available
>
> The Table API makes the {{GlobalDataExchangeMode}} configurable via 
> {{table.exec.shuffle-mode}}.
> In Table API batch mode the StreamGraph is configured with 
> {{ALL_EDGES_BLOCKING}} and in DataStream API batch mode 
> {{FORWARD_EDGES_PIPELINED}}.
> I would vote for unifying the exchange mode of both APIs so that complex SQL 
> pipelines behave identically in {{StreamTableEnvironment}} and 
> {{TableEnvironment}}. Also, the feedback I got so far suggests that 
> {{ALL_EDGES_BLOCKING}} is a safer option for running pipelines successfully 
> with limited resources.
> [~lzljs3620320]
> {quote}
> The previous history was like this:
> - The default value was pipelined, and we found that deployments often hung 
> because of insufficient resources. Batch jobs typically run with few 
> resources and high parallelism, because in batch jobs the granularity of 
> failover is tied to the amount of data processed by a single task: the less 
> data per task, the faster the recovery. So most workloads run with few 
> resources and high parallelism, progressing slowly, little by little.
> - Later, we switched the default value to blocking. We found that the improved 
> blocking shuffle implementation did not slow down execution much. 
> We ran TPC-DS and it took almost the same time.
> {quote}
> [~dwysakowicz]
> {quote}
> I don't see a problem with changing the default value for DataStream batch 
> mode if you think ALL_EDGES_BLOCKING is the better default option.
> {quote}
> In any case, we should make this configurable for DataStream API users and 
> make the specific Table API option obsolete.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
