[ https://issues.apache.org/jira/browse/BEAM-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739492#comment-16739492 ]
Fabiano Francesconi commented on BEAM-68:
-----------------------------------------
I found this ticket while trying to do the same thing. In particular, point
number 1 mentioned by [~mingmxu] causes another side effect (on Flink): it
interferes with measuring back pressure. If parallelism is higher than the
number of partitions, some of the tasks are terminated immediately, and this
confuses Flink.
Is there any chance that this ticket gets picked up in the short term?
> Support for limiting parallelism of a step
> ------------------------------------------
>
> Key: BEAM-68
> URL: https://issues.apache.org/jira/browse/BEAM-68
> Project: Beam
> Issue Type: New Feature
> Components: beam-model
> Reporter: Daniel Halperin
> Priority: Major
>
> Users may want to limit the parallelism of a step. Two classic use cases are:
> - User wants to produce at most k files, so sets
> TextIO.Write.withNumShards(k).
> - External API only supports k QPS, so user sets a limit of k/(expected
> QPS/step) on the ParDo that makes the API call.
> Unfortunately, there is no way to do this effectively within the Beam model.
> A GroupByKey with exactly k keys will guarantee that only k elements are
> produced, but runners are free to break fusion in such a way that each element
> may be processed in parallel later.
> To implement this functionality, I believe we need to add this support to the
> Beam Model.
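
To make the two use cases in the description concrete, here is a minimal
sketch in plain Python. It does not use the Beam SDK: group_by_shard is only a
stand-in for a GroupByKey over exactly k keys, and all function names here
(assign_shard, group_by_shard, max_parallelism) are hypothetical. The reading
of "k/(expected QPS/step)" as the API limit divided by the expected QPS of one
parallel instance of the step is also an assumption.

```python
import zlib
from collections import defaultdict


def assign_shard(element: str, k: int) -> int:
    """Deterministically map an element to one of k shard keys (0..k-1)."""
    return zlib.crc32(element.encode("utf-8")) % k


def group_by_shard(elements, k):
    """Plain-Python stand-in for a GroupByKey over exactly k keys.

    At most k groups come out, which is the guarantee the ticket mentions.
    Nothing here constrains how a runner parallelizes work after the grouping.
    """
    groups = defaultdict(list)
    for element in elements:
        groups[assign_shard(element, k)].append(element)
    return dict(groups)


def max_parallelism(api_qps_limit: float, expected_qps_per_instance: float) -> int:
    """Assumed reading of k / (expected QPS/step): round down, never below 1."""
    return max(1, int(api_qps_limit // expected_qps_per_instance))


if __name__ == "__main__":
    shards = group_by_shard([f"record-{i}" for i in range(100)], k=4)
    print(len(shards), "groups")       # never more than 4
    print(max_parallelism(100, 25))    # API allows 100 QPS, each instance issues ~25
```

As the description notes, bounding the number of groups this way does not
bound the parallelism of later steps, which is why the ticket asks for support
in the model itself.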
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)