Hi Ken,

There is a discussion in issue <https://issues.apache.org/jira/browse/FLINK-12122> about a feature related to what you need. It proposes spreading tasks evenly across TMs. However, that feature is still in progress, and it would spread all tasks evenly rather than only specific operators.

For the time being, I would suggest having only one slot per TM, and using slot sharing groups <https://ci.apache.org/projects/flink/flink-docs-release-1.8/concepts/runtime.html#task-slots-and-resources> to make sure parallel tasks of the same job graph vertex do not go into the same slot/TM.
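For the slots, you can set this in flink-conf.yaml so that slots map 1:1 to machines:

    taskmanager.numberOfTaskSlots: 1

And here is a minimal sketch of the job side (the class and names are made up, and it uses the DataStream API for illustration). Since subtasks of the same vertex never share a slot, and each TM has exactly one slot, each parallel trainer should land on a different TM:

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class TrainingJobSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // Hypothetical input, standing in for whatever feeds the trainer.
            DataStream<String> corpus = env.fromElements("doc-1", "doc-2", "doc-3");

            corpus
                    // Stand-in for the CPU-heavy Fasttext training step.
                    .map(new MapFunction<String, String>() {
                        @Override
                        public String map(String doc) {
                            return doc; // training would happen here
                        }
                    })
                    // Put the trainer in its own slot sharing group, so no
                    // other operator's task is co-located in its slot.
                    .slotSharingGroup("training")
                    .setParallelism(4) // == number of TaskManagers
                    .print();

            env.execute("training-sketch");
        }
    }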
Thank you~

Xintong Song

On Thu, Jun 13, 2019 at 4:58 AM Ken Krugler <kkrugler_li...@transpac.com> wrote:

> Hi all,
>
> I’m running a complex (batch) workflow that has a step where it trains
> Fasttext models.
>
> This is very CPU-intensive, to the point where it will use all available
> processing power on a server.
>
> The Flink configuration I’m using is one TaskManager per server, with N
> slots == available cores.
>
> So what I’d like to do is ensure that if I have N of these training
> operators running in parallel on N TaskManagers, slot assignment happens
> such that each TM has one such operator.
>
> Unfortunately, what typically happens now is that most/all of these
> operators get assigned to the same TM, which then struggles to stay alive
> under that load.
>
> I haven’t seen any solution to this, though I can imagine some helicopter
> stunts that could work around the issue.
>
> Any suggestions?
>
> Thanks,
>
> — Ken
>
> PS - I took a look through the list of FLIPs
> <https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals>,
> and didn’t see anything that covered this. I imagine it would need to be
> something like YARN’s support for per-node vCore capacity and per-task
> vCore requirements, but on a per-TM/per-operator basis.
>
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> Custom big data solutions & training
> Flink, Solr, Hadoop, Cascading & Cassandra