Unbalanced job scheduling

AndreaKinn Mon, 16 Oct 2017 09:01:38 -0700

Hi all,
I want to expose you my program flow.

I have the following operators:


kafka-source -> timestamp-extractor -> map -> keyBy -> window -> apply ->
LEARN -> SELECT -> process -> cassandra-sink

the LEARN and SELECT operators belong to an external library supported by
flink. LEARN is a very heavy operation compared to the other operators.

Unfortunately LEARN has a max parallelism of 1, so if I have a cluster of 2
TM with 1 slot each and I set parallelism = 2 I will have one TM which
performs a parallel instances of all the operators and the single instance
of LEARN while the other one TM performs just the second parallel instances
of all the operators (clearly there are no more instance of LEARN).
That's ok and I have no problem with understanding it.

*** The problem:
Actually I have 2 identical flows like this because it matches a situation
where I have two sensor streams so really I have 2 LEARN operators
corresponding to two independent streams.

By the way I noted that even in this case I have one TM which take a load of
the parallel instances of all the operators AND the single instances of
LEARN-1 and LEARN-2 while the other one TM performs just the second parallel
instances of all the operators (no LEARN instances here).

Since LEARN is an heavy operator this lead to a very unbalanced load on the
cluster, so much that the first TM is killed during the execution (looking
at the logs it probably happens because it has not enough memory, in fact
the sink execution is very very slow, it seems like the LEARN is a
bottleneck).

Honestly I can't understand why Flink don't assign 1 LEARN operator to one
TM and the other one LEARN to the other one TM. 
This won't let me to stress the cluster properly because I will have always
one TM super busy and the other one quite "free" and unstressed.

Bye,
Andrea



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Unbalanced job scheduling

Reply via email to