Hi Andrea, have you looked into assigning slot sharing groups [1]?
Best, Fabian [1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/datastream_api.html#task-chaining-and-resource-groups 2017-10-16 18:01 GMT+02:00 AndreaKinn <kinn6...@hotmail.it>: > Hi all, > I want to expose you my program flow. > > I have the following operators: > > kafka-source -> timestamp-extractor -> map -> keyBy -> window -> apply -> > LEARN -> SELECT -> process -> cassandra-sink > > the LEARN and SELECT operators belong to an external library supported by > flink. LEARN is a very heavy operation compared to the other operators. > > Unfortunately LEARN has a max parallelism of 1, so if I have a cluster of 2 > TM with 1 slot each and I set parallelism = 2 I will have one TM which > performs a parallel instances of all the operators and the single instance > of LEARN while the other one TM performs just the second parallel instances > of all the operators (clearly there are no more instance of LEARN). > That's ok and I have no problem with understanding it. > > *** The problem: > Actually I have 2 identical flows like this because it matches a situation > where I have two sensor streams so really I have 2 LEARN operators > corresponding to two independent streams. > > By the way I noted that even in this case I have one TM which take a load > of > the parallel instances of all the operators AND the single instances of > LEARN-1 and LEARN-2 while the other one TM performs just the second > parallel > instances of all the operators (no LEARN instances here). > > Since LEARN is an heavy operator this lead to a very unbalanced load on the > cluster, so much that the first TM is killed during the execution (looking > at the logs it probably happens because it has not enough memory, in fact > the sink execution is very very slow, it seems like the LEARN is a > bottleneck). > > Honestly I can't understand why Flink don't assign 1 LEARN operator to one > TM and the other one LEARN to the other one TM. > This won't let me to stress the cluster properly because I will have always > one TM super busy and the other one quite "free" and unstressed. > > Bye, > Andrea > > > > -- > Sent from: http://apache-flink-user-mailing-list-archive.2336050. > n4.nabble.com/ >