What would the differences be between these scenarios?

1) one task manager with numberOfTaskSlots=1 and one job with parallelism=1

2) one task manager with numberOfTaskSlots=10 and one job with
parallelism=10

In both cases all of the job's tasks get executed within the one task
manager's jvm. Are there any downsides to doing #2 instead of #1?

I ask this question because of current issues related to watermarks with
Kafka sources [1] [2] and changing parallelism with savepoints [3]. I'm
writing a Flink job that processes events from Kafka topics that have 12
partitions. I'm wondering if I should just set the job parallelism=12 and
make numberOfTaskSlots sum to 12 across however many task managers I set
up. It seems like watermarks would work properly then, and I could
effectively change job parallelism using the number of task managers (e.g.
1 TM with slots=12, or 2 TMs with slots=6, or 12 TMs with slots=1, etc).

Am I missing any important details that would make this a bad idea? It
seems like a bit of abuse of numberOfTaskSlots, but also seems like a
fairly simple solution to a few current issues.

Thanks,
Zach

[1]
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-partition-alignment-for-event-time-tt4782.html
[2] https://issues.apache.org/jira/browse/FLINK-3375
[3]
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Changing-parallelism-tt4967.html

Reply via email to