Hi Jack!

Interesting use case. I think the answer depends a bit on how you want to
set things up:

Variant 1:
-------------
If you have a YARN setup and allocate a new Flink AppMaster and dedicated
workers for each topology, you can probably scale to as many topologies as
you want (or as many as YARN can handle).


Variant 2:
--------------
If you want to let one (or a few) Flink clusters (deployed in YARN or
standalone) handle all the concurrent topologies (because they are short
lived, or because you want to minimize the overhead per topology), then it
becomes interesting:

Resources of the master: Make sure you cap the number of jobs you run on
one master node. The master keeps a representation of the distributed
dataflow, including metadata for the individual streams. For high
parallelism, that may require a bit of memory. So for large parallelisms
(100s and more parallel instances) I would not have more than 10 jobs per
master. For lower parallelism, you can try more on one master node. You can
easily have multiple Flink clusters on one YARN cluster, and that way you
should be able to scale to many concurrent topologies.
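
To make the capping concrete, here is a rough sketch of how topologies
could be grouped into clusters. The function name, the threshold of 100
and the cap of 50 for low-parallelism jobs are my assumptions for
illustration; only the cap of 10 for large parallelism comes from the
advice above. Tune the numbers by experiment.

```python
def plan_clusters(parallelisms, high=100, max_high=10, max_low=50):
    """Group jobs (given by their parallelism) into clusters so that no
    master runs more jobs than the cap for its parallelism class.

    high     -- parallelism from which a job counts as "large" (assumption)
    max_high -- jobs per master for large jobs (10, as suggested above)
    max_low  -- jobs per master for small jobs (placeholder, tune it)
    """
    big = [p for p in parallelisms if p >= high]
    small = [p for p in parallelisms if p < high]

    clusters = []
    for group, cap in ((big, max_high), (small, max_low)):
        # Cut each class into chunks of at most `cap` jobs, one per cluster.
        for i in range(0, len(group), cap):
            clusters.append(group[i:i + cap])
    return clusters


if __name__ == "__main__":
    # 25 large topologies and 60 small ones.
    plan = plan_clusters([200] * 25 + [4] * 60)
    for n, jobs in enumerate(plan):
        print(f"cluster {n}: {len(jobs)} jobs, parallelism {jobs[0]}")
```

Each resulting group would be one Flink cluster (one master) on the shared
YARN cluster.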

Resources on the workers: You can go for many small workers, or for a few
large workers.

If you add many workers, you can handle many topologies and have their
operations isolated at the JVM level. A worker itself can be rather
lightweight; it does not need much heap space. For streaming, you mainly
need to make sure that there is enough network memory for streaming
shuffles. If you want to run many very small workers, you may want to set
the number of Akka threads down to one.
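
As a rough way to size the network memory, the sketch below applies the
rule of thumb from the Flink configuration docs for
taskmanager.network.numberOfBuffers (slots per TaskManager squared, times
the number of TaskManagers, times 4) and converts it to memory assuming
32 KiB buffers. The function names are made up for illustration, and with
many concurrent topologies you will likely need more than this baseline.

```python
def estimate_network_buffers(slots_per_tm, num_tms, factor=4):
    """Rule-of-thumb buffer count: slots_per_tm^2 * num_tms * factor."""
    return slots_per_tm ** 2 * num_tms * factor


def buffer_memory_mib(num_buffers, buffer_size_bytes=32 * 1024):
    """Memory taken by the buffers, assuming 32 KiB per buffer."""
    return num_buffers * buffer_size_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Many small workers: 20 TaskManagers with 2 slots each.
    buffers = estimate_network_buffers(slots_per_tm=2, num_tms=20)
    print(buffers, "buffers,", buffer_memory_mib(buffers), "MiB per worker")
```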

If you go for fewer big workers that each run multiple operators, that is
probably more resource efficient, but it will not isolate operations at
the process level.

In any case, make sure you start the workers properly in streaming mode. It
has an effect on how the managed memory is pre-allocated.

Greetings,
Stephan


On Wed, Sep 23, 2015 at 12:24 PM, Jack <jack-kn...@marmelandia.com> wrote:

> Hello,
>
> I need to implement an engine for running a large number (several hundred)
> of different kinds of topologies. The topologies are defined ad-hoc by
> customers, hence combining the topologies is difficult.
>
> How does Flink perform with so many topologies?
>
> Thanks
