How to scale a streaming Flink pipeline without abusing parallelism for long computation tasks?

Elkhan Dadashov Thu, 16 Apr 2020 01:39:08 -0700

Hi Flink users,

I have a basic Flink pipeline, doing a flatMap.

Inside the flatMap, I take the input and pass it to a client library to
compute some result.

That library call takes from around 30 seconds to 2 minutes (depending on
the input) to produce the output for a given input (it is a
time-series-based, long-running computation).

Because the library takes a long time to compute, the input payloads keep
getting buffered, and if the job is not given enough parallelism, it
crashes/restarts (java.lang.RuntimeException: Buffer pool is destroyed.).

I wanted to check what other options there are for scaling a Flink streaming
pipeline without abusing parallelism for long-running computations in a
Flink operator.

Is multi-threading inside the operator recommended? (Even though a single
input computation takes a long time, I can definitely run 4-8 of them in
parallel threads, instead of one by one, inside the same FlatMap operator.)
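
To make the multi-threading idea concrete, here is a rough sketch in plain
Java with no Flink APIs involved. The class name, `slowCompute`, and
`computeAll` are hypothetical names chosen for illustration; `slowCompute`
merely stands in for the client library call, and the pool size of 4 is an
assumption taken from the "4-8 in parallel" estimate above. It only shows the
bounded thread pool pattern, not how it would be wired into an operator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BoundedParallelCompute {

    // Hypothetical stand-in for the client library call
    // (the real one takes 30 seconds to 2 minutes per input).
    static String slowCompute(String input) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "result-" + input;
    }

    // Runs slowCompute for every input on a fixed-size pool, so at most
    // `threads` computations are in flight at once. Results are returned
    // in the same order as the inputs.
    static List<String> computeAll(List<String> inputs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (String in : inputs) {
                futures.add(CompletableFuture.supplyAsync(() -> slowCompute(in), pool));
            }
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> f : futures) {
                results.add(f.join()); // blocks until this computation is done
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Four inputs processed with up to four concurrent computations.
        System.out.println(computeAll(Arrays.asList("a", "b", "c", "d"), 4));
    }
}
```

Inside Flink, this kind of concurrency is usually expressed through the Async
I/O API (`AsyncDataStream` with an async function) rather than hand-rolled
threads in a FlatMap; whether that fits this workload is part of the question.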

1 core for each YARN slot (which will hold 1 flatMap operator) seems too
expensive. If we could launch more Flink operators with only 1 core, it
would be easier.

If anyone has faced a similar issue, please share your experience. I'm using
Flink version 1.6.3.

Thanks.
