Batch source improvement

Flavio Pompermaier Sat, 29 Apr 2017 02:36:49 -0700

Hi to all,
we're still using Flink as a batch processor and, despite not being heavily
advertised for that, it is still doing great.
However, there's one thing I've always wanted to ask: when reading data from a
source, the job manager computes the splits and assigns a set of them to
every instance of the InputFormat. This works fine as long as the data is
perfectly balanced, but in my experience most of the time that is not the case:
some instances complete very quickly while others continue to read
data (sometimes for a long time).


Couldn't this be improved by buffering the splits in a shared place, so that tasks
could ask for a "free" split as soon as they finish reading their
assigned split? Would it be complicated to implement such logic?
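
To make the idea concrete, here is a rough sketch in plain Java. It does not use Flink's API at all: SharedSplitPool, Split and readAll are made-up names, and the "reading" is only simulated by counting records. It just shows readers pulling the next free split from one shared queue instead of working through a list fixed up front.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLongArray;

public class SharedSplitPool {

    /** Hypothetical stand-in for an input split; only its size matters here. */
    static final class Split {
        final int id;
        final long records;

        Split(int id, long records) {
            this.id = id;
            this.records = records;
        }
    }

    /**
     * Starts `parallelism` reader threads that pull splits from one shared
     * queue, and returns how many records each reader ended up processing.
     */
    static long[] readAll(List<Split> splits, int parallelism) throws InterruptedException {
        // The shared place where all not-yet-assigned splits are buffered.
        ConcurrentLinkedQueue<Split> pool = new ConcurrentLinkedQueue<>(splits);
        AtomicLongArray recordsPerReader = new AtomicLongArray(parallelism);

        List<Thread> readers = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            final int reader = i;
            Thread t = new Thread(() -> {
                Split next;
                // Ask for a free split as soon as the previous one is done;
                // poll() returns null once the pool is empty.
                while ((next = pool.poll()) != null) {
                    // Simulated read: just account for the split's records.
                    recordsPerReader.addAndGet(reader, next.records);
                }
            });
            readers.add(t);
            t.start();
        }
        for (Thread t : readers) {
            t.join();
        }

        long[] result = new long[parallelism];
        for (int i = 0; i < parallelism; i++) {
            result[i] = recordsPerReader.get(i);
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        // Deliberately unbalanced splits: one large, several small.
        List<Split> splits = Arrays.asList(
                new Split(0, 1000), new Split(1, 10), new Split(2, 10),
                new Split(3, 10), new Split(4, 10));
        // Which reader gets which split varies from run to run.
        System.out.println(Arrays.toString(readAll(splits, 2)));
    }
}
```

With a real source, the reader stuck on the large split would simply stop asking for more, while the others drain the small ones.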

Best,
Flavio
