Dynamically repartitioned sources

Maxim Mon, 07 Mar 2016 12:19:55 -0800

I'm looking at using Flink for a streaming project that has to use some
internal systems as event sources. They are very similar to Kafka in their
semantic. The data is partitioned and each partition can be replayed from a
specified offset.


The first system creates and deletes such partitions dynamically based on
load. It provides an API to get list of partitions as well as their state
(open, closed for append).

The second system has a fixed set of a few thousand partitions, but they
are allocated to a dynamic set of hosts and each host provides poll API
that returns events from all partitions that currently reside on it. The
metadata API that returns current mapping of partitions to hosts is
provided.

I found a thread
<http://mail-archives.apache.org/mod_mbox/flink-user/201602.mbox/%3CCANjo42xgyUZAU=fmgGVFXVYMj7nVt67=3eJY=pwrc_nzdq-...@mail.gmail.com%3E>
that
mentioned that changing parallelism is one of the high priority items for
this year. Has any work started on it? And would it support the type of
dynamic sources we have?

I could try adding such support myself if it would help to speed things up.

Thanks,

Maxim.

Dynamically repartitioned sources

Reply via email to