Re: Dynamically repartitioned sources

Robert Metzger Fri, 11 Mar 2016 14:01:25 -0800

Hi Maxim,

you can implement a source for the system you are describing without
changing the parallelism of Flink. What you have to do is implement your
own data sources for Flink.
I would start by implementing the ParallelSourceFunction interface, where
each parallel source instance is reading from a subset of servers.


So basically one "flink partition" is reading from one or more partitions
of your system.


On Mon, Mar 7, 2016 at 9:18 PM, Maxim <mfat...@gmail.com> wrote:

> I'm looking at using Flink for a streaming project that has to use some
> internal systems as event sources. They are very similar to Kafka in their
> semantic. The data is partitioned and each partition can be replayed from a
> specified offset.
>
> The first system creates and deletes such partitions dynamically based on
> load. It provides an API to get list of partitions as well as their state
> (open, closed for append).
>
> The second system has a fixed set of a few thousand partitions, but they
> are allocated to a dynamic set of hosts and each host provides poll API
> that returns events from all partitions that currently reside on it. The
> metadata API that returns current mapping of partitions to hosts is
> provided.
>
> I found a thread
> <
> http://mail-archives.apache.org/mod_mbox/flink-user/201602.mbox/%3CCANjo42xgyUZAU=fmgGVFXVYMj7nVt67=3eJY=pwrc_nzdq-...@mail.gmail.com%3E
> >
> that
> mentioned that changing parallelism is one of the high priority items for
> this year. Has any work started on it? And would it support the type of
> dynamic sources we have?
>
> I could try adding such support myself if it would help to speed things up.
>
> Thanks,
>
> Maxim.
>

Re: Dynamically repartitioned sources

Reply via email to