Hi Maxim,

you can implement a source for the system you are describing without changing Flink's parallelism at runtime. What you need to do is implement your own data source for Flink. I would start by implementing the ParallelSourceFunction interface, where each parallel source instance reads from a subset of servers.
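To make the "subset" idea concrete, here is a minimal sketch of how each parallel instance could pick its share of partitions. It is plain Java with no Flink dependency; the class name PartitionAssigner and the partition names are made up for illustration. Inside a RichParallelSourceFunction you would presumably feed it the values from getRuntimeContext().getIndexOfThisSubtask() and getRuntimeContext().getNumberOfParallelSubtasks(), and the partition list from your metadata API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Flink: decides which partitions of the
// external system a given parallel source instance should read.
final class PartitionAssigner {

    // Returns the partitions owned by the subtask with the given index.
    // The list is sorted first so that every subtask sees the same order,
    // even if the metadata API returns partitions in arbitrary order.
    static List<String> assign(List<String> allPartitions, int subtaskIndex, int parallelism) {
        if (parallelism <= 0 || subtaskIndex < 0 || subtaskIndex >= parallelism) {
            throw new IllegalArgumentException("invalid subtask index or parallelism");
        }
        List<String> sorted = new ArrayList<>(allPartitions);
        Collections.sort(sorted);

        // Round-robin: subtask i takes positions i, i + parallelism, i + 2 * parallelism, ...
        List<String> mine = new ArrayList<>();
        for (int i = subtaskIndex; i < sorted.size(); i += parallelism) {
            mine.add(sorted.get(i));
        }
        return mine;
    }

    public static void main(String[] args) {
        List<String> partitions = Arrays.asList("p0", "p1", "p2", "p3", "p4");
        int parallelism = 2;
        for (int subtask = 0; subtask < parallelism; subtask++) {
            System.out.println("subtask " + subtask + " reads " + assign(partitions, subtask, parallelism));
        }
    }
}
```

With five partitions and a parallelism of 2, subtask 0 gets p0, p2 and p4, and subtask 1 gets p1 and p3. For partitions that come and go, each instance could re-query the metadata API periodically and recompute its share the same way; that part is an assumption about your system and is not shown here.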
So basically, one "Flink partition" reads from one or more partitions of your system.

On Mon, Mar 7, 2016 at 9:18 PM, Maxim <mfat...@gmail.com> wrote:

> I'm looking at using Flink for a streaming project that has to use some
> internal systems as event sources. They are very similar to Kafka in their
> semantics. The data is partitioned and each partition can be replayed from a
> specified offset.
>
> The first system creates and deletes such partitions dynamically based on
> load. It provides an API to get the list of partitions as well as their state
> (open, closed for append).
>
> The second system has a fixed set of a few thousand partitions, but they
> are allocated to a dynamic set of hosts, and each host provides a poll API
> that returns events from all partitions that currently reside on it. A
> metadata API that returns the current mapping of partitions to hosts is
> provided.
>
> I found a thread
> <
> http://mail-archives.apache.org/mod_mbox/flink-user/201602.mbox/%3CCANjo42xgyUZAU=fmgGVFXVYMj7nVt67=3eJY=pwrc_nzdq-...@mail.gmail.com%3E
> >
> that
> mentioned that changing parallelism is one of the high priority items for
> this year. Has any work started on it? And would it support the type of
> dynamic sources we have?
>
> I could try adding such support myself if it would help to speed things up.
>
> Thanks,
>
> Maxim.
>