I'm looking at using Flink for a streaming project that has to use some internal systems as event sources. They are very similar to Kafka in their semantic. The data is partitioned and each partition can be replayed from a specified offset.
The first system creates and deletes such partitions dynamically based on load. It provides an API to get list of partitions as well as their state (open, closed for append). The second system has a fixed set of a few thousand partitions, but they are allocated to a dynamic set of hosts and each host provides poll API that returns events from all partitions that currently reside on it. The metadata API that returns current mapping of partitions to hosts is provided. I found a thread <http://mail-archives.apache.org/mod_mbox/flink-user/201602.mbox/%3CCANjo42xgyUZAU=fmgGVFXVYMj7nVt67=3eJY=pwrc_nzdq-...@mail.gmail.com%3E> that mentioned that changing parallelism is one of the high priority items for this year. Has any work started on it? And would it support the type of dynamic sources we have? I could try adding such support myself if it would help to speed things up. Thanks, Maxim.