
I was wondering how to properly use colocation groups (if applicable) to
achieve the required functionality in the following two simple contrived
use-cases (focusing on the essence of the problem), both of which aim to be
executed on a multi-node cluster (2 or more slaves and a master), with 4
(or more) task slots each:

Use-case 1:

- I have a stream of words, a mapping function that performs some
computation for each word and several slaves in a Flink cluster.
- I would like words starting with the same letter to be routed to the same

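To make the requirement concrete, here is a minimal sketch in plain Java (not Flink API) of the routing property I am after. It assumes the intended target is a single worker identified by an index, and that words are non-empty; the class name, the method workerFor and the worker count are hypothetical and only serve as illustration.

```java
public class FirstLetterRouting {

    // Hypothetical helper: derives a worker index in [0, numWorkers) from the
    // first letter only, so the rest of the word never influences the routing.
    // Assumes word is non-empty and numWorkers > 0.
    static int workerFor(String word, int numWorkers) {
        char first = Character.toLowerCase(word.charAt(0));
        // floorMod keeps the result non-negative for any hash value.
        return Math.floorMod(Character.hashCode(first), numWorkers);
    }

    public static void main(String[] args) {
        int numWorkers = 3; // assumed cluster size, for illustration only
        for (String w : new String[] {"apple", "avocado", "banana", "Blueberry"}) {
            System.out.println(w + " -> worker " + workerFor(w, numWorkers));
        }
    }
}
```

Any two words sharing a first letter (ignoring case) get the same index, which is the property I would like the cluster to preserve; different letters may still share a worker.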

Use-case 2:

- I have a stream of words, a mapping function that performs some
computation for each word and several slaves in a Flink cluster.
- Not all slaves can process all words, and which slaves can process which
words only becomes known at runtime (e.g. through a configuration file in
the slaves' local filesystem).
- How can I achieve exactly-once processing of each word in this setting?

I understand that using a shared store (HDFS or a Flink shared variable) to
hold this config information may be one approach, but we are investigating
whether there is an alternative (possibly more elegant) solution using Flink's
capabilities while retaining the locality of the config files.


Thank you in advance for your time and help.

Konstantinos Barmpis | Research Associate
White Rose Grid Enterprise Systems Group
Dept. of Computer Science
University of York
Tel: +44 (0) 1904-32 5653
