Hello,

I'm working on a project with Samza. The system receives streams of
messages, and if a message matches a set of keywords, it performs an
action on it (e.g. delivers it outside the system or updates the internal state).
There is one job that performs the keyword matching, and it should scale in 2
ways:

- with the number of events
- with the number of keywords

The first is achieved by controlling the number of partitions and containers.
The second is achieved by splitting the set of keywords over different tasks
that run in containers, like this:

This design would make it possible to handle messages and split the matching
job over different tasks. How hard is it to deliver the message to task 1 on
partition X and to task 4 on partition Y?

Thanks,
Mikel
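
P.S. To make the idea concrete, here is a rough sketch in plain Java. It is not Samza code and does not use any Samza API. The class and method names (KeywordSharding, shardKeywords, matchShard, matchAll) are made up for illustration, and hashing keywords into subsets is only one assumed way of splitting the set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only: plain Java, no Samza classes involved.
final class KeywordSharding {

    // Split the keyword set into numShards disjoint subsets by hash.
    // Hash-based splitting is an assumption; any stable split would do.
    static List<Set<String>> shardKeywords(Set<String> keywords, int numShards) {
        List<Set<String>> shards = new ArrayList<>();
        for (int i = 0; i < numShards; i++) {
            shards.add(new HashSet<>());
        }
        for (String keyword : keywords) {
            String key = keyword.toLowerCase(Locale.ROOT);
            // floorMod keeps the index non-negative for negative hash codes.
            shards.get(Math.floorMod(key.hashCode(), numShards)).add(key);
        }
        return shards;
    }

    // What a single matching task would do: check one message against
    // only its own subset of the keywords.
    static Set<String> matchShard(String message, Set<String> shard) {
        Set<String> hits = new HashSet<>();
        for (String word : message.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (shard.contains(word)) {
                hits.add(word);
            }
        }
        return hits;
    }

    // Fan-out: the same message is checked against every subset, and the
    // union of the per-subset hits equals the result of an unsplit match.
    static Set<String> matchAll(String message, List<Set<String>> shards) {
        Set<String> hits = new HashSet<>();
        for (Set<String> shard : shards) {
            hits.addAll(matchShard(message, shard));
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<String> keywords =
                new HashSet<>(Arrays.asList("samza", "kafka", "storm", "flink"));
        List<Set<String>> shards = shardKeywords(keywords, 4);
        System.out.println(matchAll("Samza reads from Kafka", shards));
    }
}
```

In the real job, as I picture it, each subset would be held by a different task, so a message would have to reach every task that holds a subset. That delivery step is the part I am unsure about.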