Hey Mikel, Another thing that you can do is store your keywords in a remote store, and query them from the job. This comes with some operational overhead, but some people put an in-memory cache in the job to reduce RPC calls.
Cheers, Chris On Fri, Feb 6, 2015 at 8:41 AM, Chris Riccomini <[email protected]> wrote: > Hey Mikel, > > This use case has been discussed in some detail here: > > https://issues.apache.org/jira/browse/SAMZA-353 > > Samza currently doesn't allow a single partition to be consumed by more > than one task. You can, however, send the same message to multiple > partitions. Within Samza, this can be achieved using this > OutgoingMessageEnvelope constructor: > > OutgoingMessageEnvelope(SystemStream systemStream, Object partitionKey, > Object key, Object message) > > If you specify the partitionKey to be the partition #, you can send the > same message to partition X and partition Y (in your example). Obviously, > this isn't ideal (you have to write the same message N times), but it > should work. > > Martin Kleppmann had a very similar use case, which he describes here: > > > https://issues.apache.org/jira/browse/SAMZA-353?focusedCommentId=14205216&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14205216 > > Cheers, > Chris > > On Fri, Feb 6, 2015 at 5:00 AM, mikel roah <[email protected]> wrote: > >> Hello, >> >> I'm working on a project with Samza. The system receives streams of >> messages and if a single message matches a set of keywords, it performs an >> action on it (i.e. deliver it outside the system or update the internal >> state). There is one job that performs the keyword matching and it should >> scale in 2 ways: >> >> - with the number of events >> - with the number of keywords >> >> >> The first point is achieved by controlling the number of partitions and >> containers. Instead the second one by splitting the set of keywords over >> different tasks that run in containers like this: >> >> >> >> >> >> This design would allow to handle messages and split the matching job >> over different tasks. How hard is to deliver the message to task 1 on >> partition X and to task 4 on partition Y? >> >> >> Thanks >> >> >> Mikel >> > >
