Hey Mikel,

Another thing that you can do is store your keywords in a remote store, and
query them from the job. This comes with some operational overhead, but
some people put an in-memory cache in the job to reduce RPC calls.

Cheers,
Chris

On Fri, Feb 6, 2015 at 8:41 AM, Chris Riccomini <[email protected]>
wrote:

> Hey Mikel,
>
> This use case has been discussed in some detail here:
>
>   https://issues.apache.org/jira/browse/SAMZA-353
>
> Samza currently doesn't allow a single partition to be consumed by more
> than one task. You can, however, send the same message to multiple
> partitions. Within Samza, this can be achieved using this
> OutgoingMessageEnvelope constructor:
>
>   OutgoingMessageEnvelope(SystemStream systemStream, Object partitionKey,
> Object key, Object message)
>
> If you specify the partitionKey to be the partition #, you can send the
> same message to partition X and partition Y (in your example). Obviously,
> this isn't ideal (you have to write the same message N times), but it
> should work.
>
> Martin Kleppmann had a very similar use case, which he describes here:
>
>
> https://issues.apache.org/jira/browse/SAMZA-353?focusedCommentId=14205216&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14205216
>
> Cheers,
> Chris
>
> On Fri, Feb 6, 2015 at 5:00 AM, mikel roah <[email protected]> wrote:
>
>> Hello,
>>
>> I'm working on a project with Samza. The system receives streams of
>> messages and if a single message matches a set of keywords, it performs an
>> action on it (i.e. deliver it outside the system or update the internal
>> state). There is one job that performs the keyword matching and it should
>> scale in 2 ways:
>>
>>    - with the number of events
>>    - with the number of keywords
>>
>>
>> The first point is achieved by controlling the number of partitions and
>> containers. Instead the second one by splitting the set of keywords over
>> different tasks that run in containers like this:
>>
>>
>>
>>
>>
>> This design would allow to handle messages and split the matching job
>> over different tasks. How hard is to deliver the message to task 1 on
>> partition X and to task 4 on partition Y?
>>
>>
>> Thanks
>>
>>
>> Mikel
>>
>
>

Reply via email to