Hi Michael,

Identical keys in different topics are routed to the same task instance
by default, provided each key lands on the same partition id in both
topics (i.e., the topics have the same number of partitions, are keyed by
the same key field, and are written with the same partitioner).

The default behavior is to group topic-partitions by partition id. Please
refer to the property job.systemstreampartition.grouper.factory in
http://samza.apache.org/learn/documentation/0.9/jobs/configuration-table.html
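
To make the condition concrete, here is a minimal sketch of the idea, not
Samza or Kafka code. It assumes every producer picks the partition as a
stable hash of the key modulo the partition count (crc32 is only a stand-in
for whatever partitioner you actually use), and the topic names and the
task-name format are made up for illustration.

```python
import zlib

NUM_PARTITIONS = 24  # every topic must use the same partition count


def partition_for(key: str, num_partitions: int) -> int:
    """Stand-in partitioner (assumption): stable hash of the key, modulo partitions."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def task_for(topic: str, partition_id: int) -> str:
    """Group by partition id: the topic name plays no part in choosing the task."""
    return f"Partition {partition_id}"


url = "http://example.com/page"
topics = ["urls", "web-titles", "page-ranks"]  # hypothetical topic names

# The same key maps to the same partition id in every topic, so all three
# streams for this URL end up in the same task instance.
tasks = {t: task_for(t, partition_for(url, NUM_PARTITIONS)) for t in topics}
for topic, task in tasks.items():
    print(f"{topic}: {task}")
```

If one of the topics had a different number of partitions, or were written
with a different partitioner, the same URL could land on a different
partition id and therefore on a different task.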



On Mon, Dec 28, 2015 at 9:19 AM, Michael Sklyar <mikesk...@gmail.com> wrote:

> Hi,
>
>
> I have a question regarding Kafka partitions while working with RocksDB as
> an enrichment cache.
>
>
> We have a stream of URLs, a very simplified version would be:
>
> (1)URL(some 24 partitions)->(2)read enrichments task (from RocksDB)
> ->(3)make decision
>
>
> One of the enrichments is a counter that must be accurate. To achieve this,
> we partition the input Kafka topic (1) by key (so the same URL always
> arrives at the same task instance and the counter is correct).
>
> For other enrichments (for example web title, google page rank…) we have
> other tasks that write to additional Kafka topics, also consumed by (2). Is
> it possible to make sure that the same key in different kafka topics will
> reach the same Samza task instance?
>
> The other option, of course, would be to hold all the enrichments in all
> RocksDB instances.
>
>
>
> What do you think? What is the best practice?
>
>
>
> Thanks,
>
> Michael Sklyar
>



-- 
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University
