Hi Michael,

The same key in different topics will be routed to the same task instance by default, assuming the key lands in the same partition ID in both topics (i.e., the topics have the same number of partitions and are partitioned by the same key field).
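As a rough illustration of the routing described above, here is a minimal Python sketch. It is not Samza or Kafka code: the topic names ("urls", "web-titles"), the CRC32-based partitioner, and the task labels are all assumptions made for the example. It only shows why a key that hashes to the same partition ID in two topics ends up in the same task when partitions are grouped by ID.

```python
# Illustrative sketch only; this is NOT Samza's or Kafka's actual implementation.
import zlib

NUM_PARTITIONS = 24  # taken from the question; both topics must use the same count


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Hypothetical stand-in for the producer's partitioner: stable hash mod partition count."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def task_for(topic: str, partition_id: int) -> str:
    """Group by partition ID: the topic name is ignored, only the partition ID selects the task."""
    return f"task-for-partition-{partition_id}"  # label format is made up for this sketch


url = "http://example.com/page"

# The same key, written to two different (hypothetical) topics with the same partitioner.
urls_task = task_for("urls", partition_for(url))
titles_task = task_for("web-titles", partition_for(url))

# Same partition ID in both topics, so the same task sees both messages.
assert urls_task == titles_task
```

If the two topics had different partition counts, or were keyed by different fields, partition_for could return different IDs for the same URL and the guarantee would no longer hold.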
The default behavior is to group topic-partitions by partition ID. Please refer to the property job.systemstreampartition.grouper.factory in the configuration table:
http://samza.apache.org/learn/documentation/0.9/jobs/configuration-table.html

On Mon, Dec 28, 2015 at 9:19 AM, Michael Sklyar <mikesk...@gmail.com> wrote:

> Hi,
>
> I have a question regarding Kafka partitions while working with RocksDB as
> an enrichment cache.
>
> We have a stream of URLs, a very simplified version would be:
>
> (1)URL(some 24 partitions)->(2)read enrichments task (from RocksDB)
> ->(3)make decision
>
> One of the enrichments is counter which should be accurate, to achieve it
> we partition the input Kafka topic (1) by key (therefore same URL will
> always arrive to the same task instance and the counter will be correct).
>
> For other enrichments (for example web title, google page rank...) we have
> other tasks that write to additional Kafka topics, also consumed by (2). Is
> it possible to make sure that the same key in different kafka topics will
> reach the same Samza task instance?
>
> Other option, of course, would be to hold all the enrichments in all
> RocksDB instances.
>
> What do you think? What is the best practice?
>
> Thanks,
>
> Michael Sklyar

--
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University