Thanks! That's exactly the desired behavior.

On Tue, Dec 29, 2015 at 6:10 AM, Jagadish Venkatraman <jagadish1...@gmail.com> wrote:
> Hi Michael,
>
> Identical keys in different topics will be routed to the same task instance
> by default (assuming that the keys are present in the same partition-id in
> both topics, i.e., the topics have the same # of partitions, and the topics
> are keyed by the same key field).
>
> The default behavior is to group topic-partitions by partition_id. Please
> refer to the property job.systemstreampartition.grouper.factory in
> http://samza.apache.org/learn/documentation/0.9/jobs/configuration-table.html
>
> On Mon, Dec 28, 2015 at 9:19 AM, Michael Sklyar <mikesk...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I have a question regarding Kafka partitions while working with RocksDB
> > as an enrichment cache.
> >
> > We have a stream of URLs, a very simplified version would be:
> >
> > (1) URL (some 24 partitions) -> (2) read enrichments task (from RocksDB)
> > -> (3) make decision
> >
> > One of the enrichments is a counter which should be accurate; to achieve
> > this we partition the input Kafka topic (1) by key (therefore the same URL
> > will always arrive at the same task instance and the counter will be
> > correct).
> >
> > For other enrichments (for example web title, google page rank...) we have
> > other tasks that write to additional Kafka topics, also consumed by (2).
> > Is it possible to make sure that the same key in different kafka topics
> > will reach the same Samza task instance?
> >
> > The other option, of course, would be to hold all the enrichments in all
> > RocksDB instances.
> >
> > What do you think? What is the best practice?
> >
> > Thanks,
> >
> > Michael Sklyar
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
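
A minimal sketch of the routing described in the quoted reply, in case it helps picture it. It is not Samza or Kafka code: the CRC32 partitioner is only a stand-in for whatever hash the producers actually use, and the topic names "urls" and "web-titles" and the task label format are made up for illustration. The point is only that when both topics have the same partition count and are keyed the same way, grouping by partition id puts the same key on the same task.

```python
import zlib

NUM_PARTITIONS = 24  # assumption: every topic consumed by the task has 24 partitions


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stand-in partitioner (CRC32 mod N); real Kafka producers use their own hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def task_for(topic: str, partition_id: int) -> str:
    """Group-by-partition: the topic is ignored, only the partition id picks the task."""
    return f"Partition {partition_id}"


url = "http://example.com/page"

# The same key is produced to two different topics with the same partitioner.
urls_task = task_for("urls", partition_for(url))
titles_task = task_for("web-titles", partition_for(url))

# Both messages land on the same task instance.
same_task = urls_task == titles_task
```

If the two topics had different partition counts, or were written with different partitioners, partition_for would no longer agree across topics and the guarantee would not hold.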