Hi,

I have a question regarding Kafka partitions while working with RocksDB as
an enrichment cache.

We have a stream of URLs; a very simplified version of the pipeline is:

(1) URL topic (e.g. 24 partitions) -> (2) read-enrichments task (from RocksDB)
-> (3) make decision

One of the enrichments is a counter that must be accurate. To achieve this,
we partition the input Kafka topic (1) by key, so the same URL always
arrives at the same task instance and the counter stays correct.
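A minimal sketch of why key partitioning keeps the counter exact: every occurrence of a URL is routed to one partition, so a single task's local count sees all of them. `KeyPartitionedCounter`, `partitionFor`, and `countAll` are hypothetical names, plain maps stand in for the per-task RocksDB stores, and `String.hashCode()` is only a stand-in for Kafka's default partitioner, which hashes the serialized key bytes with murmur2.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KeyPartitionedCounter {
    // Hypothetical stand-in for the producer's partitioner: the same key and
    // the same partition count always give the same partition.
    static int partitionFor(String url, int numPartitions) {
        return Math.floorMod(url.hashCode(), numPartitions);
    }

    // Routes each URL to its partition and counts it in that partition's
    // task-local map (standing in for a per-task RocksDB store).
    static List<Map<String, Long>> countAll(String[] urls, int numPartitions) {
        List<Map<String, Long>> taskCounters = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            taskCounters.add(new HashMap<>());
        }
        for (String url : urls) {
            int p = partitionFor(url, numPartitions);
            taskCounters.get(p).merge(url, 1L, Long::sum);
        }
        return taskCounters;
    }

    public static void main(String[] args) {
        String[] urls = {"a.com", "b.com", "a.com", "c.com", "a.com"};
        List<Map<String, Long>> counters = countAll(urls, 24);
        int owner = partitionFor("a.com", 24);
        // All three occurrences of "a.com" landed in the same task.
        System.out.println(counters.get(owner).get("a.com")); // 3
    }
}
```

No other task's map ever contains "a.com", which is what makes a purely local counter sufficient.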

For other enrichments (for example web title, Google PageRank…) we have
other tasks that write to additional Kafka topics, which are also consumed
by (2). Is it possible to guarantee that the same key in different Kafka
topics reaches the same Samza task instance?
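One way to reason about this, assuming Samza's default stream-partition grouper (`GroupByPartitionFactory`, configured via `job.systemstreampartition.grouper.factory`) behaves as documented and assigns partition N of every input stream to the same task: co-partitioning should then hold as long as every topic has the same number of partitions and all producers use the same key serialization and partitioner. The sketch below models only that reasoning. `CoPartitioningSketch`, `partitionFor`, `taskFor`, and the topic names are hypothetical, and `String.hashCode()` stands in for Kafka's murmur2-based default partitioner.

```java
import java.util.List;

class CoPartitioningSketch {
    // Hypothetical stand-in partitioner. Kafka's default partitioner hashes the
    // serialized key bytes with murmur2; the point here is only that the topic
    // name plays no part, just the key and the partition count.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Models partition-based grouping: partition N of every input stream is
    // assigned to the same task, named here "Partition N".
    static String taskFor(int partition) {
        return "Partition " + partition;
    }

    public static void main(String[] args) {
        // Hypothetical topic names, all with 24 partitions and keyed by URL.
        List<String> topics = List.of("urls", "url-titles", "url-pagerank");
        String url = "example.com/page";
        for (String topic : topics) {
            // Same key + same partition count -> same partition -> same task.
            System.out.println(topic + " -> " + taskFor(partitionFor(url, 24)));
        }

        // A topic with a different partition count breaks the alignment: with
        // this stand-in hash, "k7" goes to partition 12 of 24 but to partition 0 of 12.
        System.out.println(partitionFor("k7", 24) + " vs " + partitionFor("k7", 12)); // 12 vs 0
    }
}
```

The same caveat applies to key serialization: because Kafka hashes the serialized bytes, a URL written as a raw string to one topic and as, say, a JSON-encoded string to another can land in different partitions.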

The other option, of course, would be to hold all the enrichments in every
RocksDB instance.
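A minimal sketch of that alternative, assuming every enrichment update is applied to every task's store. Samza's `task.broadcast.inputs` setting appears to be intended for having a partition consumed by all tasks; check its exact format against your Samza version. `ReplicatedEnrichments` and its methods are hypothetical, and plain maps stand in for the RocksDB stores.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReplicatedEnrichments {
    // One in-memory map per task, standing in for each task's RocksDB store.
    private final List<Map<String, String>> stores = new ArrayList<>();

    ReplicatedEnrichments(int numTasks) {
        for (int i = 0; i < numTasks; i++) {
            stores.add(new HashMap<>());
        }
    }

    // Every enrichment update is written to every task's store, so lookups
    // do not depend on co-partitioning.
    void broadcast(String url, String enrichment) {
        for (Map<String, String> store : stores) {
            store.put(url, enrichment);
        }
    }

    // Any task can answer a lookup for any URL.
    String lookup(int task, String url) {
        return stores.get(task).get(url);
    }

    // Total entries across all stores: one copy of each key per task.
    int totalEntries() {
        int total = 0;
        for (Map<String, String> store : stores) {
            total += store.size();
        }
        return total;
    }

    public static void main(String[] args) {
        ReplicatedEnrichments r = new ReplicatedEnrichments(24);
        r.broadcast("example.com", "title=Example");
        r.broadcast("other.org", "title=Other");
        System.out.println(r.lookup(7, "example.com")); // title=Example
        System.out.println(r.totalEntries());           // 48
    }
}
```

The cost shows up in `totalEntries()`: with 24 tasks, each enrichment is stored and written 24 times. In this sketch only the non-counter enrichments are replicated; the counter would still come from the key-partitioned topic (1).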

What do you think? What is the best practice?

Thanks,

Michael Sklyar