Thanks!
That's exactly the desired behavior.

On Tue, Dec 29, 2015 at 6:10 AM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> Hi Michael,
>
> Identical keys in different topics will be routed to the same task instance
> by default (assuming that the keys land in the same partition id in both
> topics, i.e., the topics have the same number of partitions and are keyed
> by the same key field).
>
> The default behavior is to group topic-partitions by partition_id. Please
> refer to the property job.systemstreampartition.grouper.factory in the
> configuration table:
>
> http://samza.apache.org/learn/documentation/0.9/jobs/configuration-table.html
>
>
>
> On Mon, Dec 28, 2015 at 9:19 AM, Michael Sklyar <mikesk...@gmail.com>
> wrote:
>
> > Hi,
> >
> >
> > I have a question regarding Kafka partitions while working with RocksDB
> > as an enrichment cache.
> >
> >
> > We have a stream of URLs, a very simplified version would be:
> >
> > (1)URL(some 24 partitions)->(2)read enrichments task (from RocksDB)
> > ->(3)make decision
> >
> >
> > One of the enrichments is a counter, which must be accurate. To achieve
> > this, we partition the input Kafka topic (1) by key (so the same URL will
> > always arrive at the same task instance and the counter will be correct).
> >
> > For other enrichments (for example web title, google page rank…) we have
> > other tasks that write to additional Kafka topics, also consumed by (2).
> > Is it possible to make sure that the same key in different Kafka topics
> > will reach the same Samza task instance?
> >
> > Another option, of course, would be to hold all the enrichments in all
> > RocksDB instances.
> >
> >
> >
> > What do you think? What is the best practice?
> >
> >
> >
> > Thanks,
> >
> > Michael Sklyar
> >
>
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>
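
For readers of the archive, the co-partitioning behavior described in the quoted reply can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Samza or Kafka code: the topic names are hypothetical, the CRC32-based partition_for function is a stand-in for whatever partitioner the producers actually use (Kafka has its own hash function), and group_by_partition merely mimics the idea of grouping topic-partitions by partition id. It assumes every topic has the same number of partitions (24, as in the question) and that every producer partitions by the same key with the same function.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 24  # same partition count for every input topic (from the thread)


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stand-in partitioner: stable hash of the key, modulo the partition count.

    Assumption: every producer uses this same function. Kafka's real
    partitioner uses a different hash; only determinism matters here.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def group_by_partition(topics, num_partitions: int = NUM_PARTITIONS):
    """Mimic grouping by partition id: task i owns partition i of every topic."""
    tasks = defaultdict(list)
    for topic in topics:
        for partition in range(num_partitions):
            tasks[partition].append((topic, partition))
    return dict(tasks)


def task_for(topic: str, key: str, assignment) -> int:
    """Find the task that owns the topic-partition a keyed message lands in."""
    topic_partition = (topic, partition_for(key))
    for task_id, owned in assignment.items():
        if topic_partition in owned:
            return task_id
    raise KeyError(topic_partition)


# Hypothetical topic names: the URL stream plus two enrichment streams.
topics = ["urls", "web-titles", "page-ranks"]
assignment = group_by_partition(topics)

url = "http://example.com/some/page"
tasks_seen = {task_for(topic, url, assignment) for topic in topics}
```

With these assumptions, tasks_seen holds a single task id, because the same key hashes to the same partition number in every topic and one task owns that partition number across all topics. If the topics had different partition counts, or a producer used a different partitioning function, that guarantee would no longer hold.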
