Re: Local state in Samza - sharing data between tasks

2015-05-08 Thread Andreas Simanowski

Yi, thanks for the input. I've been out sick, so please excuse the delayed response. I am still working out the use case with my team and will report back next week. Thanks! On Tue, May 5, 2015 at 4:01 PM, Yi Pan wrote: > Hi, Andreas, > > Are you describing a use case where the *same* copy of da

Re: Local state in Samza - sharing data between tasks

2015-05-05 Thread Yi Pan

Hi, Andreas, Are you describing a use case where the *same* copy of data is shared among all tasks? That will depend on a lot of factors: 1. Is your data size huge? 2. Can your data be partitioned to work with a single partition of the input stream? 3. Do you have a means to bootstrap the data from a str

Re: Local state in Samza - sharing data between tasks

2015-05-05 Thread Andreas Simanowski

Hi Yan, thanks for the reply. So yes, you are correct: it would not be random which partition a message hits. We would use a partition key (sorry, I missed that). The "data" I was referring to is the local KV-store data for each task. Is there a way to synchronize or replicate the data from the KV-

Re: Local state in Samza - sharing data between tasks

2015-05-05 Thread Yan Fang

Hi Andreas, I don't quite understand this part: "Because the messages coming into the input stream are random (i.e. can hit any partition and therefore any task), each task will need its own copy of the data (i.e. the data needs to be duplicated across each task)." Messages come into the input stream