Re: Dealing with partitioning mismatches between bootstrap and input streams

2015-04-08 Thread Tommy Becker
Thanks for the reply, Chris. I got solution 1 more or less implemented while getting my bearings. I then started looking into solution 2 and made some progress, but now I'm starting to wonder how well the shared state store fits our particular use case. As I mentioned, we need to use a bootst

Re: Dealing with partitioning mismatches between bootstrap and input streams

2015-04-07 Thread Chris Riccomini
Hey Tommy, your summary sounds pretty accurate. One other way, which requires no change to Samza, would be to repartition the input topic properly for each task. This is kind of hacky, though. (2) is the ideal solution. It is a bit of work, but it might not be so bad. I think most of the changes

Dealing with partitioning mismatches between bootstrap and input streams

2015-04-07 Thread Tommy Becker
We have a Kafka topic containing data needed by several Samza jobs. These jobs will essentially read the data and build up state that will be used for processing their inputs. Ideally, we would use the topic as a bootstrap stream to build up this state. The problem with that is the topic contai