We have a Kafka topic containing data needed by several Samza jobs. These jobs 
will read the data and build up state that they then use when processing their 
inputs. Ideally, we would consume the topic as a bootstrap stream to build up 
this state. The problem is that the topic containing the data has a single 
partition, while the input topics these jobs process have multiple partitions. 
My understanding is that only one task instance in the job would actually 
process the bootstrap stream, so any state it built up would be local to that 
task. I'm thinking my options are the following:

1) Implement SAMZA-353 and allow the bootstrap SSP (SystemStreamPartition) to 
be assigned to each task instance.
2) Implement the shared state store component of SAMZA-402.
3) Layer the shared state on top of Samza in our tasks themselves, maybe by 
using something like RocksDB directly.

Number 1 seems easiest to implement, at the cost of duplicating the entire 
state in every task. I'd prefer not to do number 3 given that this feature is 
on Samza's roadmap, but I am a bit concerned about the scope of work for 
number 2, and about the fact that the Samza code involved is mostly Scala.
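
To make the partition mismatch and option 1 concrete, here is a minimal sketch 
in plain Java. It is a simplified model of group-by-partition assignment as I 
understand it, not Samza's actual grouper code. The class name, the method 
names, and the topic names "input-events" and "reference-data" are all 
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model only: this does not use or reproduce any Samza classes.
final class BootstrapAssignmentSketch {

    // Group-by-partition model: partition N of every stream goes to task N.
    // Each assigned stream partition is written as "stream/partition".
    static Map<Integer, List<String>> groupByPartition(Map<String, Integer> partitionCounts) {
        Map<Integer, List<String>> tasks = new TreeMap<>();
        for (Map.Entry<String, Integer> e : partitionCounts.entrySet()) {
            for (int p = 0; p < e.getValue(); p++) {
                tasks.computeIfAbsent(p, k -> new ArrayList<>()).add(e.getKey() + "/" + p);
            }
        }
        return tasks;
    }

    // Option 1 model: group the other streams by partition, then additionally
    // hand every partition of broadcastStream to every task.
    static Map<Integer, List<String>> groupWithBroadcast(Map<String, Integer> partitionCounts,
                                                         String broadcastStream) {
        Map<String, Integer> others = new TreeMap<>(partitionCounts);
        Integer broadcastPartitions = others.remove(broadcastStream);
        Map<Integer, List<String>> tasks = groupByPartition(others);
        if (broadcastPartitions != null) {
            for (List<String> assigned : tasks.values()) {
                for (int p = 0; p < broadcastPartitions; p++) {
                    assigned.add(broadcastStream + "/" + p);
                }
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("input-events", 4);   // hypothetical multi-partition input topic
        counts.put("reference-data", 1); // hypothetical single-partition bootstrap topic
        System.out.println("by partition:   " + groupByPartition(counts));
        System.out.println("with broadcast: " + groupWithBroadcast(counts, "reference-data"));
    }
}
```

In the first mapping only task 0 is assigned the single "reference-data" 
partition, which is the problem described above. In the second, every task is 
assigned it, so each task would build its own full copy of the state.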

Are there any alternatives that I'm missing? Note that we need to process the 
data stream as a bootstrap stream. Using it as a changelog is insufficient 
because we need to be able to manipulate the data before it goes into the 
state store.

--
Tommy Becker
Senior Software Engineer

Digitalsmiths
A TiVo Company

www.digitalsmiths.com
tobec...@tivo.com

