[DISCUSS] SAMZA-1041 Multi-stage feature for Samza

Jake Maes Tue, 13 Dec 2016 13:55:26 -0800

Hey folks,

A while ago I created SAMZA-1041
<https://issues.apache.org/jira/browse/SAMZA-1041> to add a multistage
feature to Samza. The goal is to let users deploy a set of processors as a
unit, with the intermediate topics created automatically. There are a number
of use cases, including the repartitioner-main pattern and multistage HDFS
jobs. Ultimately this will make it easier for users to deploy a DAG of Samza
processors and help them avoid common configuration pitfalls.


We've created a basic prototype and are ready to get started on this
feature. A design doc is coming soon, but in the meantime I've started a
couple of discussions in the JIRA comments to get some early feedback.

Discussion 1 asks for general feedback on the utility of this feature
and for any ideas to improve it.

Discussion 2 is about the integration with the Fluent API feature, which
also deals with data pipelines from a logical perspective. The goal is to
make the distinction and contract between these features clear.

Thanks in advance for the feedback!

-Jake
