Hey folks,

A while ago I created SAMZA-1041 <https://issues.apache.org/jira/browse/SAMZA-1041> to add a multistage feature to Samza. The goal is to let users deploy a set of processors as a unit, with the intermediate topics created automatically. There are a number of use cases, including the repartitioner-main pattern and multistage HDFS jobs. Ultimately, this will make it easier for users to deploy a DAG of Samza processors and help them avoid common configuration pitfalls.

We've created a basic prototype and are ready to get started with this feature. A design is coming soon, but in the meantime, I started a couple of discussions in the comments to get some early feedback.

Discussion 1 is asking for general feedback on the utility of this feature and any ideas to improve it.

Discussion 2 is about the integration with the Fluent API feature, which also deals with data pipelines from a logical perspective. The goal is to make the distinction and contract between these features clear.

Thanks in advance for the feedback!

-Jake