Hi there! First off, thanks for the continued work on Samza -- I looked into many DC/stream processors and Samza was a real standout with its smart architecture and pluggable design.
I'm working on a custom StreamJob/Factory for running Samza jobs on Kubernetes. Those two classes are pretty straightforward: I create a Deployment in Kubernetes with the appropriate number of Pods (a number <= the number of Kafka partitions I created the input topic with).

Now I'm moving on to what executes in the actual Docker containers, and I'm a bit confused. My plan was to mirror as closely as possible what YarnJob does, which is to set up an environment that works with `run-jc.sh`. However, I don't need ClusterBasedJobCoordinator, because Kubernetes is not an offer-based resource negotiator; if the JobCoordinator is running, then by definition it has received the appropriate resources. So a PassThroughJobCoordinator with an appropriate main() method seemed like the ticket. Unfortunately, the PTJC doesn't actually seem to *do* anything -- unlike the CBJC, which has a run loop and presumably schedules containers and the like.

I saw the preview documentation on flexible deployment, but it didn't totally click for me -- perhaps because it was also my first introduction to the high-level API? (I've just been writing StreamTask impls.)

Here's a brief description of the workflow I'm envisioning. Perhaps someone could tell me which classes I should implement, and what sorts of containers I might need running in the environment to coordinate everything?

1. I create a topic in Kafka with N partitions.
2. I start a job configured to run N-X containers.
   2.1. If my topic has 4 partitions and I have low load, I might want X to start at 3 so I only have 1 task instance.
3. Samza is configured to send all partitions to task instance 1.
4. Later, load increases.
   4.1. I use Kubernetes to scale the job to 4 pods/containers.
   4.2. Samza re-configures itself so that the new containers receive work.

My intuition is that I need a JobCoordinator/Factory in the form of a server that sets up Watches on the appropriate Kubernetes resources, so that when I perform the action in [4.1] *something* happens.
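To make steps 3 and 4.2 concrete, here's a minimal toy sketch (my own illustration, not Samza's actual grouper code -- the class and method names are made up) of the kind of partition-to-container redistribution I'm hoping for when the container count changes:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of spreading N partition-tasks across C containers
// round-robin. Not Samza code -- just illustrates the rebalance
// I'd want when Kubernetes scales the Deployment.
public class PartitionAssignment {
    // Returns, for each container, the list of partition ids it owns.
    public static List<List<Integer>> assign(int numPartitions, int numContainers) {
        List<List<Integer>> containers = new ArrayList<>();
        for (int c = 0; c < numContainers; c++) {
            containers.add(new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            containers.get(p % numContainers).add(p);
        }
        return containers;
    }

    public static void main(String[] args) {
        // One container owns all four partitions (step 3).
        System.out.println(assign(4, 1)); // [[0, 1, 2, 3]]
        // After scaling to four containers, one partition each (step 4.2).
        System.out.println(assign(4, 4)); // [[0], [1], [2], [3]]
    }
}
```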
Or perhaps I should use ZkJobCoordinator? Presumably, as pods/containers come and go, they will cause changes in ZK that trigger task restarts (or whatever rebalancing logic the coordinator employs)? Okay, I'll stop rambling now. Thanks in advance for any tips! - Tom
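P.S. For concreteness, my current reading of the flexible deployment preview is that ZooKeeper-based coordination is enabled with config along these lines (the ZooKeeper host is a placeholder, and I may well have the keys slightly wrong):

```properties
# ZooKeeper-based job coordination (standalone/embedded deployment)
job.coordinator.factory=org.apache.samza.zk.ZkJobCoordinatorFactory
job.coordinator.zk.connect=my-zookeeper:2181
```

Is that the intended path for a non-YARN environment like Kubernetes?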