Hi Elias,

Thank you so much for your explanation. I'm looking into improving Samza
Standalone and your design-inputs were very useful.

Cheers,
Jagadish

On Mon, Dec 14, 2015 at 12:47 PM, Elias Levy <fearsome.lucid...@gmail.com>
wrote:

> On Mon, Dec 14, 2015 at 11:55 AM, Jagadish Venkatraman <
> jagadish1...@gmail.com> wrote:
>
> >
> > Thanks for the great work! This is super-helpful. Another cool feature is
> > that this implementation pushes alot of failure handling/restarting to
> the
> > cluster manager.
> >
> > I've some questions on the implementation.
> >
> > 1. It's my understanding that each container is a separate pod. When a
> pod
> > crashes, I assume that the *Kubelet* would bring back the job on the
> > **same**
> > node? (and we can re-use state since the volume is mounted as an
> *emptyDir*
> > volume) This is valuable for statefull jobs.
> >
>
> Correct.  I create an emptyDir state volume on purpose, so that if the
> Docker container dies for some reason, the local state won't go away and
> will be available to the Samza container when it is restarted by Kubelet.
> I do the same thing for the logs, so you can fetch them and debug any
> issues that may be causing the container to fail.
>
>
>
> > 2. Also, the containerId and the jobModel are a part of the env. variable
> > passed to the pod. How do we ensure that - after a restart, the pods that
> > come up (potentially on a different host), have the same containerId?
> (Does
> > Kubernetes help in this by launching the process with the same env.
> > variables as the original run? )
> >
> > This behavior is key to ensure that we process all partitions with no 2
> > containers processing the same partition.
> >
>
> At the moment Kubernetes does not have a specific abstraction to model
> stateful services that have unique identities within the cluster (e.g. a
> Samza job, a Kafka cluster, a ZooKeeper ensemble), but it is easy to work
> past this limitation.  You do so by creating one ReplicationController with
> a replica count of one per group member.  In the case of a Samza job, my
> code creates a RC per SamzaContainer.  Each RC has a slightly different Pod
> spec.  Each Pod is parametrized a unique Samza container ID via the
> SAMZA_CONTAINER_ID environment variable.  So the Kubernetes YAML the code
> outputs ensures there is only one RC per container ID, and the RC ensures
> there is only ever a single container running for its given configuration.
>
> Makes sense?
>



-- 
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University

Reply via email to