Re: job versioning

Navina Ramesh Sun, 20 Sep 2015 19:52:47 -0700

Hi Richard,
We assume that job versioning is controlled by how you manage the config
and deployment process. A job version is treated as semantically different
from a job id.

At LinkedIn, for example, we generate two tars: one for the job (code) and
one for the config. When we deploy a Samza job, we specify the version of
the job code to pick up, and the latest config is always picked up during
the deployment process. If another version of the same job is already
running (identified by the same job name and job id), we kill it before
submitting the new version to the RM.
In this way, we distinguish between job version and job id. Job version is
merely the version of the code that is pulled from the repository to
deploy, whereas job id identifies an instance of a running job.
Hence, we can deploy multiple instances of the same code by changing the
job id. This way, each instance has its own checkpoint topic.
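
As a rough sketch of the deploy step described above (this is not our
actual tooling; `RunningJob`, `plan_deploy`, and the action tuples are
hypothetical names, and the code version is assumed to be tracked by the
deploy tool rather than by Samza or YARN), the logic might look like this:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RunningJob:
    """Hypothetical record of a job instance known to the deploy tool."""
    job_name: str
    job_id: str
    code_version: str  # assumption: tracked by the deploy tool, not by Samza


def plan_deploy(running: List[RunningJob], job_name: str, job_id: str,
                code_version: str) -> List[Tuple[str, str]]:
    """Return the ordered actions needed to deploy one job instance."""
    actions = []
    for job in running:
        # Same (job name, job id) means the same instance, whatever code
        # version it happens to be running, so it is killed first.
        if job.job_name == job_name and job.job_id == job_id:
            actions.append(
                ("kill", f"{job.job_name}_{job.job_id}@{job.code_version}"))
    # The new code version is submitted only after the old instance is gone.
    actions.append(("submit", f"{job_name}_{job_id}@{code_version}"))
    return actions
```

Deploying a new code version under an existing job id yields a kill
followed by a submit, while deploying under a new job id yields only a
submit, leaving the other instance (and its checkpoint topic) untouched.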

I agree there is no straightforward way to deal with job versioning in
Samza or Yarn. I believe that is intentional, as we do not want to couple
version management with the naming of job-related state (like
checkpointing or the coordinator stream). Version management should be a
layer operating above, or independent of, the Samza framework or job.

{quote}
In the newer Samza version with the Job coordinator, you can fix this
issue as follows. When submitting a job, in the bootstrap phase where it
reads the coordinator stream, you can find the current maximum value for
the job.version and then rewrite this property with job.version+1. Then
later on you can read this property from the coordinator stream topic, or
from the job coordinator server providing the whole list of config.
{quote}
While this is a workable solution, I don't think this is what you are
looking for. You are again embedding job version within your job, which is
a bad idea. What happens if you have a bad version that you want to skip
over? job.version+1 won't work. Where will you embed the logic to skip
versions? Or even to revert to older versions?

IMO, the job version to deploy should be determined before a job request
is even submitted to the RM. Any kind of validation or processing done
after a job starts is going to tightly couple the versioning system with
the framework.
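
To illustrate why this decision sits more comfortably outside the job,
here is a minimal sketch of a selection step run by the deploy tooling
before any request reaches the RM. `pick_version`, the integer version
numbers, and the `bad` and `pinned` parameters are all hypothetical;
nothing here is a Samza or Yarn API.

```python
from typing import Iterable, Optional, Set


def pick_version(available: Iterable[int], bad: Set[int],
                 pinned: Optional[int] = None) -> int:
    """Choose the code version to deploy before anything is submitted."""
    versions = set(available)
    if pinned is not None:
        # An explicit pin covers reverting to an older version.
        if pinned not in versions or pinned in bad:
            raise ValueError(f"version {pinned} is not deployable")
        return pinned
    # Otherwise take the newest version that is not marked bad, which
    # skips over a bad release instead of blindly using "latest + 1".
    good = versions - bad
    if not good:
        raise ValueError("no deployable version")
    return max(good)
```

With versions 1 to 3 available and 3 marked bad, the default choice is 2,
and pinning 1 reverts to the older code. Neither case needs any logic
inside the running job.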

Cheers!
Navina

On Sun, Sep 20, 2015 at 10:38 AM, Jocke Eriksson <jock...@gmail.com> wrote:

> Maybe you could store the version number in Kafka by publishing it to a
> topic with compaction mode. You could then consume the messages and do a
> version comparison. But for me such a simple task should be made easier.
>
> 2015-09-20 0:25 GMT+02:00 Richard Lee <rd...@tivo.com>:
>
> > Hi there-
> >
> > How do people track which version of a samza job is running in yarn?  The
> > job name and job id can’t be used, as they are used to create the
> > checkpoint topic, etc.  I’m looking for a way of determining if the
> > current job running in yarn is the latest version, and if not, kill it
> > and launch a newer version, picking up where the previous version left
> > off.
> >
> > There seems to be no ‘job version’ field anywhere obvious in either samza
> > or yarn.
> >
> > Is there another approach I should use?
> >
> > Richard
> >
> >
>

-- 
Navina R.
