Re: job versioning

Abdollahian Noghabi, Shadi Tue, 22 Sep 2015 08:25:17 -0700

Using “job.version”+1, Samza would automatically assign the job version. However,
as Navina pointed out, this is probably not a good way to go, since you might
want to skip a version. Thus, I think you are right in this regard. Sorry if I
caused any confusion.
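
To make the skip/rollback point concrete, here is a minimal sketch (not Samza code) of choosing the version outside the job, before anything is submitted to the RM. The names `DeployPlan` and `plan_deploy` are hypothetical, and the build id is assumed to be an opaque string such as a build timestamp.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeployPlan:
    kill_running: bool  # stop the running instance first
    submit: bool        # submit a job built from the desired build
    reason: str


def plan_deploy(running_build: Optional[str], desired_build: str) -> DeployPlan:
    """Decide what a deploy tool should do before submitting to the RM.

    running_build is the build id of the live job with the same job name
    and job id, or None if no such job is running (hypothetical inputs).
    """
    if running_build is None:
        return DeployPlan(False, True, "nothing running")
    if running_build == desired_build:
        return DeployPlan(False, False, "already on desired build")
    # Equality check only: the desired build may be newer, older (a
    # rollback), or several builds ahead (skipping a bad one).
    return DeployPlan(True, True, "build mismatch")


# Made-up build ids: moving from 41 to 43 skips a bad build 42.
print(plan_deploy("build-41", "build-43"))
```

Because the check is plain equality rather than "current maximum + 1", skipping a version and reverting to an older one are handled the same way as a normal upgrade.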



> On Sep 22, 2015, at 4:18 AM, Richard Lee <rd...@tivo.com> wrote:
> 
> I am not following what this ‘job.version’ + 1 business is about.  I have a
> build id for my code, likely a build timestamp or similar.  I want to be able
> to determine whether the currently running Samza job has that version of the
> code.  If not, I want to kill it and launch a new job with the same job name
> and job id.
> 
> In my tree, I have modified the Samza AM webapp endpoint to behave more like
> the MapReduce endpoint, in that the REST API for the AM is located on the
> trackingUrl at /ws/v1/samza, rather than on the RPC port at root (i.e. /).  I
> think this is a better strategy for three reasons:
> 
> 1) it works with the existing ResourceManager REST API, which does not return
> the RPC port (only trackingUrl)
> 2) it uses the ResourceManager proxy mechanism for reaching the
> ApplicationMaster, which is arguably a tiny bit more secure than having to
> open network routes directly to the YARN nodes from the clients.
> 3) it has a versioning system built into the REST URL, so that v2 (or other
> endpoints) can be enabled in the future.
> 
> Richard
> 
>> On Sep 20, 2015, at 7:52 PM, Navina Ramesh <nram...@linkedin.com.INVALID> 
>> wrote:
>> 
>> Hi Richard,
>> We assume that job versioning is controlled by how you manage the config
>> and deployment process. Job version is treated as semantically different
>> from a job id.
>> 
>> At LinkedIn, for example, we generate 2 tars - one for job and one for
>> config. When we deploy a Samza job, we specify the version of the job to
>> pick up (code), and the latest config is always picked up during the
>> deployment process. When the job is being deployed, if another version of
>> the same job is running (identified by the same job name and job id), we
>> kill it before submitting the new version of the job to the RM.
>> In this way, we distinguish between job version and job id. Job version is
>> merely the version of the code that is pulled in from the repository to
>> deploy. The job id, by contrast, identifies an instance of a running job.
>> Hence, we can deploy multiple instances of the same code by changing the
>> job id. This way, each instance has its own checkpoint topic.
>> 
>> I agree there is no straightforward way to deal with job versioning in
>> Samza or YARN. I believe that is intentional, as we do not want
>> to couple version management with the naming of job-related state (like
>> checkpointing or coordinator stream). Version management should be a layer
>> operating above or independent of the Samza framework or job.
>> 
>> {quote}
>> In the newer Samza version with the job coordinator, you can fix this
>> issue as follows. When submitting a job, in the bootstrap phase, when it
>> reads the coordinator stream, you can find the current maximum value of
>> job.version and then rewrite this property with job.version+1. Later
>> on you can read this property from the coordinator stream topic, or
>> from the job coordinator server, which provides the whole list of config.
>> {quote}
>> While this is a workable solution, I don't think this is what you are
>> looking for. You are again embedding job version within your job, which is
>> a bad idea. What happens if you have a bad version that you want to skip
>> over? job.version+1 won't work. Where will you embed the logic to skip
>> versions? Or even revert to older versions?
>> 
>> IMO, the job version to deploy should be determined even before submitting a
>> job request to the RM. Any kind of validation or processing done after a
>> job starts is going to tightly couple the versioning system with the
>> framework.
>> 
>> Cheers!
>> Navina
>> 
>> On Sun, Sep 20, 2015 at 10:38 AM, Jocke Eriksson <jock...@gmail.com> wrote:
>> 
>>> Maybe you could store the version number in Kafka by publishing it to a
>>> log-compacted topic. You could then consume the messages and do a
>>> version comparison. But to me, such a simple task should be made easier.
>>> 
>>> 2015-09-20 0:25 GMT+02:00 Richard Lee <rd...@tivo.com>:
>>> 
>>>> Hi there-
>>>> 
>>>> How do people track which version of a samza job is running in yarn?  The
>>>> job name and job id can’t be used, as they are used to create the
>>>> checkpoint topic, etc.  I’m looking for a way of determining if the
>>>> current job running in yarn is the latest version, and if not, kill it and
>>>> launch a newer version, picking up where the previous version left off.
>>>> 
>>>> There seems to be no ‘job version’ field anywhere obvious in either samza
>>>> or yarn.
>>>> 
>>>> Is there another approach I should use?
>>>> 
>>>> Richard
>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> -- 
>> Navina R.
>
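
For reference, a minimal sketch of how a client might derive the AM endpoint Richard describes from a ResourceManager app report. It assumes the report is the parsed JSON returned by the RM REST API's /ws/v1/cluster/apps/{appid} resource, whose `app` object carries `trackingUrl`. The /ws/v1/samza path exists only in Richard's modified tree, not in stock Samza as discussed here, and `samza_am_endpoint` is a hypothetical helper name.

```python
def samza_am_endpoint(app_report: dict, api_version: str = "v1") -> str:
    """Build the AM REST URL under the RM-provided trackingUrl.

    app_report: parsed JSON of a YARN ResourceManager app report.
    api_version: the version segment of the URL, so a future "v2"
    endpoint could be addressed the same way.
    """
    tracking_url = app_report["app"]["trackingUrl"]
    # Strip any trailing slash so the joined path has exactly one.
    return f"{tracking_url.rstrip('/')}/ws/{api_version}/samza"


# Made-up report for illustration; host and application id are placeholders.
report = {
    "app": {
        "id": "application_0000000000000_0001",
        "trackingUrl": "http://rm.example.com:8088/proxy/application_0000000000000_0001/",
    }
}
print(samza_am_endpoint(report))
```

Since the URL is built from trackingUrl alone, the client only ever talks to the ResourceManager proxy and never needs the AM's RPC port.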
