With “job.version”+1, Samza would assign the job version automatically. However, as Navina pointed out, this is probably not a good way to go, since you might want to skip a version. So I think you are right on this point. Sorry for any confusion.
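A minimal sketch of the trade-off. It assumes versions are plain integers for the auto-increment case and build-id strings otherwise, and every name in it is hypothetical, not a Samza API. Deriving the next version as max+1 can only step forward. Choosing the version before submission lets the deploy tooling skip a known-bad build or roll back. Reading job.version from the coordinator stream is not modeled here.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class VersionChoice {

    // The job.version+1 idea: take the highest version seen so far (e.g. read
    // back from the coordinator stream) and add one. It can only move forward.
    static int nextAutoVersion(List<Integer> seenVersions) {
        return seenVersions.isEmpty() ? 1 : Collections.max(seenVersions) + 1;
    }

    // Choosing the version outside the job: walk the candidate builds from
    // newest to oldest and return the first one not marked bad. Rolling back
    // is then just a matter of marking the newer builds as bad.
    static String chooseVersion(List<String> buildsOldestFirst, Set<String> badBuilds) {
        for (int i = buildsOldestFirst.size() - 1; i >= 0; i--) {
            String build = buildsOldestFirst.get(i);
            if (!badBuilds.contains(build)) {
                return build;
            }
        }
        throw new IllegalStateException("no deployable build");
    }

    public static void main(String[] args) {
        System.out.println(nextAutoVersion(List.of(1, 2, 3)));                       // 4
        System.out.println(chooseVersion(List.of("b1", "b2", "b3"), Set.of("b3")));  // b2
    }
}
```

The point is where the decision lives. The second function runs before anything is submitted to the RM, which is what Navina suggests below.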
> On Sep 22, 2015, at 4:18 AM, Richard Lee <rd...@tivo.com> wrote:
>
> I am not following what this ‘job.version’ + 1 business is about. I have a
> build id for my code, likely a build timestamp or similar. I want to be able
> to determine whether the currently running Samza job has that version of the
> code. If not, I want to kill it and launch a new job with the same job name
> and job id.
>
> In my tree, I have modified the Samza AM webapp endpoint to behave more like
> the MapReduce endpoint: the REST API for the AM is located on the
> trackingUrl at /ws/v1/samza, rather than on the RPC port at root (i.e. /). I
> think this is a better strategy for three reasons:
>
> 1) it works with the existing ResourceManager REST API, which does not return
> the RPC port (only the trackingUrl);
> 2) it uses the ResourceManager proxy mechanism for reaching the
> ApplicationMaster, which is arguably a tiny bit more secure than having to open
> network routes directly to the YARN nodes from the clients;
> 3) it has a versioning system built into the REST URL, so that v2 (or other
> endpoints) can be enabled in the future.
>
> Richard
>
>> On Sep 20, 2015, at 7:52 PM, Navina Ramesh <nram...@linkedin.com.INVALID>
>> wrote:
>>
>> Hi Richard,
>> We assume that job versioning is controlled by how you manage the config
>> and deployment process. Job version is treated as semantically different
>> from a job id.
>>
>> At LinkedIn, for example, we generate 2 tars: one for the job and one for
>> the config. When we deploy a Samza job, we specify the version of the job to
>> pick up (code), and the latest config is always picked up during the
>> deployment process. When the job is being deployed, if another version of
>> the same job is running (identified by the same job name and job id), we
>> kill it before submitting the new version of the job to the RM.
>> In this way, we distinguish between job version and job id.
>> Job version is merely the version of the code that is pulled in from the
>> repository to deploy, whereas the job id identifies an instance of a
>> running job. Hence, we can deploy multiple instances of the same code by
>> changing the job id. This way, each instance has its own checkpoint topic.
>>
>> I agree there is no straightforward way to deal with job versioning in
>> Samza or YARN. I believe that is intentional, by design, as we do not want
>> to couple version management with the naming of job-related state (like
>> checkpointing or the coordinator stream). Version management should be a
>> layer operating above, or independently of, the Samza framework or job.
>>
>> {quote}
>> In the newer Samza version with the job coordinator, you can fix this
>> issue as follows. When submitting a job, in the bootstrap phase in which it
>> reads the coordinator stream, you can find the current maximum value of
>> job.version and then rewrite this property with job.version+1. Later on,
>> you can read this property from the coordinator stream topic, or from the
>> job coordinator server, which provides the whole list of config.
>> {quote}
>> While this is a workable solution, I don't think it is what you are
>> looking for. You are again embedding the job version within your job, which
>> is a bad idea. What happens if you have a bad version that you want to skip
>> over? job.version+1 won't work. Where will you embed the logic to skip
>> versions, or even revert to older versions?
>>
>> IMO, the job version to deploy should be determined even before submitting
>> a job request to the RM. Any kind of validation or processing done after a
>> job starts is going to tightly couple the versioning system with the
>> framework.
>>
>> Cheers!
>> Navina
>>
>> On Sun, Sep 20, 2015 at 10:38 AM, Jocke Eriksson <jock...@gmail.com> wrote:
>>
>>> Maybe you could store the version number in Kafka by publishing it to a
>>> topic with compaction enabled.
>>> You could then consume the messages and do a version comparison. But to
>>> me, such a simple task should be made easier.
>>>
>>> 2015-09-20 0:25 GMT+02:00 Richard Lee <rd...@tivo.com>:
>>>
>>>> Hi there,
>>>>
>>>> How do people track which version of a Samza job is running in YARN? The
>>>> job name and job id can’t be used, as they are used to create the
>>>> checkpoint topic, etc. I’m looking for a way of determining whether the
>>>> current job running in YARN is the latest version, and if not, to kill it
>>>> and launch a newer version, picking up where the previous version left off.
>>>>
>>>> There seems to be no ‘job version’ field anywhere obvious in either Samza
>>>> or YARN.
>>>>
>>>> Is there another approach I should use?
>>>>
>>>> Richard
>>
>> --
>> Navina R.
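A minimal sketch of the deploy-time check discussed in this thread, under stated assumptions. The running instance is keyed by job.name and job.id, and its build id is tracked separately. Jocke's compacted-topic idea is modeled by an in-memory map, not the Kafka client API. Whether the job is running at all is passed in as a flag, because how you learn that (for example, by querying the ResourceManager REST API) is outside the sketch. All class, method, and job names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class DeployDecision {

    enum Action { SUBMIT, KILL_AND_SUBMIT, NOTHING }

    // In-memory stand-in for a Kafka topic with cleanup.policy=compact:
    // compaction retains at least the latest record per key, so the last
    // value published for a key is the one that counts. A HashMap models
    // that "last write per key wins" view. This is NOT the Kafka client API.
    static final class CompactedVersionLog {
        private final Map<String, String> latestByKey = new HashMap<>();

        void publish(String key, String buildId) {
            latestByKey.put(key, buildId); // newer record supersedes older ones
        }

        String read(String key) {
            return latestByKey.get(key); // null if nothing was ever published
        }
    }

    // A running instance is identified by job.name + job.id (the same pair
    // its checkpoint topic is derived from); the build id is kept apart.
    static String instanceKey(String jobName, String jobId) {
        return jobName + "/" + jobId;
    }

    // isRunning would come from the cluster; runningBuild is whatever was
    // last recorded for this instance (null if unknown).
    static Action decide(boolean isRunning, String runningBuild, String desiredBuild) {
        if (!isRunning) {
            return Action.SUBMIT;
        }
        if (desiredBuild.equals(runningBuild)) {
            return Action.NOTHING; // already on the desired build
        }
        // Unknown or different build: kill, then resubmit with the same
        // job.name and job.id so the new build reuses the same checkpoints.
        return Action.KILL_AND_SUBMIT;
    }

    public static void main(String[] args) {
        CompactedVersionLog log = new CompactedVersionLog();
        String key = instanceKey("my-job", "1");
        log.publish(key, "build-100");
        log.publish(key, "build-101"); // the value that survives compaction

        Action action = decide(true, log.read(key), "build-102");
        System.out.println(action); // KILL_AND_SUBMIT
        if (action != Action.NOTHING) {
            // The actual kill/submit is out of scope; record the new build.
            log.publish(key, "build-102");
        }
    }
}
```

The desired build is chosen before `decide` runs, so skipping or reverting a build only changes the `desiredBuild` argument. Nothing inside the job has to know about versions.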