I am not following what this ‘job.version’ + 1 business is about. I have a build id for my code, likely a build timestamp or similar. I want to be able to determine whether the currently running Samza job has that version of the code. If not, I want to kill it and launch a new job with the same job name and job id.
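As a rough illustration of the check I am after, here is a minimal sketch. The function name `plan_deploy` and the step names are hypothetical, and it assumes the build id of the running job can be read from somewhere, which is exactly the open question:

```python
from typing import List, Optional


def plan_deploy(running_build_id: Optional[str], desired_build_id: str) -> List[str]:
    """Return the deployment steps needed to get desired_build_id running.

    running_build_id is None when no job with this job name and job id is
    currently running. How the running build id is obtained is left open;
    this function only encodes the decision.
    """
    if running_build_id is None:
        # Nothing is running, so just launch.
        return ["submit"]
    if running_build_id == desired_build_id:
        # The running job already has this build of the code.
        return []
    # A different build is running: replace it, keeping the same job name
    # and job id so the new job uses the same checkpoint topic.
    return ["kill", "submit"]
```

Note that this compares build ids for equality only, so it makes no assumption that versions are ordered or increase by one.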
In my tree, I have modified the Samza AM webapp endpoint to behave more like the MapReduce endpoint: the REST API for the AM is located on the trackingUrl at /ws/v1/samza, rather than on the RPC port at root (i.e. /). I think this is a better strategy for three reasons:

1) It works with the existing ResourceManager REST API, which does not return the RPC port (only the trackingUrl).
2) It uses the ResourceManager proxy mechanism for reaching the ApplicationMaster, which is arguably a tiny bit more secure than having to open network routes directly from the clients to the YARN nodes.
3) It has a versioning system built into the REST URL, so that v2 (or other endpoints) can be enabled in the future.

Richard

> On Sep 20, 2015, at 7:52 PM, Navina Ramesh <nram...@linkedin.com.INVALID> wrote:
>
> Hi Richard,
> We assume that job versioning is controlled by how you manage the config
> and deployment process. Job version is treated as semantically different
> from a job id.
>
> At LinkedIn, for example, we generate 2 tars - one for job and one for
> config. When we deploy a Samza job, we specify the version of the job to
> pick up (code), and the latest config is always picked up during the
> deployment process. When the job is being deployed, if another version of
> the same job is running (identified with the same job name and job id), we
> kill it before submitting the new version of the job to the RM.
> In this way, we distinguish between job version and job id. Job version is
> merely the version of the code that is pulled in from the repository to
> deploy, whereas job id indicates an instance of a running job.
> Hence, we can deploy multiple instances of the same code by changing the
> job id. This way, each instance has its own checkpoint topic.
>
> I agree there is no straightforward way to deal with job versioning in
> Samza or Yarn. I believe that is intentional by design, as we do not want
> to couple version management with the naming of job-related state (like
> checkpointing or coordinator stream). Version management should be a layer
> operating above or independent of the Samza framework or job.
>
> {quote}
> In the newer Samza version with the Job coordinator, you can fix this
> issue as follows. When submitting a job, in the bootstrap phase in which it
> reads the coordinator stream, you can find the current maximum value for
> the job.version and then rewrite this property with job.version+1. Then
> later on you can read this property from the coordinator stream topic, or
> from the job coordinator server providing the whole list of config.
> {quote}
> While this is a workable solution, I don't think this is what you are
> looking for. You are again embedding job version within your job, which is
> a bad idea. What happens if you have a bad version that you want to skip
> over? job.version+1 won't work. Where will you embed the logic to skip
> versions? Or even revert to older versions?
>
> IMO, the job version to deploy should be determined even before submitting
> a job request to the RM. Any kind of validation or processing done after a
> job starts is going to tightly couple the versioning system with the
> framework.
>
> Cheers!
> Navina
>
> On Sun, Sep 20, 2015 at 10:38 AM, Jocke Eriksson <jock...@gmail.com> wrote:
>
>> Maybe you could store the version number in Kafka by publishing it to a
>> topic with compaction mode. You could then consume the messages and do a
>> version comparison. But for me such a simple task should
>> be made easier.
>>
>> 2015-09-20 0:25 GMT+02:00 Richard Lee <rd...@tivo.com>:
>>
>>> Hi there-
>>>
>>> How do people track which version of a Samza job is running in YARN? The
>>> job name and job id can’t be used, as they are used to create the
>>> checkpoint topic, etc. I’m looking for a way of determining if the
>>> current job running in YARN is the latest version, and if not, kill it
>>> and launch a newer version, picking up where the previous version left
>>> off.
>>>
>>> There seems to be no ‘job version’ field anywhere obvious in either Samza
>>> or YARN.
>>>
>>> Is there another approach I should use?
>>>
>>> Richard
>
> --
> Navina R.
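To illustrate the endpoint layout described at the top of this message, here is a minimal sketch of how a client might derive the AM REST base URL from a ResourceManager `/ws/v1/cluster/apps` response that has already been fetched and parsed. The helper name `samza_rest_url` is hypothetical, and the `/ws/v1/samza` path exists only in my modified tree, not in stock Samza:

```python
from typing import Any, Dict, Optional


def samza_rest_url(apps_response: Dict[str, Any], app_name: str) -> Optional[str]:
    """Find a RUNNING app by name in a parsed RM /ws/v1/cluster/apps
    response and return the AM REST base URL under its trackingUrl.

    Returns None if no running app with that name has a trackingUrl.
    """
    # The RM wraps the list as {"apps": {"app": [...]}}; "apps" may be null
    # when there are no applications.
    apps = (apps_response.get("apps") or {}).get("app") or []
    for app in apps:
        if app.get("name") == app_name and app.get("state") == "RUNNING":
            tracking_url = app.get("trackingUrl")
            if tracking_url:
                # Assumption: the AM REST API is mounted at /ws/v1/samza
                # under the trackingUrl, as in my modified tree.
                return tracking_url.rstrip("/") + "/ws/v1/samza"
    return None
```

Because only the trackingUrl is used, the client never needs the AM's RPC port, and all requests go through the ResourceManager proxy.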