Interesting document, two questions:

1. Why is JobService runner-specific? Couldn't at least a good part of it be reused, given that the runner-specific parts are mostly in the translation? Or am I missing other reasons?
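To make that first question concrete, this is the kind of split I have in mind: a runner-agnostic JobService core that owns job bookkeeping and lifecycle, delegating only pipeline translation and submission to a runner-specific plugin. The names below are made up purely for illustration and are not actual Beam classes:

from abc import ABC, abstractmethod

# Hypothetical sketch only; not real Beam code. It illustrates the split
# behind question 1: a shared core plus a per-runner translation/submission
# plugin.

class RunnerPlugin(ABC):
    """The runner-specific part: translate and submit a portable pipeline."""

    @abstractmethod
    def translate(self, pipeline_proto):
        """Translate the portable pipeline proto into a runner-native job."""

    @abstractmethod
    def submit(self, native_job):
        """Submit the translated job and return a runner-level job handle."""

class SharedJobService:
    """The runner-agnostic part: run/cancel and job state tracking."""

    def __init__(self, plugin):
        self._plugin = plugin
        self._jobs = {}  # job_id -> runner-level job handle

    def run(self, job_id, pipeline_proto):
        native_job = self._plugin.translate(pipeline_proto)
        self._jobs[job_id] = self._plugin.submit(native_job)
        return job_id

    def cancel(self, job_id):
        # Assumes the runner-level handle exposes a cancel operation.
        self._jobs[job_id].cancel()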
2. What about authentication and authorisation for production runners? Once such a service can be used to submit/cancel pipelines, that is the first thing I can think of being abused.

On Tue, May 22, 2018 at 9:40 PM Ankur Goenka <[email protected]> wrote:

> Thank you guys for the input. Here is the summary.
>
> Responsibility of Beam on Job Management
> Beam provides a common interface for basic job management operations called JobService. The supported operations can vary between runners.
>
> What is JobService?
> JobService is a runner-specific component which implements Beam's JobService interface defined here.
>
> What is the life cycle of a JobService?
> There are 3 scenarios:
> With the ULR (Universal Local Runner), JobService is short lived and runs as long as the ULR runs (JobService lifespan ~= job lifespan).
> With production runners (Flink, Dataflow, etc.), JobService can be either short lived or long lived. The choice is up to the runner.
> With production runners (Flink, Dataflow, etc.) without a long-running JobService, the SDK will spin up a local JobService.
>
> JobService state management
> The choice of state management is up to the JobService implementation. The basic requirement is that JobService should be able to perform all the operations with the returned job handle.
> At the very least, the handle can be the job handle for the underlying runner job, and JobService will simply proxy actions to the runner using the provided job handle.
> A persistent JobService is free to provide a simple string as a job handle. In this case, the job handle can only be used with the same JobService.
> A stateless, non-persistent JobService can provide an opaque blob containing all the relevant information about the job. In this case the job handle can be used with any instance of JobService running the same code.
>
> JobService code distribution and invocation when JobService is short lived
> We will provide an easy-to-run solution using Docker. Docker will help with both executable distribution and providing a platform-independent binary.
> We will also provide an easy setup script with a supporting document for users who do not want to use Docker on their local machine.
>
> Should the Flink JobService start a local cluster for testing?
> The Flink JobService will be capable of submitting to a remote Flink cluster if a master URL is provided; otherwise it will execute the pipeline in an in-process Flink invocation in the same JVM.
>
> On Tue, May 22, 2018 at 12:37 PM Eugene Kirpichov <[email protected]> wrote:
>
>> Thanks Ankur, I think there's consensus, so it's probably ready to share :)
>>
>> On Fri, May 18, 2018 at 3:00 PM Ankur Goenka <[email protected]> wrote:
>>
>>> Thanks for all the input.
>>> I have summarized the discussions at the bottom of the document (here).
>>> Please feel free to provide comments. Once we agree, I will publish the conclusion on the mailing list.
>>>
>>> On Mon, May 14, 2018 at 1:51 PM Eugene Kirpichov <[email protected]> wrote:
>>>
>>>> Thanks Ankur, this document clarifies a few points and raises some very important questions. I encourage everybody with a stake in Portability to take a look and chime in.
>>>> +Aljoscha Krettek +Thomas Weise +Henning Rohde
>>>>
>>>> On Mon, May 14, 2018 at 12:34 PM Ankur Goenka <[email protected]> wrote:
>>>>
>>>>> Updated link to the document as the previous link was not working for some people.
>>>>>
>>>>> On Fri, May 11, 2018 at 7:56 PM Ankur Goenka <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>> The recent effort on portability has introduced a JobService and an ArtifactService to the Beam stack alongside the SDKs. This has opened up a few questions around how we start a pipeline in a portable setup (with a JobService).
>>>>>> I am trying to document our approach to launching a portable pipeline and make binding decisions based on the discussion.
>>>>>> Please review the document and provide your feedback.
>>>>>> Thanks,
>>>>>> Ankur
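As a concrete reference for the JobService interaction summarized above, submitting a portable pipeline from the Python SDK to a long-running JobService would look roughly like the sketch below. The job endpoint address is only an example, not a fixed default:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Point the SDK at a long-running JobService instead of a runner-specific
# submission path. The endpoint address below is illustrative.
options = PipelineOptions([
    '--runner=PortableRunner',
    '--job_endpoint=localhost:8099',
])

with beam.Pipeline(options=options) as p:
    (p
     | beam.Create(['hello', 'portable', 'beam'])
     | beam.Map(lambda word: word.upper()))

And on the job-handle discussion: my reading of the "opaque blob" option for a stateless JobService is something like the following (purely illustrative, not a proposed format):

import base64
import json

# Illustrative only: a stateless JobService could pack everything needed to
# act on a job into a self-contained, opaque handle, so any instance running
# the same code can cancel or query it later.
def make_job_handle(runner, master_url, runner_job_id):
    payload = {'runner': runner, 'master_url': master_url, 'job_id': runner_job_id}
    return base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()

def parse_job_handle(handle):
    return json.loads(base64.urlsafe_b64decode(handle.encode()).decode())

handle = make_job_handle('flink', 'flink-master:8081', 'a1b2c3')
print(parse_job_handle(handle))

The authentication/authorisation question above applies to whichever handle format is chosen, since the handle alone is enough to cancel a job.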
