Yi, What examples should we be looking at for new-api-v2?
1. samza/samza-core/src/test/java/org/apache/samza/example/PageViewCounterStreamSpecExample.java others? - Chris On Mon, Jun 19, 2017 at 5:29 PM, Yi Pan <nickpa...@gmail.com> wrote: > Hi, all, > > Here are the promised code examples for the revised API, and the related > change to how we specify serdes in the API: > > - User example for the new API change: > https://github.com/nickpan47/samza/tree/new-api-v2 > > - Prateek's PR for the proposed schema registry change: > https://github.com/nickpan47/samza/pull/2/files > > Please feel free to comment and provide feedback! > > > Thanks! > > > -Yi > > On Tue, Jun 6, 2017 at 11:16 AM, Yi Pan <nickpa...@gmail.com> wrote: > >> Hi, all, >> >> Thanks for all the inputs! Finally I got some time to go through the >> discussion thread and digest most of the points made above. Here is my >> personal summary: >> >> Consensus on requirements: >> >> 1. ApplicationRunner needs async APIs. >> 2. ApplicationRunner can be hidden from the user (except maybe in config) >> 3. StreamApplication is the direct wrapper for the programming >> interface (i.e. removing StreamGraph from the user API and allowing users to >> call input() and output() from the StreamApplication) in main() >> 4. There has to be a serialization format of the StreamApplication >> itself, s.t. the tasks can just deserialize and create the user logic >> included in StreamApplication in multiple TaskContext. >> 5. JobRunner seems to be a very thin layer on top of StreamProcessor >> or YarnJob, and it is always a LocalJob in a LocalApplicationRunner and a >> RemoteJob in a RemoteApplicationRunner. There is a desire to remove it. >> 6. StreamApplication needs to have some methods to allow user-injected >> global objects for the whole application, such as JmxServer, >> MetricsReporter, etc. >> >> >> Some additional discussion points: >> >> 1. In StreamApplication#input()/output(), what should be the input / >> output parameter? The StreamSpec?
Or the actual implementation I/O object >> to provide messages (i.e. similar to socket reader/file reader object)? In >> the latter case, we will need to define an abstract layer of StreamReader >> and StreamWriter in the user-facing API that supports read/write of >> partitioned streams on top of the >> SystemConsumer/SystemProducer/SystemAdmin >> objects. Also, the number of I/O streams via the StreamReader/StreamWriter >> cannot be pre-determined (i.e. depending on input stream partitions and >> the groupers). Hence, I am leaning toward exposing StreamSpec in the API >> and letting the user build the StreamSpec via SpecBuilder. The actual I/O objects >> will be instantiated when SystemConsumer/SystemProducer are instantiated, >> with the number of physical partitions in each container. >> 2. There is a need to support task-level programs via the same launch >> model as well. >> >> >> Some ideas to implement the above requirements: >> >> 1. StreamGraph#write() should be used internally to generate and >> persist the serialized format of user logic. Then, StreamGraph#read() >> should give back a deserialized version of user logic. This implies >> that the user functions defined in APIs are mandated to be serializable. >> 2. StreamApplication should include a SpecBuilder that provides the >> instantiation of MessageStream/Stores, which is passed to >> StreamApplication#input() / StreamApplication#output() >> 3. StreamApplication should also include an internal ApplicationRunner >> instance (config driven, class loaded) to be able to switch between local >> vs remote execution >> 4. Implementation of LocalApplicationRunner should directly >> instantiate and manage StreamProcessor instances for each job, removing >> the >> LocalJobRunner >> 5. Implementation of RemoteApplicationRunner should instantiate a >> remote JobFactory, create the remote job and submit it for each job, >> removing the current JobRunner interface >> 6.
We also need a StreamTaskApplication class that allows users to >> create task-level applications, by mandating a constructor with a >> parameter >> of StreamTaskFactory >> >> >> One more opinion around the status and the waitForFinish(): I would think >> that waitForFinish() is just waiting for the local Runtime to complete, not >> to wait for the remote job to be completed. >> >> I will be working on a revision of SEP-2 and some example user code >> for now and will share it soon. >> >> Thanks! >> >> -Yi >> >> On Wed, May 3, 2017 at 8:08 AM, Chris Pettitt < >> cpett...@linkedin.com.invalid> wrote: >> >>> Hi Xinyu, >>> >>> I took a second look at the registerStore API. Would it be possible to >>> call >>> registerStore directly on the app, similar to what we're doing with >>> app.input (possibly with the restriction that registerStore must be called >>> before we add an operator that uses the store)? Otherwise we'll end up >>> having to do two passes on the graph again - similar to the way we had to >>> do a pass to init stream config and then hook up the graph. >>> >>> Thanks, >>> Chris >>> >>> >>> On Fri, Apr 28, 2017 at 8:55 PM, xinyu liu <xinyuliu...@gmail.com> wrote: >>> >>> > Right, option #2 seems redundant for defining streams after further >>> > discussion here. StreamSpec itself is flexible enough to achieve both >>> > static and programmatic specification of the stream. Agree it's not >>> > convenient for now (pretty obvious after looking at your bsr >>> > beam.runners.samza.wrapper), and we should provide similar predefined >>> > convenience wrappers for users to create the StreamSpec. In your case >>> > something like BoundedStreamSpec.file(....) which will generate the >>> system >>> > and serialize the data as you did. >>> > >>> > We're still thinking the callback proposed in #2 can be useful for >>> > requirement #6: injecting other user objects at run time, such as stores >>> > and metrics.
To simplify the user understanding further, I think we >>> might >>> > hide the ApplicationRunner and expose the StreamApplication instead, >>> which >>> > will make requirement #3 not user facing. So the API becomes like: >>> > >>> > StreamApplication app = StreamApplication.local(config) >>> > .init (env -> { >>> > env.registerStore("my-store", new MyStoreFactory()); >>> > env.registerMetricsReporter("my-reporter", new >>> > MyMetricsReporterFactory()); >>> > }) >>> > .withLifeCycleListener(myListener); >>> > >>> > app.input(BoundedStreamSpec.create("/sample/input.txt")) >>> > .map(...) >>> > .window(...) >>> > >>> > app.run(); >>> > >>> > For requirement #5, I added a .withLifeCycleListener() in the API, which >>> can >>> > trigger the callbacks with life cycle events. >>> > >>> > For #4: distribution of the jars will be what we have today using the >>> Yarn >>> > localization with a remote store like artifactory or an http server. We >>> > discussed where to put the graph serialization. The current thinking is >>> to >>> > define a general interface which can be backed by a remote store, like >>> Kafka, >>> > artifactory or an http server. For Kafka, it's straightforward but we will >>> > have the size limit or cut it by ourselves. For the other two, we need >>> to >>> > investigate whether we can easily upload jars to our artifactory and >>> > localize them with Yarn. Any opinions on this? >>> > >>> > Thanks, >>> > Xinyu >>> > >>> > On Fri, Apr 28, 2017 at 11:34 AM, Chris Pettitt < >>> > cpett...@linkedin.com.invalid> wrote: >>> > >>> > > Your proposal for #1 looks good. >>> > > >>> > > I'm not quite sure how to reconcile the proposals for #1 and #2. In #1 you >>> add >>> > > the stream spec straight onto the runner while in #2 you do it in a >>> > > callback. If it is either-or, #1 looks a lot better for my purposes. >>> > > >>> > > For #4 what mechanism are you using to distribute the JARs? Can you >>> use >>> > the >>> > > same mechanism to distribute the serialized graph?
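[Editor's note] The serialization requirement discussed above (consensus point 4, and idea 1 mandating serializable user functions so a remote runner can ship and rehydrate the graph) can be sketched in plain Java. Everything below (SerializableFnDemo, SerializableFunction, the round-trip helpers) is a hypothetical illustration of the mechanism, not the Samza API:

```java
import java.io.*;
import java.util.function.Function;

public class SerializableFnDemo {
    // Hypothetical stand-in for the user map/filter lambdas that SEP-2
    // would mandate to be serializable; not part of the actual Samza API.
    interface SerializableFunction<T, R> extends Function<T, R>, Serializable {}

    // Serialize any object graph to bytes (what would be shipped with the job).
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static <T> T deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // A lambda is serializable only because its target interface
        // extends Serializable - the crux of the mandate in idea 1.
        SerializableFunction<String, Integer> lengthFn = s -> s.length();
        byte[] payload = serialize(lengthFn);
        SerializableFunction<String, Integer> revived = deserialize(payload);
        System.out.println(revived.apply("page-view")); // 9
    }
}
```

This also shows why the question of where to persist the bytes (coordinator stream, artifactory, etc.) is separate from how to produce them.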
>>> > > On Fri, Apr 28, 2017 at 12:14 AM, xinyu liu <xinyuliu...@gmail.com> >>> > wrote: >>> > > >>> > > > btw, I will get to SAMZA-1246 as soon as possible. >>> > > > >>> > > > Thanks, >>> > > > Xinyu >>> > > > >>> > > > On Thu, Apr 27, 2017 at 9:11 PM, xinyu liu <xinyuliu...@gmail.com> >>> > > wrote: >>> > > > >>> > > > > Let me try to capture the updated requirements: >>> > > > > >>> > > > > 1. Set up input streams outside StreamGraph, and treat graph >>> building >> as >>> > > > a >>> > > > > library (*Fluent, Beam*). >>> > > > > >>> > > > > 2. Improve ease of use for ApplicationRunner to avoid complex >>> > > > > configurations such as zkCoordinator, zkCoordinationService. >>> > > > (*Standalone*). >>> > > > > Provide some programmatic way to tweak them in the API. >>> > > > > >>> > > > > 3. Clean up ApplicationRunner into a single interface (*Fluent*). >>> We >>> > > can >>> > > > > have one or more implementations but it's hidden from the users. >>> > > > > >>> > > > > 4. Separate StreamGraph from runtime environment so it can be >>> > > serialized >>> > > > (*Beam, >>> > > > > Yarn*) >>> > > > > >>> > > > > 5. Better life cycle management of application, parity with >>> > > > > StreamProcessor (*Standalone, Beam*). Stats should include the >>> exception >>> in >>> > > > > case of failure (tracked in SAMZA-1246). >>> > > > > >>> > > > > 6. Support injecting user-defined objects into ApplicationRunner. >>> > > > > >>> > > > > Prateek and I iterated on the ApplicationRunner API based on these >>> > > > > requirements. To support #1, we can set up input streams on the >>> runner >>> > > > > level, which returns the MessageStream and allows graph building >>> > > > > afterwards. The code looks like below: >>> > > > > >>> > > > > ApplicationRunner runner = ApplicationRunner.local(); >>> > > > > runner.input(streamSpec) >>> > > > > .map(..) >>> > > > > .window(...)
>>> > > > > runner.run(); >>> > > > > >>> > > > > StreamSpec is the building block for setting up streams here. It >>> can >>> be >>> > > > > set up in different ways: >>> > > > > >>> > > > > - Direct creation of stream spec, like runner.input(new >>> > > StreamSpec(id, >>> > > > > system, stream)) >>> > > > > - Load by streamId from env or config, like >>> > > > runner.input(runner.env().getStreamSpec(id)) >>> > > > > - Canned Spec which generates the StreamSpec with id, system and >>> > > stream >>> > > > > to minimize the configuration. For example, >>> CollectionSpec.create(new >>> > > > > ArrayList[]{1,2,3,4}), which will auto generate the system and >>> stream >>> in >>> > > > > the spec. >>> > > > > >>> > > > > To support #2, we need to be able to set up StreamSpec-related >>> objects >>> > > > and >>> > > > > factories programmatically in env. Suppose we have the following >>> before >>> > > > > runner.input(...): >>> > > > > >>> > > > > runner.setup(env /* a writable interface of env*/ -> { >>> > > > > env.setStreamSpec(streamId, streamSpec); >>> > > > > env.setSystem(systemName, systemFactory); >>> > > > > }) >>> > > > > >>> > > > > runner.setup(->) also provides setup for stores and other runtime >>> stuff >>> > > > > needed for the execution. The setup should be serializable >>> to >>> > > > config. >>> > > > > For #6, I haven't figured out a good way to inject user-defined >>> objects >>> > > > > here yet. >>> > > > > >>> > > > > With this API, we should be able to also support #4. For remote >>> > > > > runner.run(), the operator user classes/lambdas in the StreamGraph >>> need >>> > > to >>> > > > > be serialized. As of today, the existing option is to serialize to a >>> > > stream, >>> > > > > either the coordinator stream or the pipeline control stream, >>> which >>> > > will >>> > > > > have the size limit per message. Do you see RPC as an option?
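[Editor's note] To make the three StreamSpec set-up styles above concrete, here is a tiny self-contained sketch of what a StreamSpec value class and a canned CollectionSpec-style factory could look like. All names and the auto-generated system/stream ids are illustrative, not the proposed implementation:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class StreamSpecSketch {
    // Minimal stand-in for the proposed StreamSpec building block:
    // a logical id plus the physical (system, stream) coordinates.
    static final class StreamSpec {
        final String id, system, stream;
        StreamSpec(String id, String system, String stream) {
            this.id = id; this.system = system; this.stream = stream;
        }
    }

    // Hypothetical "canned spec": auto-generates the system and stream
    // names so the user supplies only the data, mirroring the
    // CollectionSpec.create(...) idea in the mail above.
    static final class CollectionSpec {
        private static final AtomicInteger COUNTER = new AtomicInteger();
        static <T> StreamSpec create(List<T> data) {
            int n = COUNTER.incrementAndGet();
            return new StreamSpec("collection-" + n, "in-memory", "collection-stream-" + n);
        }
    }

    public static void main(String[] args) {
        // Style 1: direct creation with explicit coordinates.
        StreamSpec direct = new StreamSpec("page-views", "kafka", "PageViewEvent");
        // Style 3: canned spec with generated coordinates.
        StreamSpec canned = CollectionSpec.create(List.of(1, 2, 3, 4));
        System.out.println(direct.system + " / " + canned.system); // kafka / in-memory
    }
}
```

Style 2 (lookup by streamId from env/config) would just be a map from id to such a value object.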
>>> > > > > >>> > > > > For this version of API, seems we don't need the StreamApplication >>> > > > wrapper >>> > > > > as well as exposing the StreamGraph. Do you think we are on the >>> right >>> > > > path? >>> > > > > >>> > > > > Thanks, >>> > > > > Xinyu >>> > > > > >>> > > > > >>> > > > > On Thu, Apr 27, 2017 at 6:09 AM, Chris Pettitt < >>> > > > > cpett...@linkedin.com.invalid> wrote: >>> > > > > >>> > > > >> That should have been: >>> > > > >> >>> > > > >> For #1, Beam doesn't have a hard requirement... >>> > > > >> >>> > > > >> On Thu, Apr 27, 2017 at 9:07 AM, Chris Pettitt < >>> > cpett...@linkedin.com >>> > > > >>> > > > >> wrote: >>> > > > >> >>> > > > >> > For #1, I doesn't have a hard requirement for any change from >>> > > Samza. A >>> > > > >> > very nice to have would be to allow the input systems to be >>> set up >>> > > at >>> > > > >> the >>> > > > >> > same time as the rest of the StreamGraph. An even nicer to have >>> > > would >>> > > > >> be to >>> > > > >> > do away with the callback based approach and treat graph >>> building >>> > > as a >>> > > > >> > library, a la Beam and Flink. >>> > > > >> > >>> > > > >> > For the moment I've worked around the two pass requirement >>> (once >>> > for >>> > > > >> > config, once for StreamGraph) by introducing an IR layer >>> between >>> > > Beam >>> > > > >> and >>> > > > >> > the Samza Fluent translation. The IR layer is convenient >>> > independent >>> > > > of >>> > > > >> > this problem because it makes it easier to switch between the >>> > Fluent >>> > > > and >>> > > > >> > low-level APIs. >>> > > > >> > >>> > > > >> > >>> > > > >> > For #4, if we had parity with StreamProcessor for lifecycle >>> we'd >>> > be >>> > > in >>> > > > >> > great shape. One additional issue with the status call that I >>> may >>> > > not >>> > > > >> have >>> > > > >> > mentioned is that it provides you no way to get at the cause of >>> > > > failure. 
>>> > > > >> > The StreamProcessor API does allow this via the callback. >>> > > > >> > >>> > > > >> > >>> > > > >> > Re. #2 and #3, I'm a big fan of getting rid of the extra >>> > > configuration >>> > > > >> > indirection you currently have to jump through (this is also >>> related >>> > > > to >>> > > > >> > system consumer configuration from #1). It makes it much easier >>> to >>> > > > >> discover >>> > > > >> > what the configurable parameters are too, if we provide some >>> > > > >> programmatic >>> > > > >> > way to tweak them in the API - which can turn into config under >>> the >>> > > > >> hood. >>> > > > >> > >>> > > > >> > On Wed, Apr 26, 2017 at 9:20 PM, xinyu liu < >>> xinyuliu...@gmail.com >>> > > >>> > > > >> wrote: >>> > > > >> > >>> > > > >> >> Let me give a shot to summarize the requirements for >>> > > > ApplicationRunner >>> we >>> > > > >> >> have discussed so far: >>> > > > >> >> >>> > > > >> >> - Support environment for passing in user-defined objects >>> (streams >>> > > > >> >> potentially) into ApplicationRunner (*Beam*) >>> > > > >> >> >>> > > > >> >> - Improve ease of use for ApplicationRunner to avoid complex >>> > > > >> >> configurations >>> > > > >> >> such as zkCoordinator, zkCoordinationService. (*Standalone*) >>> > > > >> >> >>> > > > >> >> - Clean up ApplicationRunner into a single interface >>> (*Fluent*). >>> We >>> > > > can >>> > > > >> >> have one or more implementations but it's hidden from the >>> users.
>>> > > > >> >> >>> > > > >> >> - Separate StreamGraph from environment so it can be >>> serializable >>> > > > >> (*Beam, >>> > > > >> >> Yarn*) >>> > > > >> >> >>> > > > >> >> - Better life cycle management of application, including >>> > > > >> >> start/stop/stats (*Standalone, >>> > > > >> >> Beam*) >>> > > > >> >> >>> > > > >> >> >>> > > > >> >> One way to address 2 and 3 is to provide pre-packaged runners >>> using >>> > > > >> static >>> > > > >> >> factory methods, and the return type will be the >>> > ApplicationRunner >>> > > > >> >> interface. So we can have: >>> > > > >> >> >>> > > > >> >> ApplicationRunner runner = ApplicationRunner.zk() / >>> > > > >> >> ApplicationRunner.local() >>> > > > >> >> / ApplicationRunner.remote() / ApplicationRunner.test(). >>> > > > >> >> >>> > > > >> >> Internally we will package the right configs and run-time >>> > > environment >>> > > > >> with >>> > > > >> >> the runner. For example, ApplicationRunner.zk() will define >>> all >>> > the >>> > > > >> >> configs >>> > > > >> >> needed for zk coordination. >>> > > > >> >> >>> > > > >> >> To support 1 and 4, can we pass in a lambda function in the >>> > runner, >>> > > > and >>> > > > >> >> then we can run the stream graph? Like the following: >>> > > > >> >> >>> > > > >> >> ApplicationRunner.zk().env(config -> >>> > > > environment).run(streamGraph); >>> > > > >> >> >>> > > > >> >> Then we need a way to pass the environment into the >>> StreamGraph. >>> > > This >>> > > > >> can >>> > > > >> >> be done by either adding an extra parameter to each operator, >>> or >>> > > > having a >>> > > > >> >> getEnv() function in the MessageStream, which seems to be >>> pretty >>> > > > hacky. >>> > > > >> >> >>> > > > >> >> What do you think?
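[Editor's note] The pre-packaged runner idea above is essentially the static-factory pattern: each factory bundles the coordination config behind a single ApplicationRunner interface so users never touch the zkCoordinator/zkCoordinationService keys directly. A minimal sketch; the interface, factory classes, and config keys shown are illustrative assumptions, not the actual SEP-2 API:

```java
import java.util.HashMap;
import java.util.Map;

public class RunnerFactorySketch {
    // Single user-facing interface; concrete runner types stay hidden.
    interface ApplicationRunner {
        Map<String, String> config();
    }

    // Hypothetical pre-packaged factory: bundles all configs needed for
    // ZK-based coordination behind one call.
    static ApplicationRunner zk(String zkConnect) {
        Map<String, String> cfg = new HashMap<>();
        cfg.put("job.coordinator.factory", "org.apache.samza.zk.ZkJobCoordinatorFactory"); // illustrative key/value
        cfg.put("job.coordinator.zk.connect", zkConnect);                                   // illustrative key
        return () -> cfg;
    }

    // Hypothetical local runner: no external coordination config needed.
    static ApplicationRunner local() {
        Map<String, String> cfg = new HashMap<>();
        cfg.put("job.coordinator.factory", "local"); // illustrative placeholder value
        return () -> cfg;
    }

    public static void main(String[] args) {
        ApplicationRunner runner = zk("localhost:2181");
        System.out.println(runner.config().get("job.coordinator.zk.connect")); // localhost:2181
    }
}
```

The point of the pattern is that swapping `zk()` for `local()` changes the packaged config without changing the user-facing type.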
>>> > > > >> >> >>> > > > >> >> Thanks, >>> > > > >> >> Xinyu >>> > > > >> >> >>> > > > >> >> >>> > > > >> >> >>> > > > >> >> >>> > > > >> >> >>> > > > >> >> On Sun, Apr 23, 2017 at 11:01 PM, Prateek Maheshwari < >>> > > > >> >> pmaheshw...@linkedin.com.invalid> wrote: >>> > > > >> >> >>> > > > >> >> > Thanks for putting this together Yi! >>> > > > >> >> > >>> > > > >> >> > I agree with Jake, it does seem like there are a few too >>> many >>> > > > moving >>> > > > >> >> parts >>> > > > >> >> > here. That said, the problem being solved is pretty broad, >>> so >>> > let >>> > > > me >>> > > > >> >> try to >>> > > > >> >> > summarize my current understanding of the requirements. >>> Please >>> > > > >> correct >>> > > > >> >> me >>> > > > >> >> > if I'm wrong or missing something. >>> > > > >> >> > >>> > > > >> >> > ApplicationRunner and JobRunner first, ignoring test >>> > environment >>> > > > for >>> > > > >> the >>> > > > >> >> > moment. >>> > > > >> >> > ApplicationRunner: >>> > > > >> >> > 1. Create execution plan: Same in Standalone and Yarn >>> > > > >> >> > 2. Create intermediate streams: Same logic but different >>> leader >>> > > > >> election >>> > > > >> >> > (ZK-based or pre-configured in standalone, AM in Yarn). >>> > > > >> >> > 3. Run jobs: In JVM in standalone. Submit to the cluster in >>> > Yarn. >>> > > > >> >> > >>> > > > >> >> > JobRunner: >>> > > > >> >> > 1. Run the StreamProcessors: Same process in Standalone & >>> Test. >>> > > > >> Remote >>> > > > >> >> host >>> > > > >> >> > in Yarn. >>> > > > >> >> > >>> > > > >> >> > To get a single ApplicationRunner implementation, like Jake >>> > > > >> suggested, >>> > > > >> >> we >>> > > > >> >> > need to make leader election and JobRunner implementation >>> > > > pluggable. >>> > > > >> >> > There's still the question of whether ApplicationRunner#run >>> API >>> > > > >> should >>> > > > >> >> be >>> > > > >> >> > blocking or non-blocking. It has to be non-blocking in >>> YARN. 
We >>> > > > want >>> it >>> >> > to >>> > > > >> >> > be blocking in standalone, but seems like the main reason is >>> ease >>> > > > of >>> use >>> > > > >> >> > when launched from main(). I'd prefer making it consistently >>> > > > >> non-blocking >>> > > > >> >> > instead, esp. since in embedded standalone mode (where the >>> > > > processor >>> is >>> > > > >> >> > running in another container) a blocking API would not be >>> > > > >> user-friendly >>> > > > >> >> > either. If not, we can add both run and runBlocking. >>> > > > >> >> > >>> > > > >> >> > Coming to RuntimeEnvironment, which is the least clear to >>> me so >>> > > > far: >>> > > > >> >> > 1. I don't think RuntimeEnvironment should be responsible >>> for >>> > > > >> providing >>> > > > >> >> > StreamSpecs for streamIds - they can be obtained with a >>> > > config/util >>> > > > >> >> class. >>> > > > >> >> > The StreamProcessor should only know about logical streamIds >>> > and >>> > > > the >>> > > > >> >> > streamId <-> actual stream mapping should happen within the >>> > > > >> >> > SystemProducer/Consumer/Admins provided by the >>> > > RuntimeEnvironment. >>> > > > >> >> > 2. There are also other components that the user might be >>> > > interested >>> > > > in >>> > > > >> >> > providing implementations of in embedded Standalone mode >>> (i.e., >>> > > not >>> > > > >> >> just in >>> > > > >> >> > tests) - MetricsRegistry and JMXServer come to mind. >>> > > > >> >> > 3. Most importantly, it's not clear to me who creates and >>> > manages >>> > > > the >>> > > > >> >> > RuntimeEnvironment. It seems like it should be the >>> > > > ApplicationRunner >>> or >>> > > > >> >> the >>> > > > >> >> > user because of (2) above and because StreamManager also >>> needs >>> > > > >> access to >>> > > > >> >> > SystemAdmins for creating intermediate streams which users >>> > might >>> > > > >> want to >>> > > > >> >> > mock.
But it also needs to be passed down to the >>> > StreamProcessor >> - >>> >> how >>> > > > >> >> > would this work on Yarn? >>> > > > >> >> > >>> > > > >> >> > I think we should figure out how to integrate >>> > RuntimeEnvironment >>> > > > with >>> > > > >> >> > ApplicationRunner before we can make a call on one vs. >>> multiple >>> > > > >> >> > ApplicationRunner implementations. If we do keep >>> > > > >> LocalApplicationRunner >>> and >>> > > > >> >> > RemoteApplicationRunner (and TestApplicationRunner) separate, >>> agree >>> > > with >>> > > > >> Jake >>> > > > >> >> > that we should remove the JobRunners and roll them up into >>> the >>> > > > >> >> respective >>> > > > >> >> > ApplicationRunners. >>> > > > >> >> > >>> > > > >> >> > - Prateek >>> > > > >> >> > >>> > > > >> >> > On Thu, Apr 20, 2017 at 10:06 AM, Jacob Maes < >>> > > jacob.m...@gmail.com >>> > > > > >>> > > > >> >> wrote: >>> > > > >> >> > >>> > > > >> >> > > Thanks for the SEP! >>> > > > >> >> > > >>> > > > >> >> > > +1 on introducing these new components >>> > > > >> >> > > -1 on the current definition of their roles (see Design >>> > > feedback >>> > > > >> >> below) >>> > > > >> >> > > >>> > > > >> >> > > *Design* >>> > > > >> >> > > >>> > > > >> >> > > - If LocalJobRunner and RemoteJobRunner handle the >>> > different >>> > > > >> >> methods >>> > > > >> >> > of >>> > > > >> >> > > launching a Job, what additional value do the different >>> > > types >>> > > > of >>> > > > >> >> > > ApplicationRunner and RuntimeEnvironment provide? It >>> seems >>> > > > like >>> > > > >> a >>> > > > >> >> red >>> > > > >> >> > > flag >>> > > > >> >> > > that all 3 would need to change from environment to >>> > > > >> environment. It >>> > > > >> >> > > indicates that they don't have proper modularity. The
The >>> > > > >> >> > > call-sequence-figures >>> > > > >> >> > > support this; LocalApplicationRunner and >>> > > > RemoteApplicationRunner >>> > > > >> >> make >>> > > > >> >> > > the >>> > > > >> >> > > same calls and the diagram only varies after >>> > > jobRunner.start() >>> > > > >> >> > > - As far as I can tell, the only difference between >>> Local >>> > > and >>> > > > >> >> Remote >>> > > > >> >> > > ApplicationRunner is that one is blocking and the >>> other is >>> > > > >> >> > > non-blocking. If >>> > > > >> >> > > that's all they're for then either the names should be >>> > > changed >>> > > > >> to >>> > > > >> >> > > reflect >>> > > > >> >> > > this, or they should be combined into one >>> > ApplicationRunner >>> > > > and >>> > > > >> >> just >>> > > > >> >> > > expose >>> > > > >> >> > > separate methods for run() and runBlocking() >>> > > > >> >> > > - There isn't much detail on why the main() methods for >>> > > > >> >> Local/Remote >>> > > > >> >> > > have such different implementations, how they receive >>> the >>> > > > >> >> Application >>> > > > >> >> > > (direct vs config), and concretely how the deployment >>> > > scripts, >>> > > > >> if >>> > > > >> >> any, >>> > > > >> >> > > should interact with them. >>> > > > >> >> > > >>> > > > >> >> > > >>> > > > >> >> > > *Style* >>> > > > >> >> > > >>> > > > >> >> > > - nit: None of the 11 uses of the word "actual" in the >>> doc >>> > > are >>> > > > >> >> > > *actually* >>> > > > >> >> > > needed. :-) >>> > > > >> >> > > - nit: Colors of the runtime blocks in the diagrams are >>> > > > >> >> unconventional >>> > > > >> >> > > and a little distracting. Reminds me of nai won bao. >>> Now >>> > I'm >>> > > > >> >> hungry. >>> > > > >> >> > :-) >>> > > > >> >> > > - Prefer the name "ExecutionEnvironment" over >>> > > > >> "RuntimeEnvironment". 
>>> > > > >> >> > The >>> > > > >> >> > > term "execution environment" is used >>> > > > >> >> > > - The code comparisons for the ApplicationRunners are >>> not >>> > > > >> >> > apples-to-apples. >>> > > > >> >> > > The local runner example is an application that USES >>> the >>> > > local >>> > > > >> >> runner. >>> > > > >> >> > > The >>> > > > >> >> > > remote runner example is just the runner code >>> itself. >>> > > So, >>> > > > >> it's >>> > > > >> >> not >>> > > > >> >> > > readily apparent that we're comparing the main() >>> methods >>> > and >>> > > > not >>> > > > >> >> the >>> > > > >> >> > > application itself. >>> > > > >> >> > > >>> > > > >> >> > > >>> > > > >> >> > > On Mon, Apr 17, 2017 at 5:02 PM, Yi Pan < >>> nickpa...@gmail.com >>> > > >>> > > > >> wrote: >>> > > > >> >> > > >>> > > > >> >> > > > Made some updates to clarify the role and functions of >>> > > > >> >> > RuntimeEnvironment >>> > > > >> >> > > > in SEP-2. >>> > > > >> >> > > > >>> > > > >> >> > > > On Fri, Apr 14, 2017 at 9:30 AM, Yi Pan < >>> > nickpa...@gmail.com >>> > > > >>> > > > >> >> wrote: >>> > > > >> >> > > > >>> > > > >> >> > > > > Hi, everyone, >>> > > > >> >> > > > > >>> > > > >> >> > > > > In light of new features such as fluent API and >>> > standalone >>> > > > that >>> > > > >> >> > > introduce >>> > > > >> >> > > > > new deployment / application launch models in Samza, I >>> > > > created >>> > > > >> a >>> > > > >> >> new >>> > > > >> >> > > > SEP-2 >>> > > > >> >> > > > > to address the new use cases. SEP-2 link: >>> > > > https://cwiki.apache.org/confluence/display/SAMZA/SEP-2%3A+ApplicationRunner+Design >>> > > > >> >> > > > > >>> > > > >> >> > > > > Please take a look and give feedback! >>> > > > >> >> > > > > >>> > > > >> >> > > > > Thanks!
>>> > > > >> >> > > > > -Yi