Hi Jagadish,
Thanks for the suggestion. You are right in that it should be the
responsibility of the JobCoordinator to assign identifiers.

> 'm only wondering if this logic could instead reside inside the
Job Coordinator (which is internal to the StreamProcessor) instead of
relying on something external to it?

I think this is a consequence of our initial StandaloneJobCoordinator,
which is pretty much a pass-through. I didn't see any usage for
getProcessorId() and was wondering why we put it in the JobCoordinator
interface. I think I should keep your design proposal from last year handy
:) Thanks for pitching in!


@All:
Yesterday, there was a discussion on naming of the configuration used in
this SEP - whether it should be within the "job" scope or "app" scope
(introduced by SAMZA-1041 <https://issues.apache.org/jira/browse/SAMZA-1041>).
Multi-stage feature and fluent-api for Samza introduces the notion of
"application". Since the processorId generator config applies to all jobs
within a Samza application, we decided to add the config for generator
under "app" scope. Further details on config scope changes can be found in
SAMZA-1120. <https://issues.apache.org/jira/browse/SAMZA-1120>

I will send out an update once I change the SEP based on yesterday's
meeting and Jagadish's idea.

Thanks!
Navina

On Mon, Mar 20, 2017 at 5:22 PM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> Thanks for writing this SEP!
>
> Here's an alternate approach instead of taking the "String processorId" as
> a parameter in the constructor. In my view, the "processorId" could be
> generated by the StreamProcessor internally (instead of being generated
> up-stream and passed in). The Job Coordinator API could be as follows:
>
>
> public interface JobCoordinator {
>
>  ProcessorIdGenerator getProcessorIDGenerator();
>
> // could be String getProcessorID()
>
>  JobModel getJobModel();
>
> }
>
> public interface ProcessorIDGenerator {
>
>  String getProcessorID();
> }
>
>
> For instance, an Yarn job coordinator can merely parse the ID from config,
> and return it. A Zk backed implementation of the Job coordinator can agree
> on IDs using coordination leveraging Zk. One nice property with this
> approach is that it keeps all logic related to coordination, agreement on
> the Job Model, leader election (with potentially pluggable components for
> each) inside the JobCoordinator.
>
> To be clear, I'm all for pluggability for ID generation logic that this SEP
> advocates. I'm only wondering if this logic could instead reside inside the
> Job Coordinator (which is internal to the StreamProcessor) instead of
> relying on something external to it?
>
> Of course, there may be other considerations around the way the current
> code is structured that may prevent this. Let me know if you agree with
> this change.
>
> Thanks,
> Jag
>
>
> On Thu, Mar 16, 2017 at 5:21 PM, Navina Ramesh
> <nram...@linkedin.com.invalid
> > wrote:
>
> > > I am working on the ApplicationRunner SEP right now. Will send out the
> > discussion email once I am done.
> >
> > Perfect! :)
> >
> > On Thu, Mar 16, 2017 at 5:13 PM, xinyu liu <xinyuliu...@gmail.com>
> wrote:
> >
> > > Right, the static factory is very simple as you said. It's pretty
> > > convenient for the client to use.
> > >
> > > I am working on the ApplicationRunner SEP right now. Will send out the
> > > discussion email once I am done.
> > >
> > > Thanks,
> > > Xinyu
> > >
> > > On Thu, Mar 16, 2017 at 4:50 PM, Navina Ramesh (Apache) <
> > nav...@apache.org
> > > >
> > > wrote:
> > >
> > > > > One minor thing I found is that the name of the config is camel
> case
> > > > (*processor.idGenerator.class*). Seems Samza's practice is to use
> all
> > > > lower
> > > > case configs with "." delimiter. Do you think we should stick to this
> > > > convention?
> > > >
> > > > I am always torn between the "convention" we have and the better way
> of
> > > > doing things. But I don't have strong opinions about it. I can change
> > it.
> > > >
> > > > > One more suggestion is to have a static factory method in the
> > > > ProcessorIdGenerator (Like what we have in ApplicationRunner):
> > > >
> > > > I couldn't grasp these requirements from the ApplicationRunner
> design.
> > It
> > > > will be great if you can put it out in an SEP :)
> > > >
> > > > I can add the static factory method for it. Just to clarify, the
> static
> > > > method simply class loads the ProcessorIdGenerator ? It uses
> reflection
> > > to
> > > > create the instance ?
> > > >
> > > > Thanks!
> > > > Navina
> > > >
> > > >
> > > >
> > > > On Thu, Mar 16, 2017 at 4:31 PM, xinyu liu <xinyuliu...@gmail.com>
> > > wrote:
> > > >
> > > > > The proposal looks great to me! Changing the id type to string will
> > > make
> > > > > sure this can work with other types of cluster which doesn't
> support
> > > > > integer id. The interface and config provides a pluggable way to
> have
> > > > > different id generators for different use cases. One minor thing I
> > > found
> > > > is
> > > > > that the name of the config is camel case
> > > (*processor.idGenerator.class*
> > > > ).
> > > > > Seems Samza's practice is to use all lower case configs with "."
> > > > delimiter.
> > > > > Do you think we should stick to this convention?
> > > > >
> > > > > One more suggestion is to have a static factory method in
> > > > > the ProcessorIdGenerator (Like what we have in ApplicationRunner):
> > > > >
> > > > > static ProcessIdGenerator fromConfig(Config config) { ... }.
> > > > >
> > > > > With this, It will be more convenient for the ApplicationRunner to
> > > > > construct the generator. What do you think?
> > > > >
> > > > > Thanks,
> > > > > Xinyu
> > > > >
> > > > > On Wed, Mar 15, 2017 at 10:59 PM, Navina Ramesh (Apache) <
> > > > > nav...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > > I created a proposal for SAMZA-1126, which addresses the
> semantics
> > of
> > > > > > ProcessorId in Samza. For most purposes, ProcessorId is same as
> the
> > > > > logical
> > > > > > id that Samza assigns for each Yarn container. It is primarily
> used
> > > in
> > > > > > JobModel as a key for the corresponding ContainerModel and also,
> in
> > > > > > container-level metrics. We are expanding the applicability of
> > > > > processorId
> > > > > > to be beyond a fixed set of processors.
> > > > > >
> > > > > > Please review and comment on this SEP.
> > > > > >
> > > > > > For those who are not actively following the master branch, you
> may
> > > > have
> > > > > > more questions than others. Feel free to ask them here.
> > > > > >
> > > > > > @Xinyu: Since you are working on SAMZA-1067 and other related
> > > > integration
> > > > > > APIs, can you please add an SEP for SAMZA-1067 ? This will help
> > > others
> > > > > (adn
> > > > > > me as well) get on the same page with your design/code. Let me
> know
> > > if
> > > > > > SEP-1 will work per your design for ApplicationRunner.
> > > > > >
> > > > > > Thanks!
> > > > > > Navina
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Navina R.
> >
>
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>

Reply via email to