Hi Jagadish, Thanks for the suggestion. You are right in that it should be the responsibility of the JobCoordinator to assign identifiers.
> 'm only wondering if this logic could instead reside inside the Job Coordinator (which is internal to the StreamProcessor) instead of relying on something external to it? I think this is a consequence of our initial StandaloneJobCoordinator, which is pretty much a pass-through. I didn't see any usage for getProcessorId() and was wondering why we put it in the JobCoordinator interface. I think I should keep your design proposal from last year handy :) Thanks for pitching in! @All: Yesterday, there was a discussion on naming of the configuration used in this SEP - whether it should be within the "job" scope or "app" scope (introduced by SAMZA-1041 <https://issues.apache.org/jira/browse/SAMZA-1041>). Multi-stage feature and fluent-api for Samza introduces the notion of "application". Since the processorId generator config applies to all jobs within a Samza application, we decided to add the config for generator under "app" scope. Further details on config scope changes can be found in SAMZA-1120. <https://issues.apache.org/jira/browse/SAMZA-1120> I will send out an update once I change the SEP based on yesterday's meeting and Jagadish's idea. Thanks! Navina On Mon, Mar 20, 2017 at 5:22 PM, Jagadish Venkatraman < jagadish1...@gmail.com> wrote: > Thanks for writing this SEP! > > Here's an alternate approach instead of taking the "String processorId" as > a parameter in the constructor. In my view, the "processorId" could be > generated by the StreamProcessor internally (instead of being generated > up-stream and passed in). The Job Coordinator API could be as follows: > > > public interface JobCoordinator { > > ProcessorIdGenerator getProcessorIDGenerator(); > > // could be String getProcessorID() > > JobModel getJobModel(); > > } > > public interface ProcessorIDGenerator { > > String getProcessorID(); > } > > > For instance, an Yarn job coordinator can merely parse the ID from config, > and return it. A Zk backed implementation of the Job coordinator can agree > on IDs using coordination leveraging Zk. One nice property with this > approach is that it keeps all logic related to coordination, agreement on > the Job Model, leader election (with potentially pluggable components for > each) inside the JobCoordinator. > > To be clear, I'm all for pluggability for ID generation logic that this SEP > advocates. I'm only wondering if this logic could instead reside inside the > Job Coordinator (which is internal to the StreamProcessor) instead of > relying on something external to it? > > Of course, there may be other considerations around the way the current > code is structured that may prevent this. Let me know if you agree with > this change. > > Thanks, > Jag > > > On Thu, Mar 16, 2017 at 5:21 PM, Navina Ramesh > <nram...@linkedin.com.invalid > > wrote: > > > > I am working on the ApplicationRunner SEP right now. Will send out the > > discussion email once I am done. > > > > Perfect! :) > > > > On Thu, Mar 16, 2017 at 5:13 PM, xinyu liu <xinyuliu...@gmail.com> > wrote: > > > > > Right, the static factory is very simple as you said. It's pretty > > > convenient for the client to use. > > > > > > I am working on the ApplicationRunner SEP right now. Will send out the > > > discussion email once I am done. > > > > > > Thanks, > > > Xinyu > > > > > > On Thu, Mar 16, 2017 at 4:50 PM, Navina Ramesh (Apache) < > > nav...@apache.org > > > > > > > wrote: > > > > > > > > One minor thing I found is that the name of the config is camel > case > > > > (*processor.idGenerator.class*). Seems Samza's practice is to use > all > > > > lower > > > > case configs with "." delimiter. Do you think we should stick to this > > > > convention? > > > > > > > > I am always torn between the "convention" we have and the better way > of > > > > doing things. But I don't have strong opinions about it. I can change > > it. > > > > > > > > > One more suggestion is to have a static factory method in the > > > > ProcessorIdGenerator (Like what we have in ApplicationRunner): > > > > > > > > I couldn't grasp these requirements from the ApplicationRunner > design. > > It > > > > will be great if you can put it out in an SEP :) > > > > > > > > I can add the static factory method for it. Just to clarify, the > static > > > > method simply class loads the ProcessorIdGenerator ? It uses > reflection > > > to > > > > create the instance ? > > > > > > > > Thanks! > > > > Navina > > > > > > > > > > > > > > > > On Thu, Mar 16, 2017 at 4:31 PM, xinyu liu <xinyuliu...@gmail.com> > > > wrote: > > > > > > > > > The proposal looks great to me! Changing the id type to string will > > > make > > > > > sure this can work with other types of cluster which doesn't > support > > > > > integer id. The interface and config provides a pluggable way to > have > > > > > different id generators for different use cases. One minor thing I > > > found > > > > is > > > > > that the name of the config is camel case > > > (*processor.idGenerator.class* > > > > ). > > > > > Seems Samza's practice is to use all lower case configs with "." > > > > delimiter. > > > > > Do you think we should stick to this convention? > > > > > > > > > > One more suggestion is to have a static factory method in > > > > > the ProcessorIdGenerator (Like what we have in ApplicationRunner): > > > > > > > > > > static ProcessIdGenerator fromConfig(Config config) { ... }. > > > > > > > > > > With this, It will be more convenient for the ApplicationRunner to > > > > > construct the generator. What do you think? > > > > > > > > > > Thanks, > > > > > Xinyu > > > > > > > > > > On Wed, Mar 15, 2017 at 10:59 PM, Navina Ramesh (Apache) < > > > > > nav...@apache.org> > > > > > wrote: > > > > > > > > > > > Hi everyone, > > > > > > I created a proposal for SAMZA-1126, which addresses the > semantics > > of > > > > > > ProcessorId in Samza. For most purposes, ProcessorId is same as > the > > > > > logical > > > > > > id that Samza assigns for each Yarn container. It is primarily > used > > > in > > > > > > JobModel as a key for the corresponding ContainerModel and also, > in > > > > > > container-level metrics. We are expanding the applicability of > > > > > processorId > > > > > > to be beyond a fixed set of processors. > > > > > > > > > > > > Please review and comment on this SEP. > > > > > > > > > > > > For those who are not actively following the master branch, you > may > > > > have > > > > > > more questions than others. Feel free to ask them here. > > > > > > > > > > > > @Xinyu: Since you are working on SAMZA-1067 and other related > > > > integration > > > > > > APIs, can you please add an SEP for SAMZA-1067 ? This will help > > > others > > > > > (adn > > > > > > me as well) get on the same page with your design/code. Let me > know > > > if > > > > > > SEP-1 will work per your design for ApplicationRunner. > > > > > > > > > > > > Thanks! > > > > > > Navina > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > Navina R. > > > > > > -- > Jagadish V, > Graduate Student, > Department of Computer Science, > Stanford University >