Thanks for writing this SEP! Here's an alternate approach instead of taking the "String processorId" as a parameter in the constructor. In my view, the "processorId" could be generated by the StreamProcessor internally (instead of being generated up-stream and passed in). The Job Coordinator API could be as follows:
public interface JobCoordinator { ProcessorIdGenerator getProcessorIDGenerator(); // could be String getProcessorID() JobModel getJobModel(); } public interface ProcessorIDGenerator { String getProcessorID(); } For instance, an Yarn job coordinator can merely parse the ID from config, and return it. A Zk backed implementation of the Job coordinator can agree on IDs using coordination leveraging Zk. One nice property with this approach is that it keeps all logic related to coordination, agreement on the Job Model, leader election (with potentially pluggable components for each) inside the JobCoordinator. To be clear, I'm all for pluggability for ID generation logic that this SEP advocates. I'm only wondering if this logic could instead reside inside the Job Coordinator (which is internal to the StreamProcessor) instead of relying on something external to it? Of course, there may be other considerations around the way the current code is structured that may prevent this. Let me know if you agree with this change. Thanks, Jag On Thu, Mar 16, 2017 at 5:21 PM, Navina Ramesh <nram...@linkedin.com.invalid > wrote: > > I am working on the ApplicationRunner SEP right now. Will send out the > discussion email once I am done. > > Perfect! :) > > On Thu, Mar 16, 2017 at 5:13 PM, xinyu liu <xinyuliu...@gmail.com> wrote: > > > Right, the static factory is very simple as you said. It's pretty > > convenient for the client to use. > > > > I am working on the ApplicationRunner SEP right now. Will send out the > > discussion email once I am done. > > > > Thanks, > > Xinyu > > > > On Thu, Mar 16, 2017 at 4:50 PM, Navina Ramesh (Apache) < > nav...@apache.org > > > > > wrote: > > > > > > One minor thing I found is that the name of the config is camel case > > > (*processor.idGenerator.class*). Seems Samza's practice is to use all > > > lower > > > case configs with "." delimiter. Do you think we should stick to this > > > convention? > > > > > > I am always torn between the "convention" we have and the better way of > > > doing things. But I don't have strong opinions about it. I can change > it. > > > > > > > One more suggestion is to have a static factory method in the > > > ProcessorIdGenerator (Like what we have in ApplicationRunner): > > > > > > I couldn't grasp these requirements from the ApplicationRunner design. > It > > > will be great if you can put it out in an SEP :) > > > > > > I can add the static factory method for it. Just to clarify, the static > > > method simply class loads the ProcessorIdGenerator ? It uses reflection > > to > > > create the instance ? > > > > > > Thanks! > > > Navina > > > > > > > > > > > > On Thu, Mar 16, 2017 at 4:31 PM, xinyu liu <xinyuliu...@gmail.com> > > wrote: > > > > > > > The proposal looks great to me! Changing the id type to string will > > make > > > > sure this can work with other types of cluster which doesn't support > > > > integer id. The interface and config provides a pluggable way to have > > > > different id generators for different use cases. One minor thing I > > found > > > is > > > > that the name of the config is camel case > > (*processor.idGenerator.class* > > > ). > > > > Seems Samza's practice is to use all lower case configs with "." > > > delimiter. > > > > Do you think we should stick to this convention? > > > > > > > > One more suggestion is to have a static factory method in > > > > the ProcessorIdGenerator (Like what we have in ApplicationRunner): > > > > > > > > static ProcessIdGenerator fromConfig(Config config) { ... }. > > > > > > > > With this, It will be more convenient for the ApplicationRunner to > > > > construct the generator. What do you think? > > > > > > > > Thanks, > > > > Xinyu > > > > > > > > On Wed, Mar 15, 2017 at 10:59 PM, Navina Ramesh (Apache) < > > > > nav...@apache.org> > > > > wrote: > > > > > > > > > Hi everyone, > > > > > I created a proposal for SAMZA-1126, which addresses the semantics > of > > > > > ProcessorId in Samza. For most purposes, ProcessorId is same as the > > > > logical > > > > > id that Samza assigns for each Yarn container. It is primarily used > > in > > > > > JobModel as a key for the corresponding ContainerModel and also, in > > > > > container-level metrics. We are expanding the applicability of > > > > processorId > > > > > to be beyond a fixed set of processors. > > > > > > > > > > Please review and comment on this SEP. > > > > > > > > > > For those who are not actively following the master branch, you may > > > have > > > > > more questions than others. Feel free to ask them here. > > > > > > > > > > @Xinyu: Since you are working on SAMZA-1067 and other related > > > integration > > > > > APIs, can you please add an SEP for SAMZA-1067 ? This will help > > others > > > > (adn > > > > > me as well) get on the same page with your design/code. Let me know > > if > > > > > SEP-1 will work per your design for ApplicationRunner. > > > > > > > > > > Thanks! > > > > > Navina > > > > > > > > > > > > > > > > > > -- > Navina R. > -- Jagadish V, Graduate Student, Department of Computer Science, Stanford University