Hi everyone, I have updated the SEP <https://cwiki.apache.org/confluence/display/SAMZA/SEP-1%3A+Semantics+of+ProcessorId+in+Samza> based on all the feedback. Feel free to comment.
I will start the [vote] mail thread, if there are no further questions within the next 24 hours. Thanks! Navina On Tue, Mar 21, 2017 at 10:33 AM, Navina Ramesh (Apache) <nav...@apache.org> wrote: > Hi Jagadish, > Thanks for the suggestion. You are right in that it should be the > responsibility of the JobCoordinator to assign identifiers. > > > 'm only wondering if this logic could instead reside inside the > Job Coordinator (which is internal to the StreamProcessor) instead of > relying on something external to it? > > I think this is a consequence of our initial StandaloneJobCoordinator, > which is pretty much a pass-through. I didn't see any usage for > getProcessorId() and was wondering why we put it in the JobCoordinator > interface. I think I should keep your design proposal from last year handy > :) Thanks for pitching in! > > > @All: > Yesterday, there was a discussion on naming of the configuration used in > this SEP - whether it should be within the "job" scope or "app" scope > (introduced by SAMZA-1041 > <https://issues.apache.org/jira/browse/SAMZA-1041>). Multi-stage feature > and fluent-api for Samza introduces the notion of "application". Since the > processorId generator config applies to all jobs within a Samza > application, we decided to add the config for generator under "app" scope. > Further details on config scope changes can be found in SAMZA-1120. > <https://issues.apache.org/jira/browse/SAMZA-1120> > > I will send out an update once I change the SEP based on yesterday's > meeting and Jagadish's idea. > > Thanks! > Navina > > On Mon, Mar 20, 2017 at 5:22 PM, Jagadish Venkatraman < > jagadish1...@gmail.com> wrote: > >> Thanks for writing this SEP! >> >> Here's an alternate approach instead of taking the "String processorId" as >> a parameter in the constructor. In my view, the "processorId" could be >> generated by the StreamProcessor internally (instead of being generated >> up-stream and passed in). The Job Coordinator API could be as follows: >> >> >> public interface JobCoordinator { >> >> ProcessorIdGenerator getProcessorIDGenerator(); >> >> // could be String getProcessorID() >> >> JobModel getJobModel(); >> >> } >> >> public interface ProcessorIDGenerator { >> >> String getProcessorID(); >> } >> >> >> For instance, an Yarn job coordinator can merely parse the ID from config, >> and return it. A Zk backed implementation of the Job coordinator can agree >> on IDs using coordination leveraging Zk. One nice property with this >> approach is that it keeps all logic related to coordination, agreement on >> the Job Model, leader election (with potentially pluggable components for >> each) inside the JobCoordinator. >> >> To be clear, I'm all for pluggability for ID generation logic that this >> SEP >> advocates. I'm only wondering if this logic could instead reside inside >> the >> Job Coordinator (which is internal to the StreamProcessor) instead of >> relying on something external to it? >> >> Of course, there may be other considerations around the way the current >> code is structured that may prevent this. Let me know if you agree with >> this change. >> >> Thanks, >> Jag >> >> >> On Thu, Mar 16, 2017 at 5:21 PM, Navina Ramesh >> <nram...@linkedin.com.invalid >> > wrote: >> >> > > I am working on the ApplicationRunner SEP right now. Will send out the >> > discussion email once I am done. >> > >> > Perfect! :) >> > >> > On Thu, Mar 16, 2017 at 5:13 PM, xinyu liu <xinyuliu...@gmail.com> >> wrote: >> > >> > > Right, the static factory is very simple as you said. It's pretty >> > > convenient for the client to use. >> > > >> > > I am working on the ApplicationRunner SEP right now. Will send out the >> > > discussion email once I am done. >> > > >> > > Thanks, >> > > Xinyu >> > > >> > > On Thu, Mar 16, 2017 at 4:50 PM, Navina Ramesh (Apache) < >> > nav...@apache.org >> > > > >> > > wrote: >> > > >> > > > > One minor thing I found is that the name of the config is camel >> case >> > > > (*processor.idGenerator.class*). Seems Samza's practice is to use >> all >> > > > lower >> > > > case configs with "." delimiter. Do you think we should stick to >> this >> > > > convention? >> > > > >> > > > I am always torn between the "convention" we have and the better >> way of >> > > > doing things. But I don't have strong opinions about it. I can >> change >> > it. >> > > > >> > > > > One more suggestion is to have a static factory method in the >> > > > ProcessorIdGenerator (Like what we have in ApplicationRunner): >> > > > >> > > > I couldn't grasp these requirements from the ApplicationRunner >> design. >> > It >> > > > will be great if you can put it out in an SEP :) >> > > > >> > > > I can add the static factory method for it. Just to clarify, the >> static >> > > > method simply class loads the ProcessorIdGenerator ? It uses >> reflection >> > > to >> > > > create the instance ? >> > > > >> > > > Thanks! >> > > > Navina >> > > > >> > > > >> > > > >> > > > On Thu, Mar 16, 2017 at 4:31 PM, xinyu liu <xinyuliu...@gmail.com> >> > > wrote: >> > > > >> > > > > The proposal looks great to me! Changing the id type to string >> will >> > > make >> > > > > sure this can work with other types of cluster which doesn't >> support >> > > > > integer id. The interface and config provides a pluggable way to >> have >> > > > > different id generators for different use cases. One minor thing I >> > > found >> > > > is >> > > > > that the name of the config is camel case >> > > (*processor.idGenerator.class* >> > > > ). >> > > > > Seems Samza's practice is to use all lower case configs with "." >> > > > delimiter. >> > > > > Do you think we should stick to this convention? >> > > > > >> > > > > One more suggestion is to have a static factory method in >> > > > > the ProcessorIdGenerator (Like what we have in ApplicationRunner): >> > > > > >> > > > > static ProcessIdGenerator fromConfig(Config config) { ... }. >> > > > > >> > > > > With this, It will be more convenient for the ApplicationRunner to >> > > > > construct the generator. What do you think? >> > > > > >> > > > > Thanks, >> > > > > Xinyu >> > > > > >> > > > > On Wed, Mar 15, 2017 at 10:59 PM, Navina Ramesh (Apache) < >> > > > > nav...@apache.org> >> > > > > wrote: >> > > > > >> > > > > > Hi everyone, >> > > > > > I created a proposal for SAMZA-1126, which addresses the >> semantics >> > of >> > > > > > ProcessorId in Samza. For most purposes, ProcessorId is same as >> the >> > > > > logical >> > > > > > id that Samza assigns for each Yarn container. It is primarily >> used >> > > in >> > > > > > JobModel as a key for the corresponding ContainerModel and >> also, in >> > > > > > container-level metrics. We are expanding the applicability of >> > > > > processorId >> > > > > > to be beyond a fixed set of processors. >> > > > > > >> > > > > > Please review and comment on this SEP. >> > > > > > >> > > > > > For those who are not actively following the master branch, you >> may >> > > > have >> > > > > > more questions than others. Feel free to ask them here. >> > > > > > >> > > > > > @Xinyu: Since you are working on SAMZA-1067 and other related >> > > > integration >> > > > > > APIs, can you please add an SEP for SAMZA-1067 ? This will help >> > > others >> > > > > (adn >> > > > > > me as well) get on the same page with your design/code. Let me >> know >> > > if >> > > > > > SEP-1 will work per your design for ApplicationRunner. >> > > > > > >> > > > > > Thanks! >> > > > > > Navina >> > > > > > >> > > > > >> > > > >> > > >> > >> > >> > >> > -- >> > Navina R. >> > >> >> >> >> -- >> Jagadish V, >> Graduate Student, >> Department of Computer Science, >> Stanford University >> > >