Overseer? Supervisor? Warden?
2016-05-31 21:23 GMT+02:00 Robert Metzger <rmetz...@apache.org>:

> Good point. I haven't thought about this name clash.
> However, I wonder whether it is clear from the context whether we are
> talking about pull request and component shepherding.
>
> Are there any other ideas for the name? If nobody else has concerns
> regarding the "maintainer" name, we can of course keep it.
>
> On Tue, May 31, 2016 at 7:57 PM, Chesnay Schepler <ches...@apache.org>
> wrote:
>
> > so are we discarding the other "shepherd" role then?
> >
> > On 31.05.2016 19:47, Robert Metzger wrote:
> >
> >> Hi,
> >>
> >> to keep this discussion going, I pasted Stephan's Component proposal
> >> into the Wiki:
> >> https://cwiki.apache.org/confluence/display/FLINK/Components+and+Shepherds
> >>
> >> Also, I suggest to rename the "maintainer" to "shepherd" to reflect that
> >> still the committers and the PMC is in charge and the shepherd is only
> >> keeping a closer eye on some of the components (basically reflecting
> >> the structure we have already in the community a bit more officially)
> >>
> >> Let's discuss the proposed shepherds for the components based on
> >> Stephan's proposals.
> >>
> >> Please edit in the wiki or write here if you want to add or remove
> >> yourself for a component.
> >> If somebody, who has been proposed as a shepherd didn't react until end
> >> of this week, I'll remove them (for now. I just want to ensure that we
> >> don't make somebody a shepherd who isn't aware).
> >>
> >> Regards,
> >> Robert
> >>
> >> On Tue, May 17, 2016 at 2:10 PM, Stephan Ewen <se...@apache.org> wrote:
> >>
> >>> Hi!
> >>>
> >>> Thanks for all the comments, and the positive resonance! Looks like so
> >>> far all are in favor.
> >>>
> >>> I would next add a section to the Wiki and the "How to Contribute"
> >>> Guide on this structure, incorporating the component split of Optimizer
> >>> and Client.
> >>>
> >>> After that, let's get started with gathering candidates for the
> >>> maintainer roles. The ones suggested in the mail would be a starting
> >>> point.
> >>>
> >>> Greetings,
> >>> Stephan
> >>>
> >>> On Mon, May 16, 2016 at 11:48 AM, Kostas Tzoumas <ktzou...@apache.org>
> >>> wrote:
> >>>
> >>>> +1 to Henry's comment, once this makes it to the wiki/website the
> >>>> wording needs to make it clear that the governance model is unchanged
> >>>>
> >>>> On Mon, May 16, 2016 at 10:02 AM, Theodore Vasiloudis <
> >>>> theodoros.vasilou...@gmail.com> wrote:
> >>>>
> >>>>> I like the idea of having maintainers as well, hopefully we can
> >>>>> streamline the reviewing process.
> >>>>>
> >>>>> I of course can volunteer for the FlinkML component.
> >>>>> As I've mentioned before I'd love to get one more committer willing
> >>>>> to review PRs in FlinkML; by my last count we were up to ~20 open
> >>>>> ML-related PRs.
> >>>>>
> >>>>> Regards,
> >>>>> Theodore
> >>>>>
> >>>>> On Mon, May 16, 2016 at 2:17 AM, Henry Saputra <
> >>>>> henry.sapu...@gmail.com> wrote:
> >>>>>
> >>>>>> The maintainers concept is good idea to make sure PRs are moved
> >>>>>> smoothly.
> >>>>>> But, we need to make sure that this is not additional hierarchy on
> >>>>>> top of Flink PMCs.
> >>>>>> This will keep us in spirit of ASF community over code.
> >>>>>>
> >>>>>> Please do add me as cluster management maintainer member.
> >>>>>>
> >>>>>> - Henry
> >>>>>>
> >>>>>> On Tuesday, May 10, 2016, Stephan Ewen <se...@apache.org> wrote:
> >>>>>>
> >>>>>>> Hi everyone!
> >>>>>>>
> >>>>>>> We propose to establish some lightweight structures in the Flink
> >>>>>>> open source community and development process, to help us better
> >>>>>>> handle the increased interest in Flink (mailing list and pull
> >>>>>>> requests), while not overwhelming the committers, and giving users
> >>>>>>> and contributors a good experience.
> >>>>>>>
> >>>>>>> This proposal is triggered by the observation that we are reaching
> >>>>>>> the limits of where the current community can support users and
> >>>>>>> guide new contributors. The below proposal is based on observations
> >>>>>>> and ideas from Till, Robert, and me.
> >>>>>>>
> >>>>>>> ========
> >>>>>>> Goals
> >>>>>>> ========
> >>>>>>>
> >>>>>>> We try to achieve the following
> >>>>>>>
> >>>>>>>   - Pull requests get handled in a timely fashion
> >>>>>>>   - New contributors are better integrated into the community
> >>>>>>>   - The community feels empowered on the mailing list.
> >>>>>>>     But questions that need the attention of someone that has deep
> >>>>>>>     knowledge of a certain part of Flink get their attention.
> >>>>>>>   - At the same time, the committers that are knowledgeable about
> >>>>>>>     many core parts do not get completely overwhelmed.
> >>>>>>>   - We don't overlook threads that report critical issues.
> >>>>>>>   - We always have a pretty good overview of what the status of
> >>>>>>>     certain parts of the system are.
> >>>>>>>       -> What are often encountered known issues
> >>>>>>>       -> What are the most frequently requested features
> >>>>>>>
> >>>>>>>
> >>>>>>> ========
> >>>>>>> Problems
> >>>>>>> ========
> >>>>>>>
> >>>>>>> Looking into the process, there are two big issues:
> >>>>>>>
> >>>>>>> (1) Up to now, we have been relying on the fact that everything just
> >>>>>>> "organizes itself", driven by best effort.
> >>>>>>> That assumes that everyone feels equally responsible for every
> >>>>>>> part, question, and contribution. At the current state, this is
> >>>>>>> impossible to maintain, it overwhelms the committers and
> >>>>>>> contributors.
> >>>>>>>
> >>>>>>> Example: Pull requests are picked up by whoever wants to pick them
> >>>>>>> up. Pull requests that are a lot of work, have little chance of
> >>>>>>> getting in, or relate to less active components are sometimes not
> >>>>>>> picked up. When contributors are pretty loaded already, it may
> >>>>>>> happen that no one eventually feels responsible to pick up a pull
> >>>>>>> request, and it falls through the cracks.
> >>>>>>>
> >>>>>>> (2) There is no good overview of what are known shortcomings,
> >>>>>>> efforts, and requested features for different parts of the system.
> >>>>>>> This information exists in various peoples' heads, but is not easily
> >>>>>>> accessible for new people. The Flink JIRA is not well maintained, it
> >>>>>>> is not easy to draw insights from that.
> >>>>>>>
> >>>>>>>
> >>>>>>> ===========
> >>>>>>> The Proposal
> >>>>>>> ===========
> >>>>>>>
> >>>>>>> Since we are building a parallel system, the natural solution seems
> >>>>>>> to be: partition the workload ;-)
> >>>>>>>
> >>>>>>> We propose to define a set of components for Flink. Each component
> >>>>>>> is maintained or tracked by one or more people - let's call them
> >>>>>>> maintainers.
> >>>>>>> It is important to note that we don't suggest the maintainers as
> >>>>>>> an authoritative role, but simply as committers or contributors that
> >>>>>>> visibly step up for a certain component, and mainly track and drive
> >>>>>>> the efforts pertaining to that component.
> >>>>>>>
> >>>>>>> It is also important to realize that we do not want to suggest that
> >>>>>>> people get less involved with certain parts and components, because
> >>>>>>> they are not the maintainers. We simply want to make sure that each
> >>>>>>> pull request or question or contribution has in the end one person
> >>>>>>> (or a small set of people) responsible for catching and tracking it,
> >>>>>>> if it was not worked on by the pro-active community.
> >>>>>>>
> >>>>>>> For some components, having multiple maintainers will be helpful. In
> >>>>>>> that case, one maintainer should be the "chair" or "lead" and make
> >>>>>>> sure that no issue of that component gets lost between the multiple
> >>>>>>> maintainers.
> >>>>>>>
> >>>>>>>
> >>>>>>> A maintainer's role is:
> >>>>>>> -----------------------------
> >>>>>>>
> >>>>>>>   - Have an overview of which of the open pull requests relate to
> >>>>>>>     their component
> >>>>>>>   - Drive the pull requests relating to the component to resolution
> >>>>>>>       => Moderate the decision whether the feature should be merged
> >>>>>>>       => Make sure the pull request gets a shepherd.
> >>>>>>>          In many cases, the maintainers would shepherd themselves.
> >>>>>>>       => In case the shepherd becomes inactive, the maintainers need
> >>>>>>>          to find a new shepherd.
> >>>>>>>
> >>>>>>>   - Have an overview of what are the known issues of their
> >>>>>>>     component
> >>>>>>>   - Have an overview of what are the frequently requested features
> >>>>>>>     of their component
> >>>>>>>
> >>>>>>>   - Have an overview of which contributors are doing very good work
> >>>>>>>     in their component, would be candidates for committers, and
> >>>>>>>     should be mentored towards that.
> >>>>>>>
> >>>>>>>   - Resolve email threads that have been brought to their attention,
> >>>>>>>     because deeper component knowledge is required for that thread.
> >>>>>>>
> >>>>>>> A maintainer's role is NOT:
> >>>>>>> ----------------------------------
> >>>>>>>
> >>>>>>>   - Review all pull requests of that component
> >>>>>>>   - Answer every mail with questions about that component
> >>>>>>>   - Fix all bugs and implement all features of that component
> >>>>>>>
> >>>>>>>
> >>>>>>> We imagine the following way that the community and the maintainers
> >>>>>>> interact:
> >>>>>>> ---------------------------------------------------------------------
> >>>>>>>
> >>>>>>>   - Pull requests should be tagged by component.
> >>>>>>>     Since we cannot add labels at this point, we need to rely on
> >>>>>>>     the following:
> >>>>>>>       => The pull request opener should name the pull request like
> >>>>>>>          "[FLINK-XXX] [component] Title"
> >>>>>>>       => Components can be (re) tagged by adding special comments in
> >>>>>>>          the pull request ("==> component client")
> >>>>>>>       => With some luck, GitHub and Apache Infra will allow us to use
> >>>>>>>          labels at some point
> >>>>>>>
> >>>>>>>   - When pull requests are associated with a component, the
> >>>>>>>     maintainers will manage them
> >>>>>>>     (decision whether to add, find shepherd, catch dropped pull
> >>>>>>>     requests)
> >>>>>>>
> >>>>>>>   - We assume that maintainers frequently reach out to other
> >>>>>>>     community members and ask them if they want to shepherd a pull
> >>>>>>>     request.
> >>>>>>>
> >>>>>>>   - On the mailing list, everyone should feel equally empowered to
> >>>>>>>     answer and discuss.
> >>>>>>>     If at some point in the discussion, some deep technical
> >>>>>>>     knowledge about a component is required, the maintainer(s)
> >>>>>>>     should be drawn into the discussion.
> >>>>>>>     Because the Mailing List infrastructure has no support to tag
> >>>>>>>     threads, here are some simple workarounds:
> >>>>>>>
> >>>>>>>       => One possibility is to put the maintainers' mail addresses
> >>>>>>>          on cc for the thread, so they get the mail not just via the
> >>>>>>>          mailing list
> >>>>>>>       => Another way would be to post something like
> >>>>>>>          "+maintainer runtime" in the thread and the "runtime"
> >>>>>>>          maintainers would have a filter/alert on these keywords in
> >>>>>>>          their mail program.
> >>>>>>>
> >>>>>>>   - We assume that maintainers will reach out to community members
> >>>>>>>     that are very active and helpful in a component, and will ask
> >>>>>>>     them if they want to be added as maintainers.
> >>>>>>>     That will make it visible that those people are experts for that
> >>>>>>>     part of Flink.
> >>>>>>>
> >>>>>>>
> >>>>>>> ======================================
> >>>>>>> Maintainers: Committers and Contributors
> >>>>>>> ======================================
> >>>>>>>
> >>>>>>> It helps if maintainers are committers (since we want them to
> >>>>>>> resolve pull requests which often involves merging them).
> >>>>>>>
> >>>>>>> Components with multiple maintainers can easily have non-committer
> >>>>>>> contributors in addition to committer contributors.
> >>>>>>>
> >>>>>>>
> >>>>>>> ======
> >>>>>>> JIRA
> >>>>>>> ======
> >>>>>>>
> >>>>>>> Ideally, JIRA can be used to get an overview of what are the known
> >>>>>>> issues of each component, and what are common feature requests.
> >>>>>>> Unfortunately, the Flink JIRA is quite unorganized right now.
> >>>>>>>
> >>>>>>> A natural followup effort of this proposal would be to define in
> >>>>>>> JIRA the same components as we defined here, and have the
> >>>>>>> maintainers keep JIRA meaningful for that particular component. That
> >>>>>>> would allow us to easily generate some tables out of JIRA (like top
> >>>>>>> known issues per component, most requested features) and post them
> >>>>>>> on the dev list once in a while as a "state of the union" report.
> >>>>>>>
> >>>>>>> Initial assignment of issues to components should be made by those
> >>>>>>> people opening the issue. The maintainer of that tagged component
> >>>>>>> needs to change the tag, if the component was classified
> >>>>>>> incorrectly.
> >>>>>>>
> >>>>>>>
> >>>>>>> ======================================
> >>>>>>> Initial Components and Maintainers Suggestion
> >>>>>>> ======================================
> >>>>>>>
> >>>>>>> Below is a suggestion of how to define components for Flink. One
> >>>>>>> goal of the division was to make it obvious for the majority of
> >>>>>>> questions and contributions to which component they would relate.
> >>>>>>> Otherwise, if many contributions had fuzzy component associations,
> >>>>>>> we would again not solve the issue of having clear responsibilities
> >>>>>>> for who would track the progress and resolution.
> >>>>>>>
> >>>>>>> We also looked at each component and wrote the names of some people
> >>>>>>> who we thought were natural experts for the components, and thus
> >>>>>>> natural candidates for maintainers.
> >>>>>>> **These names are only a starting point for discussion.**
> >>>>>>>
> >>>>>>> Once agreed upon, the components and names of maintainers should be
> >>>>>>> kept in the wiki and updated as components change and people step up
> >>>>>>> or down.
> >>>>>>>
> >>>>>>>
> >>>>>>> *DataSet API* (*Fabian, Greg, Gabor*)
> >>>>>>>   - Including Hadoop compat. parts
> >>>>>>>
> >>>>>>> *DataStream API* (*Aljoscha, Max, Stephan*)
> >>>>>>>
> >>>>>>> *Runtime*
> >>>>>>>   - Distributed Coordination (JobManager/TaskManager, Akka) (*Till*)
> >>>>>>>   - Local Runtime (Memory Management, State Backends,
> >>>>>>>     Tasks/Operators) (*Stephan*)
> >>>>>>>   - Network (*Ufuk*)
> >>>>>>>
> >>>>>>> *Client/Optimizer* (*Fabian*)
> >>>>>>>
> >>>>>>> *Type system / Type extractor* (Timo)
> >>>>>>>
> >>>>>>> *Cluster Management* (Yarn, Mesos, Docker, ...) (*Max, Robert*)
> >>>>>>>
> >>>>>>> *Libraries*
> >>>>>>>   - Gelly (*Vasia, Greg*)
> >>>>>>>   - ML (*Till, Theo*)
> >>>>>>>   - CEP (*Till*)
> >>>>>>>   - Python (*Chesnay*)
> >>>>>>>
> >>>>>>> *Table API & SQL* (*Fabian, Vasia, Timo, Chengxiang*)
> >>>>>>>
> >>>>>>> *Streaming Connectors* (*Robert*, *Aljoscha*)
> >>>>>>>
> >>>>>>> *Batch Connectors and Input/Output Formats* (*Chesnay*)
> >>>>>>>
> >>>>>>> *Storm Compatibility Layer* (*Mathias*)
> >>>>>>>
> >>>>>>> *Scala shell* (*Till*)
> >>>>>>>
> >>>>>>> *Startup Shell Scripts* (Ufuk)
> >>>>>>>
> >>>>>>> *Flink Build System, Maven Files* (*Robert*)
> >>>>>>>
> >>>>>>> *Documentation* (Ufuk)
> >>>>>>>
> >>>>>>>
> >>>>>>> Please let us know what you think about this proposal.
> >>>>>>> Happy discussing!
> >>>>>>>
> >>>>>>> Greetings,
> >>>>>>> Stephan