Big +1 from my side, I think this will help the community grow and prosper big time!
On Thu, May 12, 2016 at 1:27 PM, Matthias J. Sax <mj...@apache.org> wrote:

> +1 from my side.
>
> Happy to be the maintainer for Storm-Compatibility (at least I guess
> it's me, even the correct spelling would be with two 't' :P)
>
> -Matthias
>
> On 05/12/2016 12:56 PM, Till Rohrmann wrote:
> > +1 for the proposal
> > On May 12, 2016 12:13 PM, "Stephan Ewen" <se...@apache.org> wrote:
> >
> >> Yes, Gabor Gevay, that did refer to you!
> >>
> >> Sorry for the ambiguity...
> >>
> >> On Thu, May 12, 2016 at 10:46 AM, Márton Balassi
> >> <balassi.mar...@gmail.com> wrote:
> >>
> >>> +1 for the proposal
> >>> @ggevay: I do think that it refers to you. :)
> >>>
> >>> On Thu, May 12, 2016 at 10:40 AM, Gábor Gévay <gga...@gmail.com> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> There are at least three Gábors in the Flink community, :) so
> >>>> assuming that the Gábor in the list of maintainers of the DataSet API
> >>>> is referring to me, I'll be happy to do it. :)
> >>>>
> >>>> Best,
> >>>> Gábor G.
> >>>>
> >>>>
> >>>>
> >>>> 2016-05-10 11:24 GMT+02:00 Stephan Ewen <se...@apache.org>:
> >>>>> Hi everyone!
> >>>>>
> >>>>> We propose to establish some lightweight structures in the Flink open
> >>>>> source community and development process,
> >>>>> to help us better handle the increased interest in Flink (mailing list
> >>>>> and pull requests), while not overwhelming the
> >>>>> committers, and giving users and contributors a good experience.
> >>>>>
> >>>>> This proposal is triggered by the observation that we are reaching the
> >>>>> limits of where the current community can support
> >>>>> users and guide new contributors. The below proposal is based on
> >>>>> observations and ideas from Till, Robert, and me.
> >>>>>
> >>>>> ========
> >>>>> Goals
> >>>>> ========
> >>>>>
> >>>>> We try to achieve the following
> >>>>>
> >>>>>   - Pull requests get handled in a timely fashion
> >>>>>   - New contributors are better integrated into the community
> >>>>>   - The community feels empowered on the mailing list.
> >>>>>     But questions that need the attention of someone that has deep
> >>>>>     knowledge of a certain part of Flink get their attention.
> >>>>>   - At the same time, the committers that are knowledgeable about many
> >>>>>     core parts do not get completely overwhelmed.
> >>>>>   - We don't overlook threads that report critical issues.
> >>>>>   - We always have a pretty good overview of what the status of certain
> >>>>>     parts of the system are.
> >>>>>       -> What are often encountered known issues
> >>>>>       -> What are the most frequently requested features
> >>>>>
> >>>>>
> >>>>> ========
> >>>>> Problems
> >>>>> ========
> >>>>>
> >>>>> Looking into the process, there are two big issues:
> >>>>>
> >>>>> (1) Up to now, we have been relying on the fact that everything just
> >>>>> "organizes itself", driven by best effort. That assumes
> >>>>> that everyone feels equally responsible for every part, question, and
> >>>>> contribution. At the current state, this is impossible
> >>>>> to maintain, it overwhelms the committers and contributors.
> >>>>>
> >>>>> Example: Pull requests are picked up by whoever wants to pick them up.
> >>>>> Pull requests that are a lot of work, have little
> >>>>> chance of getting in, or relate to less active components are sometimes
> >>>>> not picked up. When contributors are pretty
> >>>>> loaded already, it may happen that no one eventually feels responsible
> >>>>> to pick up a pull request, and it falls through the cracks.
> >>>>>
> >>>>> (2) There is no good overview of what are known shortcomings, efforts,
> >>>>> and requested features for different parts of the system.
> >>>>> This information exists in various people's heads, but is not easily
> >>>>> accessible for new people. The Flink JIRA is not well
> >>>>> maintained, it is not easy to draw insights from that.
> >>>>>
> >>>>>
> >>>>> ===========
> >>>>> The Proposal
> >>>>> ===========
> >>>>>
> >>>>> Since we are building a parallel system, the natural solution seems to
> >>>>> be: partition the workload ;-)
> >>>>>
> >>>>> We propose to define a set of components for Flink. Each component is
> >>>>> maintained or tracked by one or more
> >>>>> people - let's call them maintainers. It is important to note that we
> >>>>> don't suggest the maintainers as an authoritative role, but
> >>>>> simply as committers or contributors that visibly step up for a certain
> >>>>> component, and mainly track and drive the efforts
> >>>>> pertaining to that component.
> >>>>>
> >>>>> It is also important to realize that we do not want to suggest that
> >>>>> people get less involved with certain parts and components, because
> >>>>> they are not the maintainers. We simply want to make sure that each
> >>>>> pull request or question or contribution has in the end
> >>>>> one person (or a small set of people) responsible for catching and
> >>>>> tracking it, if it was not worked on by the pro-active
> >>>>> community.
> >>>>>
> >>>>> For some components, having multiple maintainers will be helpful. In
> >>>>> that case, one maintainer should be the "chair" or "lead"
> >>>>> and make sure that no issue of that component gets lost between the
> >>>>> multiple maintainers.
> >>>>>
> >>>>>
> >>>>> A maintainer's role is:
> >>>>> -----------------------------
> >>>>>
> >>>>>   - Have an overview of which of the open pull requests relate to their
> >>>>>     component
> >>>>>   - Drive the pull requests relating to the component to resolution
> >>>>>       => Moderate the decision whether the feature should be merged
> >>>>>       => Make sure the pull request gets a shepherd.
> >>>>>          In many cases, the maintainers would shepherd themselves.
> >>>>>       => In case the shepherd becomes inactive, the maintainers need to
> >>>>>          find a new shepherd.
> >>>>>
> >>>>>   - Have an overview of what are the known issues of their component
> >>>>>   - Have an overview of what are the frequently requested features of
> >>>>>     their component
> >>>>>
> >>>>>   - Have an overview of which contributors are doing very good work in
> >>>>>     their component,
> >>>>>     would be candidates for committers, and should be mentored towards
> >>>>>     that.
> >>>>>
> >>>>>   - Resolve email threads that have been brought to their attention,
> >>>>>     because deeper
> >>>>>     component knowledge is required for that thread.
> >>>>>
> >>>>> A maintainer's role is NOT:
> >>>>> ----------------------------------
> >>>>>
> >>>>>   - Review all pull requests of that component
> >>>>>   - Answer every mail with questions about that component
> >>>>>   - Fix all bugs and implement all features of that component
> >>>>>
> >>>>>
> >>>>> We imagine the following way that the community and the maintainers
> >>>>> interact:
> >>>>> ---------------------------------------------------------------------
> >>>>>
> >>>>>   - Pull requests should be tagged by component.
> >>>>>     Since we cannot add labels at this point, we need
> >>>>>     to rely on the following:
> >>>>>       => The pull request opener should name the pull request like
> >>>>>          "[FLINK-XXX] [component] Title"
> >>>>>       => Components can be (re)tagged by adding special comments in the
> >>>>>          pull request ("==> component client")
> >>>>>       => With some luck, GitHub and Apache Infra will allow us to use
> >>>>>          labels at some point
> >>>>>
> >>>>>   - When pull requests are associated with a component, the maintainers
> >>>>>     will manage them
> >>>>>     (decision whether to add, find shepherd, catch dropped pull
> >>>>>     requests)
> >>>>>
> >>>>>   - We assume that maintainers frequently reach out to other community
> >>>>>     members and ask them if they want
> >>>>>     to shepherd a pull request.
> >>>>>
> >>>>>   - On the mailing list, everyone should feel equally empowered to
> >>>>>     answer and discuss.
> >>>>>     If at some point in the discussion, some deep technical knowledge
> >>>>>     about a component is required,
> >>>>>     the maintainer(s) should be drawn into the discussion.
> >>>>>     Because the Mailing List infrastructure has no support to tag
> >>>>>     threads, here are some simple workarounds:
> >>>>>
> >>>>>       => One possibility is to put the maintainers' mail addresses on
> >>>>>          cc for the thread, so they get the mail
> >>>>>          not just via the mailing list
> >>>>>       => Another way would be to post something like "+maintainer
> >>>>>          runtime" in the thread and the "runtime"
> >>>>>          maintainers would have a filter/alert on these keywords in
> >>>>>          their mail program.
> >>>>>
> >>>>>   - We assume that maintainers will reach out to community members that
> >>>>>     are very active and helpful in
> >>>>>     a component, and will ask them if they want to be added as
> >>>>>     maintainers.
> >>>>>     That will make it visible that those people are experts for that
> >>>>>     part of Flink.
> >>>>>
> >>>>>
> >>>>> ======================================
> >>>>> Maintainers: Committers and Contributors
> >>>>> ======================================
> >>>>>
> >>>>> It helps if maintainers are committers (since we want them to resolve
> >>>>> pull requests which often involves
> >>>>> merging them).
> >>>>>
> >>>>> Components with multiple maintainers can easily have non-committer
> >>>>> contributors in addition to committer
> >>>>> contributors.
> >>>>>
> >>>>>
> >>>>> ======
> >>>>> JIRA
> >>>>> ======
> >>>>>
> >>>>> Ideally, JIRA can be used to get an overview of what are the known
> >>>>> issues of each component, and what are
> >>>>> common feature requests. Unfortunately, the Flink JIRA is quite
> >>>>> unorganized right now.
> >>>>>
> >>>>> A natural followup effort of this proposal would be to define in JIRA
> >>>>> the same components as we defined here,
> >>>>> and have the maintainers keep JIRA meaningful for that particular
> >>>>> component. That would allow us to
> >>>>> easily generate some tables out of JIRA (like top known issues per
> >>>>> component, most requested features)
> >>>>> and post them on the dev list once in a while as a "state of the union"
> >>>>> report.
> >>>>>
> >>>>> Initial assignment of issues to components should be made by those
> >>>>> people opening the issue. The maintainer
> >>>>> of that tagged component needs to change the tag, if the component was
> >>>>> classified incorrectly.
> >>>>>
> >>>>>
> >>>>> ======================================
> >>>>> Initial Components and Maintainers Suggestion
> >>>>> ======================================
> >>>>>
> >>>>> Below is a suggestion of how to define components for Flink. One goal
> >>>>> of the division was to make it
> >>>>> obvious for the majority of questions and contributions to which
> >>>>> component they would relate.
> >>>>> Otherwise,
> >>>>> if many contributions had fuzzy component associations, we would again
> >>>>> not solve the issue of having clear
> >>>>> responsibilities for who would track the progress and resolution.
> >>>>>
> >>>>> We also looked at each component and wrote the names of some people who
> >>>>> we thought were natural
> >>>>> experts for the components, and thus natural candidates for
> >>>>> maintainers.
> >>>>>
> >>>>> **These names are only a starting point for discussion.**
> >>>>>
> >>>>> Once agreed upon, the components and names of maintainers should be
> >>>>> kept in the wiki and updated as
> >>>>> components change and people step up or down.
> >>>>>
> >>>>>
> >>>>> *DataSet API* (*Fabian, Greg, Gabor*)
> >>>>>   - Including Hadoop compat. parts
> >>>>>
> >>>>> *DataStream API* (*Aljoscha, Max, Stephan*)
> >>>>>
> >>>>> *Runtime*
> >>>>>   - Distributed Coordination (JobManager/TaskManager, Akka) (*Till*)
> >>>>>   - Local Runtime (Memory Management, State Backends, Tasks/Operators)
> >>>>>     (*Stephan*)
> >>>>>   - Network (*Ufuk*)
> >>>>>
> >>>>> *Client/Optimizer* (*Fabian*)
> >>>>>
> >>>>> *Type system / Type extractor* (Timo)
> >>>>>
> >>>>> *Cluster Management* (Yarn, Mesos, Docker, ...) (*Max, Robert*)
> >>>>>
> >>>>> *Libraries*
> >>>>>   - Gelly (*Vasia, Greg*)
> >>>>>   - ML (*Till, Theo*)
> >>>>>   - CEP (*Till*)
> >>>>>   - Python (*Chesnay*)
> >>>>>
> >>>>> *Table API & SQL* (*Fabian, Vasia, Timo, Chengxiang*)
> >>>>>
> >>>>> *Streaming Connectors* (*Robert*, *Aljoscha*)
> >>>>>
> >>>>> *Batch Connectors and Input/Output Formats* (*Chesnay*)
> >>>>>
> >>>>> *Storm Compatibility Layer* (*Mathias*)
> >>>>>
> >>>>> *Scala shell* (*Till*)
> >>>>>
> >>>>> *Startup Shell Scripts* (Ufuk)
> >>>>>
> >>>>> *Flink Build System, Maven Files* (*Robert*)
> >>>>>
> >>>>> *Documentation* (Ufuk)
> >>>>>
> >>>>>
> >>>>> Please let us know what you think about this proposal.
> >>>>> Happy discussing!
> >>>>>
> >>>>> Greetings,
> >>>>> Stephan