Hey Stephan! Thanks to you and the others who started this. I really like the proposal and I'm happy to see my name on some components.
So, +1. I'd say let's wait until the end of the week/beginning of next week to see if there is any disagreement with the proposal in the community (doesn't look like it so far ;-)). Then we can go ahead and execute this. :-)

– Ufuk

On Thu, May 12, 2016 at 1:52 PM, Stephan Ewen <se...@apache.org> wrote:
> Yes, Matthias, that was supposed to be you.
> Sorry from another guy who frequently has his name misspelled ;-)
>
> On Thu, May 12, 2016 at 1:27 PM, Matthias J. Sax <mj...@apache.org> wrote:
>
>> +1 from my side.
>>
>> Happy to be the maintainer for Storm-Compatibility (at least I guess it's me, even though the correct spelling would be with two 't' :P)
>>
>> -Matthias
>>
>> On 05/12/2016 12:56 PM, Till Rohrmann wrote:
>>> +1 for the proposal
>>> On May 12, 2016 12:13 PM, "Stephan Ewen" <se...@apache.org> wrote:
>>>
>>>> Yes, Gabor Gevay, that did refer to you!
>>>>
>>>> Sorry for the ambiguity...
>>>>
>>>> On Thu, May 12, 2016 at 10:46 AM, Márton Balassi <balassi.mar...@gmail.com> wrote:
>>>>
>>>>> +1 for the proposal
>>>>> @ggevay: I do think that it refers to you. :)
>>>>>
>>>>> On Thu, May 12, 2016 at 10:40 AM, Gábor Gévay <gga...@gmail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> There are at least three Gábors in the Flink community :) so assuming that the Gábor in the list of maintainers of the DataSet API is referring to me, I'll be happy to do it. :)
>>>>>>
>>>>>> Best,
>>>>>> Gábor G.
>>>>>>
>>>>>> 2016-05-10 11:24 GMT+02:00 Stephan Ewen <se...@apache.org>:
>>>>>>> Hi everyone!
>>>>>>>
>>>>>>> We propose to establish some lightweight structures in the Flink open source community and development process. These should help us better handle the increased interest in Flink (mailing list and pull requests) without overwhelming the committers, while still giving users and contributors a good experience.
>>>>>>>
>>>>>>> This proposal is triggered by the observation that we are reaching the limits of where the current community can support users and guide new contributors. The proposal below is based on observations and ideas from Till, Robert, and me.
>>>>>>>
>>>>>>> ========
>>>>>>> Goals
>>>>>>> ========
>>>>>>>
>>>>>>> We try to achieve the following:
>>>>>>>
>>>>>>> - Pull requests get handled in a timely fashion.
>>>>>>> - New contributors are better integrated into the community.
>>>>>>> - The community feels empowered on the mailing list, but questions that need the attention of someone with deep knowledge of a certain part of Flink get that attention.
>>>>>>> - At the same time, the committers who are knowledgeable about many core parts do not get completely overwhelmed.
>>>>>>> - We don't overlook threads that report critical issues.
>>>>>>> - We always have a pretty good overview of the status of the different parts of the system:
>>>>>>>   -> What are the frequently encountered known issues?
>>>>>>>   -> What are the most frequently requested features?
>>>>>>>
>>>>>>> ========
>>>>>>> Problems
>>>>>>> ========
>>>>>>>
>>>>>>> Looking into the process, there are two big issues:
>>>>>>>
>>>>>>> (1) Up to now, we have been relying on everything just "organizing itself", driven by best effort. That assumes that everyone feels equally responsible for every part, question, and contribution. At the current scale, this is impossible to maintain; it overwhelms the committers and contributors.
>>>>>>>
>>>>>>> Example: Pull requests are picked up by whoever wants to pick them up. Pull requests that are a lot of work, have little chance of getting in, or relate to less active components are sometimes not picked up. When contributors are already heavily loaded, it can happen that no one ends up feeling responsible for picking up a pull request, and it falls through the cracks.
>>>>>>>
>>>>>>> (2) There is no good overview of the known shortcomings, efforts, and requested features for the different parts of the system. This information exists in various people's heads, but is not easily accessible to new people. The Flink JIRA is not well maintained, so it is not easy to draw insights from it.
>>>>>>>
>>>>>>> ===========
>>>>>>> The Proposal
>>>>>>> ===========
>>>>>>>
>>>>>>> Since we are building a parallel system, the natural solution seems to be: partition the workload ;-)
>>>>>>>
>>>>>>> We propose to define a set of components for Flink. Each component is maintained or tracked by one or more people; let's call them maintainers. It is important to note that we don't intend the maintainer to be an authoritative role. Maintainers are simply committers or contributors who visibly step up for a certain component, and who mainly track and drive the efforts pertaining to that component.
>>>>>>>
>>>>>>> It is also important to realize that we do not want people to get less involved with certain parts and components just because they are not the maintainers. We simply want to make sure that each pull request, question, or contribution ultimately has one person (or a small set of people) responsible for catching and tracking it, if it was not already worked on by the proactive community.
>>>>>>>
>>>>>>> For some components, having multiple maintainers will be helpful. In that case, one maintainer should be the "chair" or "lead" and make sure that no issue of that component gets lost between the multiple maintainers.
>>>>>>>
>>>>>>> A maintainer's role is:
>>>>>>> -----------------------------
>>>>>>>
>>>>>>> - Have an overview of which of the open pull requests relate to their component
>>>>>>> - Drive the pull requests relating to the component to resolution
>>>>>>>   => Moderate the decision whether the feature should be merged
>>>>>>>   => Make sure the pull request gets a shepherd. In many cases, the maintainers would shepherd themselves.
>>>>>>>   => In case the shepherd becomes inactive, the maintainers need to find a new shepherd.
>>>>>>>
>>>>>>> - Have an overview of the known issues of their component
>>>>>>> - Have an overview of the frequently requested features of their component
>>>>>>>
>>>>>>> - Have an overview of which contributors are doing very good work in their component, would be candidates for committers, and should be mentored towards that.
>>>>>>>
>>>>>>> - Resolve email threads that have been brought to their attention because the thread requires deeper component knowledge.
>>>>>>>
>>>>>>> A maintainer's role is NOT:
>>>>>>> ----------------------------------
>>>>>>>
>>>>>>> - Review all pull requests of that component
>>>>>>> - Answer every mail with questions about that component
>>>>>>> - Fix all bugs and implement all features of that component
>>>>>>>
>>>>>>> We imagine that the community and the maintainers interact in the following way:
>>>>>>> ---------------------------------------------------------------------------------------------------------
>>>>>>>
>>>>>>> - Pull requests should be tagged by component. Since we cannot add labels at this point, we need to rely on the following:
>>>>>>>   => The pull request opener should name the pull request like "[FLINK-XXX] [component] Title"
>>>>>>>   => Components can be (re-)tagged by adding special comments in the pull request ("==> component client")
>>>>>>>   => With some luck, GitHub and Apache Infra will allow us to use labels at some point
>>>>>>>
>>>>>>> - When pull requests are associated with a component, the maintainers will manage them (decide whether to add the feature, find a shepherd, catch dropped pull requests).
>>>>>>>
>>>>>>> - We assume that maintainers frequently reach out to other community members and ask them if they want to shepherd a pull request.
>>>>>>>
>>>>>>> - On the mailing list, everyone should feel equally empowered to answer and discuss. If at some point in the discussion deep technical knowledge about a component is required, the maintainer(s) should be drawn into the discussion. Because the mailing list infrastructure has no support for tagging threads, here are some simple workarounds:
>>>>>>>
>>>>>>>   => One possibility is to put the maintainers' mail addresses on cc for the thread, so they get the mail directly and not just via the mailing list
>>>>>>>   => Another way would be to post something like "+maintainer runtime" in the thread, and the "runtime" maintainers would have a filter/alert on these keywords in their mail program.
>>>>>>>
>>>>>>> - We assume that maintainers will reach out to community members who are very active and helpful in a component, and will ask them if they want to be added as maintainers. That will make it visible that those people are experts on that part of Flink.
>>>>>>>
>>>>>>> ======================================
>>>>>>> Maintainers: Committers and Contributors
>>>>>>> ======================================
>>>>>>>
>>>>>>> It helps if maintainers are committers (since we want them to resolve pull requests, which often involves merging them).
>>>>>>>
>>>>>>> Components with multiple maintainers can easily have non-committer contributors in addition to committer contributors.
>>>>>>>
>>>>>>> ======
>>>>>>> JIRA
>>>>>>> ======
>>>>>>>
>>>>>>> Ideally, JIRA can be used to get an overview of the known issues of each component and of the common feature requests. Unfortunately, the Flink JIRA is quite unorganized right now.
>>>>>>>
>>>>>>> A natural follow-up effort to this proposal would be to define the same components in JIRA as we defined here, and have the maintainers keep JIRA meaningful for their particular component. That would allow us to easily generate some tables out of JIRA (like top known issues per component, most requested features) and post them on the dev list once in a while as a "state of the union" report.
>>>>>>>
>>>>>>> The initial assignment of issues to components should be made by the people opening the issue. The maintainer of the tagged component needs to change the tag if the component was classified incorrectly.
>>>>>>>
>>>>>>> ======================================
>>>>>>> Initial Components and Maintainers Suggestion
>>>>>>> ======================================
>>>>>>>
>>>>>>> Below is a suggestion of how to define components for Flink. One goal of the division was to make it obvious, for the majority of questions and contributions, which component they relate to. Otherwise, if many contributions had fuzzy component associations, we would again fail to establish clear responsibility for tracking their progress and resolution.
>>>>>>>
>>>>>>> We also looked at each component and wrote down the names of some people who we thought were natural experts for the components, and thus natural candidates for maintainers.
>>>>>>>
>>>>>>> **These names are only a starting point for discussion.**
>>>>>>>
>>>>>>> Once agreed upon, the components and names of maintainers should be kept in the wiki and updated as components change and people step up or down.
>>>>>>>
>>>>>>> *DataSet API* (*Fabian, Greg, Gabor*)
>>>>>>> - Including Hadoop compat. parts
>>>>>>>
>>>>>>> *DataStream API* (*Aljoscha, Max, Stephan*)
>>>>>>>
>>>>>>> *Runtime*
>>>>>>> - Distributed Coordination (JobManager/TaskManager, Akka) (*Till*)
>>>>>>> - Local Runtime (Memory Management, State Backends, Tasks/Operators) (*Stephan*)
>>>>>>> - Network (*Ufuk*)
>>>>>>>
>>>>>>> *Client/Optimizer* (*Fabian*)
>>>>>>>
>>>>>>> *Type system / Type extractor* (Timo)
>>>>>>>
>>>>>>> *Cluster Management* (Yarn, Mesos, Docker, ...) (*Max, Robert*)
>>>>>>>
>>>>>>> *Libraries*
>>>>>>> - Gelly (*Vasia, Greg*)
>>>>>>> - ML (*Till, Theo*)
>>>>>>> - CEP (*Till*)
>>>>>>> - Python (*Chesnay*)
>>>>>>>
>>>>>>> *Table API & SQL* (*Fabian, Vasia, Timo, Chengxiang*)
>>>>>>>
>>>>>>> *Streaming Connectors* (*Robert*, *Aljoscha*)
>>>>>>>
>>>>>>> *Batch Connectors and Input/Output Formats* (*Chesnay*)
>>>>>>>
>>>>>>> *Storm Compatibility Layer* (*Mathias*)
>>>>>>>
>>>>>>> *Scala shell* (*Till*)
>>>>>>>
>>>>>>> *Startup Shell Scripts* (Ufuk)
>>>>>>>
>>>>>>> *Flink Build System, Maven Files* (*Robert*)
>>>>>>>
>>>>>>> *Documentation* (Ufuk)
>>>>>>>
>>>>>>> Please let us know what you think about this proposal.
>>>>>>> Happy discussing!
>>>>>>>
>>>>>>> Greetings,
>>>>>>> Stephan