+1 On Thu, Dec 18, 2014 at 9:29 PM, Roman Shaposhnik <r...@apache.org> wrote: > Following the discussion earlier: > http://s.apache.org/kTp > > I would like to call a VOTE for accepting > Zeppelin as a new Incubator project. > > The proposal is available at: > https://wiki.apache.org/incubator/ZeppelinProposal > and is also attached to the end of this email. > > Vote is open until at least Sunday, 21th December 2014, > 23:59:00 PST > > [ ] +1 Accept Zeppelin into the Incubator > [ ] ±0 Indifferent to the acceptance of Zeppelin > [ ] -1 Do not accept Zeppelin because ... > > Thanks, > Roman. > > == Abstract == > Zeppelin is a collaborative data analytics and visualization tool for > distributed, general-purpose data processing systems such as Apache > Spark, Apache Flink, etc. > > == Proposal == > Zeppelin is a modern web-based tool for the data scientists to > collaborate over large-scale data exploration and visualization > projects. It is a notebook style interpreter that enable collaborative > analysis sessions sharing between users. Zeppelin is independent of > the execution framework itself. Current version runs on top of Apache > Spark but it has pluggable interpreter APIs to support other data > processing systems. More execution frameworks could be added at a > later date i.e Apache Flink, Crunch as well as SQL-like backends such > as Hive, Tajo, MRQL. > > We have a strong preference for the project to be called Zeppelin. In > case that may not be feasible, alternative names could be: “Mir”, > “Yuga” or “Sora”. > > == Background == > Large scale data analysis workflow includes multiple steps like data > acquisition, pre-processing, visualization, etc and may include > inter-operation of multiple different tools and technologies. With the > widespread of the open source general-purpose data processing systems > like Spark there is a lack of open source, modern user-friendly tools > that combine strengths of interpreted language for data analysis with > new in-browser visualization libraries and collaborative capabilities. > > Zeppelin initially started as a GUI tool for diverse set of > SQL-over-Hadoop systems like Hive, Presto, Shark, etc. It was open > source since its inception in Sep 2013. Later, it became clear that > there was a need for a greater web-based tool for data scientists to > collaborate on data exploration over the large-scale projects, not > limited to SQL. So Zeppelin integrated full support of Apache Spark > while adding a collaborative environment with the ability to run and > share interpreter sessions in-browser > > == Rationale == > There are no open source alternatives for a collaborative > notebook-based interpreter with support of multiple distributed data > processing systems. > > As a number of companies adopting and contributing back to Zeppelin is > growing, we think that having a long-term home at Apache foundation > would be a great fit for the project ensuring that processes and > procedures are in place to keep project and community “healthy” and > free of any commercial, political or legal faults. > > == Initial Goals == > The initial goals will be to move the existing codebase to Apache and > integrate with the Apache development process. This includes moving > all infrastructure that we currently maintain, such as: a website, a > mailing list, an issues tracker and a Jenkins CI, as mentioned in > “Required Resources” section of current proposal. > Once this is accomplished, we plan for incremental development and > releases that follow the Apache guidelines. > To increase adoption the major goal for the project would be to > provide integration with as much projects from Apache data ecosystem > as possible, including new interpreters for Apache Hive, Apache Drill > and adding Zeppelin distribution to Apache Bigtop. > On the community building side the main goal is to attract a diverse > set of contributors by promoting Zeppelin to wide variety of > engineers, starting a Zeppelin user groups around the globe and by > engaging with other existing Apache projects communities online. > > > == Current Status == > Currently, Zeppelin has 4 released versions and is used in production > at a number of companies across the globe mentioned in Affiliation > section. Current implementation status is pre-release with public API > not being finalized yet. Current main and default backend processing > engine is Apache Spark with consistent support of SparkSQL. > Zeppelin is distributed as a binary package which includes an embedded > webserver, application itself, a set of libraries and startup/shutdown > scripts. No platform-specific installation packages are provided yet > but it is something we are looking to provide as part of Apache Bigtop > integration. > Project codebase is currently hosted at github.com, which will form > the basis of the Apache git repository. > > === Meritocracy === > Zeppelin is an open source project that already leverages meritocracy > principles. It was started by a handfull of people and now it has > multiple contributors, although as the number of contribution grows we > want to build a diverse developer and user community that is governed > by the "Apache way". Users and new contributors will be treated with > respect and welcomed; they will earn merit in the project by tendering > quality patches and support that move the project forward. Those with > a proven support and quality patch track record will be encouraged to > become committers. > > === Community === > Zeppelin already has a burgeoning community of users spread across the > world that leverage and contributes to the code base and mailing list. > We hope that being part of Apache Foundation will help to grow it more > and convert some of the users into active contributors to the project. > > === Core Developers === > The core developers of Zeppelin are listed in our contributors and > initial PPMC below. It is a diverse group of people from two > companies, NFLabs and Between, as mentioned in Affiliations section > including at least one Apache committer and PPMC member, Lee Moon Soo, > of Apache MRQL project. > > === Alignment === > Zeppelin is already integrated with Apache Spark. Integration with > Apache Tajo and Apache MRQL is something that has been currently > worked on. Apache Flink is a potential next integration step. We also > plan to add a binary distribution of Zeppelin to Apache Bigtop to > align it with whole ASF Hadoop data stack. > > == Known Risks == > We feel that for Zeppelin to become as successful as it can be, it > needs to be picked up by as many back-end systems as possible, not > only Apache Spark. > > === Orphaned Products === > Initial code contributors were from the same company but in last few > months we see signs of the global adoption, at least 2 more companies > in Europe and US have products based on a Zeppelin codebase. Other > companies use Zeppelin in production for their data analytics > workflows. We believe that this, plus the fact that Zeppelin already > have contributors from different companies mitigates this risk well. > > === Inexperience with Open Source === > Zeppelin was born as an open source project from scratch. Majority of > the current core contributors have experience working on other open > source projects. We also expect that as we grow the community further > based on meritocracy and with the guidance of more experienced mentors > this will have a positive influence on the project in the long term. > > === Homogenous Developers === > The initial committers are from same region but there are already 2 > companies in the Europe that contribute to Zeppelin and others in US > also reviewing it and being active on the mailing list. We are > committed to create diverse mix of developers from all over the world. > > === Reliance on Salaried Developers === > Most of the Zeppelin contributors use it as tool of choice either in > their own companies internally or distribute it as part of the > product. > Backend agnostic design helps to keep it as tool of choice for diverse > community of data analysts even if they move from one employee to > another. > There also is at least one university in US with students who > potentially might use Zeppelin for R’n’D projects. > > === Relationship with Other Apache Products === > Right now Zeppelin relies on Apache Spark to run distributed task > across a cluster of machines, but it’s abstract interpreter design > allows it to work with other systems like Apache MRQL, Apache Crunch > as well as SQL-based systems like Apache Tajo, Apache Hive > > === A Excessive Fascination with the Apache Brand === > We believe that joining Apache will help us attract more contributors > to Zeppelin, by giving us a well-defined, transparent development and > governance process under a known brand. The reason for this proposal > is not to gain publicity, but to further strengthen the longevity of > the project without affiliation with any particular company. There are > no plans to use of Apache brand in press releases nor posting > advertising of acceptance it into Apache Incubator. > > === Documentation === > Additional documentation on Zeppelin may be found on its github website: > * Zeppelin overview: https://github.com/NFLabs/zeppelin/blob/master/README.md > * Zeppelin docs: http://zeppelin-project.org/docs/index.html > * Zeppelin road map: > https://github.com/NFLabs/zeppelin/blob/master/Roadmap.md > * Zeppelin issue tracking: > https://zeppelin-project.atlassian.net/browse/ZEPPELIN > * Zeppelin codebase: https://github.com/NFLabs/zeppelin > * User group: https://groups.google.com/group/zeppelin-developers > > == Initial Source == > Zeppelin codebase is currently hosted on Github: > https://github.com/NFLabs/zeppelin > > === Source and Intellectual Property Submission Plan === > Currently, the Zeppleing codebase is distributed under an Apache 2.0 License. > > == External Dependencies == > To the best of our knowledge, all other dependencies of Zeppelin are > distributed under Apache compatible licenses (e.g. junit is EPL, > Eclipse Public License v1.0, atmosphere-jersey is CDDL1.0 and > dom4j:dom4 is BSD licensed, org.slf4j and > org.java-websocket:Java-WebSocket are MIT). > Only org.reflections:reflections > https://github.com/ronmamo/reflections is WTFPL 2.0, which should not > be a problem as of https://issues.apache.org/jira/browse/LEGAL-135 > Upon acceptance to the incubator, we would begin a thorough analysis > of all transitive dependencies to verify this information and > introduce license checking into the build and release process by > integrating with Apache Rat. > > == Required Resources == > === Mailing list === > We will migrate the existing Zeppelin mailing lists as follows: > * zeppelin-develop...@googlegroups.com --> d...@zeppelin.incubator.apache.org > * us...@zeppelin.incubator.apache.org > * priv...@zeppelin.incubator.apache.org for PPMC members > * comm...@zeppelin.incubator.apache.org > The latter is to be consistent with the new PIAO naming scheme for podlings. > > === Source control === > Zeppelin team would like to use Git for source control, as it already > uses Git. We request a writeable Git repo for Zeppelin, and mirroring > to be set up to Github through INFRA. > https://git-wip-us.apache.org/repos/asf/incubator-zeppelin.git > > === Issue Tracking === > Zeppelin currently uses the Jira tracking system > https://zeppelin-project.atlassian.net/browse/ZEPPELIN. We will > migrate to the Apache JIRA: > http://issues.apache.org/jira/browse/ZEPPELIN > > > === Other Resources === > * Jenkins/Hudson for builds and test running. > * Wiki for documentation purposes > * Blog to improve project dissemination > > == Initial Committers == > * Lee Moon Soo <moon at apache dot org> > * Anthony Corbacho <corbacho.anthony at gmail dot com>, CLA submitted > * Damien Corneau <corneadoug at gmail dot com>, CLA submitted > * Alexander Bezzubov <abezzubov at nflabs dot com>, CLA confirmed > * Kevin Sangwoo Kim <sangwookim dot me at gmail dot us>, CLA confirmed > > == Affiliations == > * Lee Moon Soo: NFLabs > * Anthony Corbacho: NFLabs > * Damien Corneau: NFLabs > * Alexander Bezzubov: NFLabs > * Kevin Sangwoo Kim: VCNC (a.k.a Between) > > == Sponsors == > === Champion === > * Roman Shaposhnik > > === Nominated Mentors === > * Konstantin Boudnik > * Ted Dunning > * Henry Saputra > * Roman Shaposhnik > * Hyunsik Choi > > === Sponsoring Entity === > The Apache Incubator > > --------------------------------------------------------------------- > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org > For additional commands, e-mail: general-h...@incubator.apache.org >
--------------------------------------------------------------------- To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org