+1 (non-binding)

2014-12-19 7:24 GMT+01:00 Jaideep Dhok <jaideep.d...@inmobi.com>:
> +1 (non-binding)
>
> Thanks,
> Jaideep
>
> On Fri, Dec 19, 2014 at 11:50 AM, Hyunsik Choi <hyun...@apache.org> wrote:
> >
> > +1 (binding)
> >
> > On Friday, December 19, 2014, Roman Shaposhnik <r...@apache.org> wrote:
> >
> > > Following the discussion earlier:
> > > http://s.apache.org/kTp
> > >
> > > I would like to call a VOTE for accepting
> > > Zeppelin as a new Incubator project.
> > >
> > > The proposal is available at:
> > > https://wiki.apache.org/incubator/ZeppelinProposal
> > > and is also attached to the end of this email.
> > >
> > > Vote is open until at least Sunday, 21st December 2014,
> > > 23:59:00 PST
> > >
> > > [ ] +1 Accept Zeppelin into the Incubator
> > > [ ] ±0 Indifferent to the acceptance of Zeppelin
> > > [ ] -1 Do not accept Zeppelin because ...
> > >
> > > Thanks,
> > > Roman.
> > >
> > > == Abstract ==
> > > Zeppelin is a collaborative data analytics and visualization tool for
> > > distributed, general-purpose data processing systems such as Apache
> > > Spark, Apache Flink, etc.
> > >
> > > == Proposal ==
> > > Zeppelin is a modern web-based tool for data scientists to
> > > collaborate over large-scale data exploration and visualization
> > > projects. It is a notebook-style interpreter that enables sharing of
> > > collaborative analysis sessions between users. Zeppelin is independent
> > > of the execution framework itself. The current version runs on top of
> > > Apache Spark, but it has pluggable interpreter APIs to support other
> > > data processing systems. More execution frameworks could be added at a
> > > later date, e.g. Apache Flink and Crunch, as well as SQL-like backends
> > > such as Hive, Tajo and MRQL.
> > >
> > > We have a strong preference for the project to be called Zeppelin. In
> > > case that may not be feasible, alternative names could be: “Mir”,
> > > “Yuga” or “Sora”.
> > >
> > > == Background ==
> > > A large-scale data analysis workflow includes multiple steps like data
> > > acquisition, pre-processing, visualization, etc., and may include
> > > inter-operation of multiple different tools and technologies. With the
> > > widespread adoption of open source general-purpose data processing
> > > systems like Spark, there is a lack of open source, modern,
> > > user-friendly tools that combine the strengths of an interpreted
> > > language for data analysis with new in-browser visualization libraries
> > > and collaborative capabilities.
> > >
> > > Zeppelin initially started as a GUI tool for a diverse set of
> > > SQL-over-Hadoop systems like Hive, Presto, Shark, etc. It has been open
> > > source since its inception in Sep 2013. Later, it became clear that
> > > there was a need for a greater web-based tool for data scientists to
> > > collaborate on data exploration over large-scale projects, not
> > > limited to SQL. So Zeppelin integrated full support of Apache Spark
> > > while adding a collaborative environment with the ability to run and
> > > share interpreter sessions in-browser.
> > >
> > > == Rationale ==
> > > There are no open source alternatives for a collaborative
> > > notebook-based interpreter with support for multiple distributed data
> > > processing systems.
> > >
> > > As the number of companies adopting and contributing back to Zeppelin
> > > is growing, we think that having a long-term home at the Apache
> > > foundation would be a great fit for the project, ensuring that
> > > processes and procedures are in place to keep the project and community
> > > “healthy” and free of any commercial, political or legal faults.
> > >
> > > == Initial Goals ==
> > > The initial goals will be to move the existing codebase to Apache and
> > > integrate with the Apache development process. This includes moving
> > > all infrastructure that we currently maintain, such as: a website, a
> > > mailing list, an issue tracker and a Jenkins CI, as mentioned in the
> > > “Required Resources” section of this proposal.
> > > Once this is accomplished, we plan for incremental development and
> > > releases that follow the Apache guidelines.
> > > To increase adoption, the major goal for the project would be to
> > > provide integration with as many projects from the Apache data
> > > ecosystem as possible, including new interpreters for Apache Hive and
> > > Apache Drill, and adding a Zeppelin distribution to Apache Bigtop.
> > > On the community building side, the main goal is to attract a diverse
> > > set of contributors by promoting Zeppelin to a wide variety of
> > > engineers, starting Zeppelin user groups around the globe and
> > > engaging with other existing Apache project communities online.
> > >
> > >
> > > == Current Status ==
> > > Currently, Zeppelin has 4 released versions and is used in production
> > > at a number of companies across the globe mentioned in the Affiliations
> > > section. The current implementation status is pre-release, with the
> > > public API not yet finalized. The current main and default backend
> > > processing engine is Apache Spark, with consistent support of SparkSQL.
> > > Zeppelin is distributed as a binary package which includes an embedded
> > > webserver, the application itself, a set of libraries and
> > > startup/shutdown scripts. No platform-specific installation packages
> > > are provided yet, but it is something we are looking to provide as part
> > > of Apache Bigtop integration.
> > > The project codebase is currently hosted at github.com, which will form
> > > the basis of the Apache git repository.
> > >
> > > === Meritocracy ===
> > > Zeppelin is an open source project that already leverages meritocracy
> > > principles. It was started by a handful of people and now it has
> > > multiple contributors, although as the number of contributions grows we
> > > want to build a diverse developer and user community that is governed
> > > by the "Apache way". Users and new contributors will be treated with
> > > respect and welcomed; they will earn merit in the project by tendering
> > > quality patches and support that move the project forward. Those with
> > > a proven support and quality patch track record will be encouraged to
> > > become committers.
> > >
> > > === Community ===
> > > Zeppelin already has a burgeoning community of users spread across the
> > > world that leverage and contribute to the code base and mailing list.
> > > We hope that being part of the Apache Foundation will help to grow it
> > > more and convert some of the users into active contributors to the
> > > project.
> > >
> > > === Core Developers ===
> > > The core developers of Zeppelin are listed in our contributors and
> > > initial PPMC below. It is a diverse group of people from two
> > > companies, NFLabs and Between, as mentioned in the Affiliations section,
> > > including at least one Apache committer and PPMC member, Lee Moon Soo,
> > > of the Apache MRQL project.
> > >
> > > === Alignment ===
> > > Zeppelin is already integrated with Apache Spark. Integration with
> > > Apache Tajo and Apache MRQL is currently being worked on. Apache Flink
> > > is a potential next integration step. We also plan to add a binary
> > > distribution of Zeppelin to Apache Bigtop to align it with the whole
> > > ASF Hadoop data stack.
> > >
> > > == Known Risks ==
> > > We feel that for Zeppelin to become as successful as it can be, it
> > > needs to be picked up by as many back-end systems as possible, not
> > > only Apache Spark.
> > >
> > > === Orphaned Products ===
> > > Initial code contributors were from the same company, but in the last
> > > few months we have seen signs of global adoption: at least 2 more
> > > companies in Europe and the US have products based on the Zeppelin
> > > codebase. Other companies use Zeppelin in production for their data
> > > analytics workflows. We believe that this, plus the fact that Zeppelin
> > > already has contributors from different companies, mitigates this risk
> > > well.
> > >
> > > === Inexperience with Open Source ===
> > > Zeppelin was born as an open source project from scratch. The majority
> > > of the current core contributors have experience working on other open
> > > source projects. We also expect that as we grow the community further
> > > based on meritocracy and with the guidance of more experienced mentors
> > > this will have a positive influence on the project in the long term.
> > >
> > > === Homogenous Developers ===
> > > The initial committers are from the same region, but there are already
> > > 2 companies in Europe that contribute to Zeppelin, and others in the US
> > > are also reviewing it and are active on the mailing list. We are
> > > committed to creating a diverse mix of developers from all over the
> > > world.
> > >
> > > === Reliance on Salaried Developers ===
> > > Most of the Zeppelin contributors use it as their tool of choice either
> > > in their own companies internally or distribute it as part of their
> > > product.
> > > The backend-agnostic design helps to keep it as the tool of choice for
> > > a diverse community of data analysts even if they move from one
> > > employer to another.
> > > There is also at least one university in the US with students who
> > > potentially might use Zeppelin for R’n’D projects.
> > >
> > > === Relationship with Other Apache Products ===
> > > Right now Zeppelin relies on Apache Spark to run distributed tasks
> > > across a cluster of machines, but its abstract interpreter design
> > > allows it to work with other systems like Apache MRQL and Apache Crunch
> > > as well as SQL-based systems like Apache Tajo and Apache Hive.
> > >
> > > === An Excessive Fascination with the Apache Brand ===
> > > We believe that joining Apache will help us attract more contributors
> > > to Zeppelin, by giving us a well-defined, transparent development and
> > > governance process under a known brand. The reason for this proposal
> > > is not to gain publicity, but to further strengthen the longevity of
> > > the project without affiliation with any particular company. There are
> > > no plans to use the Apache brand in press releases or to advertise the
> > > project's acceptance into the Apache Incubator.
> > >
> > > === Documentation ===
> > > Additional documentation on Zeppelin may be found on its GitHub website:
> > > * Zeppelin overview:
> > > https://github.com/NFLabs/zeppelin/blob/master/README.md
> > > * Zeppelin docs: http://zeppelin-project.org/docs/index.html
> > > * Zeppelin road map:
> > > https://github.com/NFLabs/zeppelin/blob/master/Roadmap.md
> > > * Zeppelin issue tracking:
> > > https://zeppelin-project.atlassian.net/browse/ZEPPELIN
> > > * Zeppelin codebase: https://github.com/NFLabs/zeppelin
> > > * User group: https://groups.google.com/group/zeppelin-developers
> > >
> > > == Initial Source ==
> > > The Zeppelin codebase is currently hosted on GitHub:
> > > https://github.com/NFLabs/zeppelin
> > >
> > > === Source and Intellectual Property Submission Plan ===
> > > Currently, the Zeppelin codebase is distributed under an Apache 2.0
> > > License.
> > >
> > > == External Dependencies ==
> > > To the best of our knowledge, all other dependencies of Zeppelin are
> > > distributed under Apache-compatible licenses (e.g. junit is EPL,
> > > Eclipse Public License v1.0, atmosphere-jersey is CDDL1.0 and
> > > dom4j:dom4 is BSD licensed, org.slf4j and
> > > org.java-websocket:Java-WebSocket are MIT).
> > > Only org.reflections:reflections
> > > (https://github.com/ronmamo/reflections) is WTFPL 2.0, which should not
> > > be a problem per https://issues.apache.org/jira/browse/LEGAL-135.
> > > Upon acceptance to the incubator, we would begin a thorough analysis
> > > of all transitive dependencies to verify this information and
> > > introduce license checking into the build and release process by
> > > integrating with Apache Rat.
> > >
> > > == Required Resources ==
> > > === Mailing list ===
> > > We will migrate the existing Zeppelin mailing lists as follows:
> > > * zeppelin-develop...@googlegroups.com -->
> > > d...@zeppelin.incubator.apache.org
> > > * us...@zeppelin.incubator.apache.org
> > > * priv...@zeppelin.incubator.apache.org for PPMC members
> > > * comm...@zeppelin.incubator.apache.org
> > > The latter is to be consistent with the new PIAO naming scheme for
> > > podlings.
> > >
> > > === Source control ===
> > > The Zeppelin team would like to use Git for source control, as it
> > > already uses Git. We request a writeable Git repo for Zeppelin, and
> > > mirroring to GitHub to be set up through INFRA.
> > > https://git-wip-us.apache.org/repos/asf/incubator-zeppelin.git
> > >
> > > === Issue Tracking ===
> > > Zeppelin currently uses the Jira tracking system
> > > https://zeppelin-project.atlassian.net/browse/ZEPPELIN. We will
> > > migrate to the Apache JIRA:
> > > http://issues.apache.org/jira/browse/ZEPPELIN
> > >
> > >
> > > === Other Resources ===
> > > * Jenkins/Hudson for builds and test running.
> > > * Wiki for documentation purposes
> > > * Blog to improve project dissemination
> > >
> > > == Initial Committers ==
> > > * Lee Moon Soo <moon at apache dot org>
> > > * Anthony Corbacho <corbacho.anthony at gmail dot com>, CLA submitted
> > > * Damien Corneau <corneadoug at gmail dot com>, CLA submitted
> > > * Alexander Bezzubov <abezzubov at nflabs dot com>, CLA confirmed
> > > * Kevin Sangwoo Kim <sangwookim dot me at gmail dot us>, CLA confirmed
> > >
> > > == Affiliations ==
> > > * Lee Moon Soo: NFLabs
> > > * Anthony Corbacho: NFLabs
> > > * Damien Corneau: NFLabs
> > > * Alexander Bezzubov: NFLabs
> > > * Kevin Sangwoo Kim: VCNC (a.k.a Between)
> > >
> > > == Sponsors ==
> > > === Champion ===
> > > * Roman Shaposhnik
> > >
> > > === Nominated Mentors ===
> > > * Konstantin Boudnik
> > > * Ted Dunning
> > > * Henry Saputra
> > > * Roman Shaposhnik
> > > * Hyunsik Choi
> > >
> > > === Sponsoring Entity ===
> > > The Apache Incubator
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > > For additional commands, e-mail: general-h...@incubator.apache.org