Re: [VOTE] Accept Zeppelin into the Apache Incubator

Fabian Hueske Fri, 19 Dec 2014 03:02:48 -0800

+1 (non-binding)

2014-12-19 7:24 GMT+01:00 Jaideep Dhok <jaideep.d...@inmobi.com>:


> +1 (non-binding)
>
> Thanks,
> Jaideep
>
> On Fri, Dec 19, 2014 at 11:50 AM, Hyunsik Choi <hyun...@apache.org> wrote:
> >
> > +1 (binding)
> >
> > On Friday, December 19, 2014, Roman Shaposhnik <r...@apache.org> wrote:
> >
> > > Following the discussion earlier:
> > >     http://s.apache.org/kTp
> > >
> > > I would like to call a VOTE for accepting
> > > Zeppelin as a new Incubator project.
> > >
> > > The proposal is available at:
> > >     https://wiki.apache.org/incubator/ZeppelinProposal
> > > and is also attached to the end of this email.
> > >
> > > Vote is open until at least Sunday, 21th December 2014,
> > > 23:59:00 PST
> > >
> > > [ ] +1 Accept Zeppelin into the Incubator
> > > [ ] ±0 Indifferent to the acceptance of Zeppelin
> > > [ ] -1 Do not accept Zeppelin because ...
> > >
> > > Thanks,
> > > Roman.
> > >
> > > == Abstract ==
> > > Zeppelin is a collaborative data analytics and visualization tool for
> > > distributed, general-purpose data processing systems such as Apache
> > > Spark, Apache Flink, etc.
> > >
> > > == Proposal ==
> > > Zeppelin is a modern web-based tool for the data scientists to
> > > collaborate over large-scale data exploration and visualization
> > > projects. It is a notebook style interpreter that enable collaborative
> > > analysis sessions sharing between users. Zeppelin is independent of
> > > the execution framework itself. Current version runs on top of Apache
> > > Spark but it has pluggable interpreter APIs to support other data
> > > processing systems. More execution frameworks could be added at a
> > > later date i.e Apache Flink, Crunch as well as SQL-like backends such
> > > as Hive, Tajo, MRQL.
> > >
> > > We have a strong preference for the project to be called Zeppelin. In
> > > case that may not be feasible, alternative names could be: “Mir”,
> > > “Yuga” or “Sora”.
> > >
> > > == Background ==
> > > Large scale data analysis workflow includes multiple steps like data
> > > acquisition, pre-processing, visualization, etc and may include
> > > inter-operation of multiple different tools and technologies. With the
> > > widespread of the open source general-purpose data processing systems
> > > like Spark there is a lack of open source, modern user-friendly tools
> > > that combine strengths of interpreted language for data analysis with
> > > new in-browser visualization libraries and collaborative capabilities.
> > >
> > > Zeppelin initially started as a GUI tool for diverse set of
> > > SQL-over-Hadoop systems like Hive, Presto, Shark, etc. It was open
> > > source since its inception in Sep 2013. Later, it became clear that
> > > there was a need for a greater web-based tool for data scientists to
> > > collaborate on data exploration over the large-scale projects, not
> > > limited to SQL. So Zeppelin integrated full support of Apache Spark
> > > while adding a collaborative environment with the ability to run and
> > > share interpreter sessions in-browser
> > >
> > > == Rationale ==
> > > There are no open source alternatives for a collaborative
> > > notebook-based interpreter with support of multiple distributed data
> > > processing systems.
> > >
> > > As a number of companies adopting and contributing back to Zeppelin is
> > > growing, we think that having a long-term home at Apache foundation
> > > would be a great fit for the project ensuring that processes and
> > > procedures are in place to keep project and community “healthy” and
> > > free of any commercial, political or legal faults.
> > >
> > > == Initial Goals ==
> > > The initial goals will be to move the existing codebase to Apache and
> > > integrate with the Apache development process. This includes moving
> > > all infrastructure that we currently maintain, such as: a website, a
> > > mailing list, an issues tracker and a Jenkins CI, as mentioned in
> > > “Required Resources” section of current proposal.
> > > Once this is accomplished, we plan for incremental development and
> > > releases that follow the Apache guidelines.
> > > To increase adoption the major goal for the project would be to
> > > provide integration with as much projects from Apache data ecosystem
> > > as possible, including new interpreters for Apache Hive, Apache Drill
> > > and adding Zeppelin distribution to Apache Bigtop.
> > > On the community building side the main goal is to attract a diverse
> > > set of contributors by promoting Zeppelin to wide variety of
> > > engineers, starting a Zeppelin user groups around the globe and by
> > > engaging with other existing Apache projects communities online.
> > >
> > >
> > > == Current Status ==
> > > Currently, Zeppelin has 4 released versions and is used in production
> > > at a number of companies across the globe mentioned in Affiliation
> > > section. Current implementation status is pre-release with public API
> > > not being finalized yet. Current main and default backend processing
> > > engine is Apache Spark with consistent support of SparkSQL.
> > > Zeppelin is distributed as a binary package which includes an embedded
> > > webserver, application itself, a set of libraries and startup/shutdown
> > > scripts. No platform-specific installation packages are provided yet
> > > but it is something we are looking to provide as part of Apache Bigtop
> > > integration.
> > > Project codebase is currently hosted at github.com, which will form
> > > the basis of the Apache git repository.
> > >
> > > === Meritocracy ===
> > > Zeppelin is an open source project that already leverages meritocracy
> > > principles.  It was started by a handfull of people and now it has
> > > multiple contributors, although as the number of contribution grows we
> > > want to build a diverse developer and user community that is governed
> > > by the "Apache way". Users and new contributors will be treated with
> > > respect and welcomed; they will earn merit in the project by tendering
> > > quality patches and support that move the project forward. Those with
> > > a proven support and quality patch track record will be encouraged to
> > > become committers.
> > >
> > > === Community ===
> > > Zeppelin already has a burgeoning community of users spread across the
> > > world that leverage and contributes to the code base and mailing list.
> > > We hope that being part of Apache Foundation will help to grow it more
> > > and convert some of the users into active contributors to the project.
> > >
> > > === Core Developers ===
> > > The core developers of Zeppelin are listed in our contributors and
> > > initial PPMC below. It is a diverse group of people from two
> > > companies, NFLabs and Between, as mentioned in Affiliations section
> > > including at least one Apache committer and PPMC member, Lee Moon Soo,
> > > of Apache MRQL project.
> > >
> > > === Alignment ===
> > > Zeppelin is already integrated with Apache Spark. Integration with
> > > Apache Tajo and Apache MRQL is something that has been currently
> > > worked on. Apache Flink is a potential next integration step. We also
> > > plan to add a binary distribution of Zeppelin to Apache Bigtop to
> > > align it with whole ASF Hadoop data stack.
> > >
> > > == Known Risks ==
> > > We feel that for Zeppelin to become as successful as it can be, it
> > > needs to be picked up by as many back-end systems as possible, not
> > > only Apache Spark.
> > >
> > > === Orphaned Products ===
> > > Initial code contributors were from the same company but in last few
> > > months we see signs of the global adoption, at least 2 more companies
> > > in Europe and US have products based on a Zeppelin codebase. Other
> > > companies use Zeppelin in production for their data analytics
> > > workflows. We believe that this, plus the fact that Zeppelin already
> > > have contributors from different companies mitigates this risk well.
> > >
> > > === Inexperience with Open Source ===
> > > Zeppelin was born as an open source project from scratch. Majority of
> > > the current core contributors have experience working on other open
> > > source projects. We also expect that as we grow the community further
> > > based on meritocracy and with the guidance of more experienced mentors
> > > this will have a positive influence on the project in the long term.
> > >
> > > === Homogenous Developers ===
> > > The initial committers are from same region but there are already 2
> > > companies in the Europe that contribute to Zeppelin and others in US
> > > also reviewing it and being active on the mailing list. We are
> > > committed to create diverse mix of developers from all over the world.
> > >
> > > === Reliance on Salaried Developers ===
> > > Most of the Zeppelin contributors use it as tool of choice either in
> > > their own companies internally or distribute it as part of the
> > > product.
> > > Backend agnostic design helps to keep it as tool of choice for diverse
> > > community of data analysts even if they move from one employee to
> > > another.
> > > There also is at least one university in US with students who
> > > potentially might use Zeppelin for R’n’D projects.
> > >
> > > === Relationship with Other Apache Products ===
> > > Right now Zeppelin relies on Apache Spark to run distributed task
> > > across a cluster of machines, but it’s abstract interpreter design
> > > allows it to work with other systems like Apache MRQL, Apache Crunch
> > > as well as SQL-based systems like Apache Tajo, Apache Hive
> > >
> > > === A Excessive Fascination with the Apache Brand ===
> > > We believe that joining Apache will help us attract more contributors
> > > to Zeppelin, by giving us a well-defined, transparent development and
> > > governance process under a known brand. The reason for this proposal
> > > is not to gain publicity, but to further strengthen the longevity of
> > > the project without affiliation with any particular company. There are
> > > no plans to use of Apache brand in press releases nor posting
> > > advertising of acceptance it into Apache Incubator.
> > >
> > > === Documentation ===
> > > Additional documentation on Zeppelin may be found on its github
> website:
> > >  * Zeppelin overview:
> > > https://github.com/NFLabs/zeppelin/blob/master/README.md
> > >  * Zeppelin docs: http://zeppelin-project.org/docs/index.html
> > >  * Zeppelin road map:
> > > https://github.com/NFLabs/zeppelin/blob/master/Roadmap.md
> > >  * Zeppelin issue tracking:
> > > https://zeppelin-project.atlassian.net/browse/ZEPPELIN
> > >  * Zeppelin codebase: https://github.com/NFLabs/zeppelin
> > >  * User group: https://groups.google.com/group/zeppelin-developers
> > >
> > > == Initial Source ==
> > > Zeppelin codebase is currently hosted on Github:
> > > https://github.com/NFLabs/zeppelin
> > >
> > > === Source and Intellectual Property Submission Plan ===
> > > Currently, the Zeppleing codebase is distributed under an Apache 2.0
> > > License.
> > >
> > > == External Dependencies ==
> > > To the best of our knowledge, all other dependencies of Zeppelin are
> > > distributed under Apache compatible licenses (e.g. junit is EPL,
> > > Eclipse Public License v1.0, atmosphere-jersey is CDDL1.0  and
> > > dom4j:dom4 is BSD licensed, org.slf4j and
> > > org.java-websocket:Java-WebSocket are MIT).
> > > Only org.reflections:reflections
> > > https://github.com/ronmamo/reflections is WTFPL 2.0, which should not
> > > be a problem as of https://issues.apache.org/jira/browse/LEGAL-135
> > > Upon acceptance to the incubator, we would begin a thorough analysis
> > > of all transitive dependencies to verify this information and
> > > introduce license checking into the build and release process by
> > > integrating with Apache Rat.
> > >
> > > == Required Resources ==
> > > === Mailing list ===
> > > We will migrate the existing Zeppelin mailing lists as follows:
> > >  * zeppelin-develop...@googlegroups.com <javascript:;> -->
> > > d...@zeppelin.incubator.apache.org <javascript:;>
> > >  * us...@zeppelin.incubator.apache.org <javascript:;>
> > >  * priv...@zeppelin.incubator.apache.org <javascript:;> for PPMC
> members
> > >  * comm...@zeppelin.incubator.apache.org <javascript:;>
> > > The latter is to be consistent with the new PIAO naming scheme for
> > > podlings.
> > >
> > > === Source control ===
> > > Zeppelin team would like to use Git for source control, as it already
> > > uses Git. We request a writeable Git repo for Zeppelin, and mirroring
> > > to be set up to Github through INFRA.
> > > https://git-wip-us.apache.org/repos/asf/incubator-zeppelin.git
> > >
> > > === Issue Tracking ===
> > > Zeppelin currently uses the Jira tracking system
> > > https://zeppelin-project.atlassian.net/browse/ZEPPELIN. We will
> > > migrate to the Apache JIRA:
> > > http://issues.apache.org/jira/browse/ZEPPELIN
> > >
> > >
> > > === Other Resources ===
> > >  * Jenkins/Hudson for builds and test running.
> > >  * Wiki for documentation purposes
> > >  * Blog to improve project dissemination
> > >
> > > == Initial Committers ==
> > >  * Lee Moon Soo <moon at apache dot org>
> > >  * Anthony Corbacho <corbacho.anthony at gmail dot com>, CLA submitted
> > >  * Damien Corneau <corneadoug at gmail dot com>, CLA submitted
> > >  * Alexander Bezzubov <abezzubov at nflabs dot com>, CLA confirmed
> > >  * Kevin Sangwoo Kim <sangwookim dot me at gmail dot us>, CLA confirmed
> > >
> > > == Affiliations ==
> > >  * Lee Moon Soo: NFLabs
> > >  * Anthony Corbacho: NFLabs
> > >  * Damien Corneau: NFLabs
> > >  * Alexander Bezzubov: NFLabs
> > >  * Kevin Sangwoo Kim: VCNC (a.k.a Between)
> > >
> > > == Sponsors ==
> > > === Champion ===
> > >  * Roman Shaposhnik
> > >
> > > === Nominated Mentors ===
> > >  * Konstantin Boudnik
> > >  * Ted Dunning
> > >  * Henry Saputra
> > >  * Roman Shaposhnik
> > >  * Hyunsik Choi
> > >
> > > === Sponsoring Entity ===
> > >  The Apache Incubator
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > > <javascript:;>
> > > For additional commands, e-mail: general-h...@incubator.apache.org
> > > <javascript:;>
> > >
> > >
> >
>
> --
> _____________________________________________________________
> The information contained in this communication is intended solely for the
> use of the individual or entity to whom it is addressed and others
> authorized to receive it. It may contain confidential or legally privileged
> information. If you are not the intended recipient you are hereby notified
> that any disclosure, copying, distribution or taking any action in reliance
> on the contents of this information is strictly prohibited and may be
> unlawful. If you have received this communication in error, please notify
> us immediately by responding to this email and then delete it from your
> system. The firm is neither liable for the proper and complete transmission
> of the information contained in this communication nor for any delay in its
> receipt.
>

Re: [VOTE] Accept Zeppelin into the Apache Incubator

Reply via email to