Re: [VOTE] Apache Spark for the Incubator

Andy Konwinski Fri, 14 Jun 2013 02:38:04 -0700

+1 (non-binding)

Andy



On Sat, Jun 8, 2013 at 12:36 AM, Matei Zaharia <[email protected]> wrote:

> +1 (non-binding)
>
> Matei
>
> On Jun 8, 2013, at 12:25 AM, Hitesh Shah <[email protected]> wrote:
>
> > +1 (non-binding)
> >
> > -- Hitesh
> >
> > On Jun 7, 2013, at 10:34 PM, Mattmann, Chris A (398J) wrote:
> >
> >> Hi Folks,
> >>
> >> OK discussion has died down, time to VOTE to accept Spark into the
> >> Apache Incubator. I'll let the VOTE run for at least a week.
> >>
> >> So far I've heard +1s from the following folks, so no need for them
> >> to VOTE again unless they want to change their VOTE:
> >>
> >> +1
> >>
> >> Chris Mattmann*
> >> Konstantin Boudnik
> >> Henry Saputra*
> >> Reynold Xin
> >> Pei Chen
> >> Roman Shaposhnik*
> >> Suresh Marru*
> >>
> >> * - indicates IPMC
> >>
> >> [ ] +1 Accept Spark into the Apache Incubator.
> >> [ ] +0 Don't care.
> >> [ ] -1 Don't accept Spark into the Apache Incubator because...
> >>
> >> Proposal text is below.
> >>
> >> === Abstract ===
> >> Spark is an open source system for large-scale data analysis on
> >> clusters.
> >>
> >> === Proposal ===
> >> Spark is an open source system for fast and flexible large-scale data
> >> analysis. Spark provides a general purpose runtime that supports
> >> low-latency execution in several forms. These include interactive
> >> exploration of very large datasets, near real-time stream processing,
> >> and ad-hoc SQL analytics (through higher layer extensions). Spark
> >> interfaces with HDFS, HBase, Cassandra and several other storage
> >> layers, and exposes APIs in Scala, Java and Python.
> >>
> >> === Background ===
> >> Spark started as a U.C. Berkeley research project, designed to efficiently
> >> run machine learning algorithms on large datasets. Over time, it has
> >> evolved into a general computing engine as outlined above. Spark's
> >> developer community has also grown to include additional institutions,
> >> such as universities, research labs, and corporations. Funding has been
> >> provided by various institutions including the U.S. National Science
> >> Foundation, DARPA, and a number of industry sponsors. See:
> >> https://amplab.cs.berkeley.edu/sponsors/ for full details.
> >>
> >> === Rationale ===
> >> As the number of contributors to Spark has grown, we have sought a
> >> long-term home for the project, and we believe the Apache foundation
> >> would be a natural fit. Spark already interoperates with several
> >> existing Apache projects (HDFS, HBase, Hive, Cassandra, Avro and Flume,
> >> to name a few). The Spark team is familiar with the Apache process and
> >> subscribes to the Apache mission; the team already includes multiple
> >> Apache committers. Finally, joining Apache will help coordinate the
> >> development effort of the growing number of organizations that
> >> contribute to Spark.
> >>
> >> == Initial Goals ==
> >> The initial goals will most likely be to move the existing codebase to
> >> Apache and integrate with the Apache development process. Furthermore,
> >> we plan for incremental development and releases in line with the
> >> Apache guidelines.
> >>
> >> === Current Status ===
> >> == Meritocracy ==
> >> The Spark project already operates on meritocratic principles. Today,
> >> Spark has several developers and has accepted multiple major patches
> >> from outside of U.C. Berkeley. While this process has remained mostly
> >> informal (we do not have an official committer list), an implicit
> >> organization exists in which individuals who contribute major
> >> components act as maintainers for those modules. If accepted, the
> >> Spark project would include several of these participants as
> >> committers from the outset. We will work to identify all committers
> >> and PPMC members for the project and to operate under the ASF
> >> meritocratic principles.
> >>
> >> === Community ===
> >> Acceptance into the Apache foundation would bolster the already strong
> >> user and developer community around Spark. That community includes
> >> dozens of contributors from several institutions, a meetup group with
> >> several hundred members, and an active mailing list composed of
> >> hundreds of users.
> >>
> >> === Core Developers ===
> >> The core developers of our project are listed in our contributors and
> >> initial PPMC below. Though many are at UC Berkeley, there is a
> >> representative cross-section of other organizations including
> >> Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Tagged and
> >> Webtrends.
> >>
> >>
> >> === Alignment ===
> >> Our proposed effort aligns with several ongoing BIGDATA and U.S.
> >> national priority funding interests, including the NSF and its
> >> Expeditions program and the DARPA XDATA project. Our industry partners
> >> and collaborators are well aligned with our code base.
> >>
> >> There are also a number of related Apache projects and dependencies
> >> that will be mentioned in the Relationship with Other Apache Products
> >> section.
> >>
> >> == Known Risks ==
> >>
> >> === Orphaned Products ===
> >> Given the current level of investment in Spark, the risk of the project
> >> being abandoned is minimal. There are several constituents who are
> >> highly incentivized to continue development. The U.C. Berkeley AMPLab
> >> relies on Spark as a platform for a large number of long-term research
> >> projects. Several companies have built verticalized products which are
> >> tightly dependent on Spark. Other companies have made significant
> >> internal infrastructure investments in Spark.
> >>
> >> === Inexperience with Open Source ===
> >> Spark has existed as a healthy open source project for several years.
> >> During that time, Matei and others have curated an open-source community
> >> successfully, attracting developers from a diverse group of companies
> >> including Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel,
> >> and Webtrends.
> >>
> >> === Homogenous Developers ===
> >> The initial list of committers includes developers from several
> >> institutions, including Quantifind, Microsoft, Yahoo!, ClearStory Data,
> >> Bizo, Intel, and Webtrends.
> >>
> >> === Reliance on Salaried Developers ===
> >> Like most open source projects, Spark receives substantial support
> >> from salaried developers. A large fraction of Spark development is
> >> supported by graduate students at U.C. Berkeley in the course of
> >> research degrees. This is more of a "volunteer" relationship, since in
> >> most cases students contribute vastly more than is necessary to
> >> immediately support research. In addition, those working from within
> >> corporations often devote "after hours" or spare time to the project,
> >> and these contributors come from several organizations. We will work
> >> to ensure that the project can continue to be stewarded and to move
> >> forward independent of salaried developers.
> >>
> >>
> >> === Relationship with Other Apache Products ===
> >> Spark inter-operates with several existing Apache products by supporting
> >> them as storage layers: Apache Cassandra, Apache HBase, and Apache
> >> Hadoop (HDFS). It also uses several Apache components internally,
> >> including Apache Maven and several Apache Commons libraries. Finally,
> >> Shark (a higher layer framework built on Spark) inter-operates with
> >> Apache Hive. We will explore the relationship between Spark and Apache
> >> Gora, which also provides in-memory object storage (Champion Mattmann
> >> was the Champion for Apache Gora, so we expect alignment and cross
> >> pollination between our efforts).
> >>
> >> Spark offers an alternative computation engine to Apache Hadoop
> >> (MapReduce). Unlike MapReduce, Spark is designed for lower-latency and
> >> interactive workloads. This makes the projects complementary: many users
> >> run MapReduce and Spark side-by-side.
> >>
> >> === An Excessive Fascination with the Apache Brand ===
> >> Spark is already a healthy and relatively well known open source
> >> project. This proposal is not for the purpose of generating publicity.
> >> Rather, the primary benefits to joining Apache are those outlined in
> >> the Rationale section.
> >>
> >> === Documentation ===
> >> The reader will find these websites highly relevant:
> >> * Spark website: http://spark-project.org/
> >> * Spark documentation: http://spark-project.org/documentation/
> >> * Issue tracking: https://spark-project.atlassian.net/
> >> * Codebase: https://github.com/mesos/spark
> >> * User group: https://groups.google.com/group/spark-users
> >>
> >> == Initial Source ==
> >> The Spark codebase is currently hosted on Github:
> >> https://github.com/mesos/spark. This is the exact codebase that we
> >> would migrate to the Apache foundation.
> >>
> >> == Source and Intellectual Property Submission Plan ==
> >> Currently, the Spark codebase is distributed under a BSD license. The
> >> vast majority of code has copyright held by the University of
> >> California. Upon entering Apache, Spark will migrate to an Apache
> >> License with all copyright assigned to the Apache Foundation. The
> >> University of California will transfer all copyright to the Apache
> >> Foundation. In certain cases where individuals hold copyright, we will
> >> have individuals sign over copyright to the Apache foundation as well.
> >>
> >> Going forward, all commits would assign copyright directly to the Apache
> >> foundation through our signed Individual Contributor License Agreements
> >> for all initial committers on the project.
> >>
> >>
> >> == External Dependencies ==
> >> To the best of our knowledge, all dependencies of Spark are distributed
> >> under Apache compatible licenses. Upon acceptance to the incubator, we
> >> would begin a thorough analysis of all transitive dependencies to verify
> >> this fact and introduce license checking into the build and release
> >> process (for instance integrating Apache Rat).
> >>
> >> == Required Resources ==
> >> === Mailing list ===
> >> We will migrate the existing Spark mailing lists as follows:
> >>
> >> * spark-users@googlegroups --> [email protected]
> >> * spark-developers@googlegroups --> [email protected]
> >> * spark-commits are hosted on Github, so we would request
> >> [email protected]
> >>
> >> The latter is to be consistent with the new PIAO naming scheme for
> >> podlings.
> >>
> >> === Source control ===
> >> The Spark team would like to use Git for source control, due to our
> >> current use of Git.
> >> We request a writeable Git repo for Spark, and mirroring to be set up to
> >> Github through INFRA. Champion Mattmann can assist with creating INFRA
> >> tickets for this.
> >>
> >> === Issue Tracking ===
> >> Spark currently uses a hosted JIRA deployment for issue tracking. We
> >> will migrate to the Apache JIRA:
> >> http://issues.apache.org/jira/browse/SPARK
> >>
> >> == Initial Committers ==
> >> * Matei Zaharia <[email protected]>
> >> * Ankur Dave <[email protected]>
> >> * Tathagata Das <[email protected]>
> >> * Haoyuan Li <[email protected]>
> >> * Josh Rosen <[email protected]>
> >> * Reynold Xin <[email protected]>
> >> * Shivaram Venkataraman <[email protected]>
> >> * Mosharaf Chowdhury <[email protected]>
> >> * Charles Reiss <[email protected]>
> >> * Andy Konwinski <[email protected]>
> >> * Patrick Wendell <[email protected]>
> >> * Imran Rashid <[email protected]>
> >> * Ryan LeCompte <[email protected]>
> >> * Ravi Pandya <[email protected]>
> >> * Ram Sriharsha <[email protected]>
> >> * Robert Evans <[email protected]>
> >> * Mridul Muralidharan <[email protected]>
> >> * Thomas Dudziak <[email protected]>
> >> * Mark Hamstra <[email protected]>
> >> * Stephen Haberman <[email protected]>
> >> * Jason Dai <[email protected]>
> >> * Shane Huang <[email protected]>
> >> * Andrew Xia <[email protected]>
> >> * Nick Pentreath <[email protected]>
> >> * Sean McNamara <[email protected]>
> >>
> >> == Affiliations ==
> >> The initial committers are from nine organizations: UC Berkeley,
> >> Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Mxit and
> >> Webtrends.
> >>
> >> * Matei Zaharia (UCB)
> >> * Ankur Dave (UCB)
> >> * Tathagata Das (UCB)
> >> * Haoyuan Li (UCB)
> >> * Josh Rosen (UCB)
> >> * Reynold Xin (UCB)
> >> * Shivaram Venkataraman (UCB)
> >> * Mosharaf Chowdhury (UCB)
> >> * Charles Reiss (UCB)
> >> * Andy Konwinski (UCB)
> >> * Patrick Wendell (UCB)
> >> * Imran Rashid (Quantifind)
> >> * Ryan LeCompte (Quantifind)
> >> * Ravi Pandya (Microsoft)
> >> * Ram Sriharsha (Yahoo!)
> >> * Robert Evans (Yahoo!)
> >> * Mridul Muralidharan (Yahoo!)
> >> * Thomas Dudziak (ClearStory)
> >> * Mark Hamstra (ClearStory)
> >> * Stephen Haberman (Bizo)
> >> * Jason Dai (Intel)
> >> * Shane Huang (Intel)
> >> * Andrew Xia (Intel)
> >> * Nick Pentreath (Mxit)
> >> * Sean McNamara (Webtrends)
> >>
> >> == Sponsors ==
> >> === Champion ===
> >> * Chris Mattmann
> >>
> >> === Nominated Mentors ===
> >> * Chris Mattmann
> >> * Paul Ramirez
> >> * Andrew Hart
> >> * Thomas Dudziak
> >> * Suresh Marru
> >> * Henry Saputra
> >>
> >> === Sponsoring Entity ===
> >> The Apache Incubator
> >>
> >>
> >>
> >>
> >>
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> Chris Mattmann, Ph.D.
> >> Senior Computer Scientist
> >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >> Office: 171-266B, Mailstop: 171-246
> >> Email: [email protected]
> >> WWW:  http://sunset.usc.edu/~mattmann/
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> Adjunct Assistant Professor, Computer Science Department
> >> University of Southern California, Los Angeles, CA 90089 USA
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
> >>
