Re: [VOTE] Apache Spark for the Incubator

Thilina Gunarathne Sat, 08 Jun 2013 20:00:26 -0700

+1 (non-binding)

This is great news!


thanks,
Thilina



On Sat, Jun 8, 2013 at 10:50 PM, Alan Cabrera <l...@toolazydogs.com> wrote:

> +1 binding
>
>
> Regards,
> Alan
>
> On Jun 7, 2013, at 10:34 PM, "Mattmann, Chris A (398J)" <
> chris.a.mattm...@jpl.nasa.gov> wrote:
>
> > Hi Folks,
> >
> > OK, discussion has died down, so it is time to VOTE to accept Spark into the
> > Apache Incubator. I'll let the VOTE run for at least a week.
> >
> > So far I've heard +1s from the following folks, so no need for them
> > to VOTE again unless they want to change their VOTE:
> >
> > +1
> >
> > Chris Mattmann*
> > Konstantin Boudnik
> > Henry Saputra*
> > Reynold Xin
> > Pei Chen
> > Roman Shaposhnik*
> > Suresh Marru*
> >
> > * - indicates IPMC
> >
> > [ ] +1 Accept Spark into the Apache Incubator.
> > [ ] +0 Don't care.
> > [ ] -1 Don't accept Spark into the Apache Incubator because..
> >
> > Proposal text is below.
> >
> > === Abstract ===
> > Spark is an open source system for large-scale data analysis on clusters.
> >
> > === Proposal ===
> > Spark is an open source system for fast and flexible large-scale data
> > analysis. Spark provides a general purpose runtime that supports
> > low-latency execution in several forms. These include interactive
> > exploration of very large datasets, near real-time stream processing, and
> > ad-hoc SQL analytics (through higher layer extensions). Spark interfaces
> > with HDFS, HBase, Cassandra and several other storage layers, and
> > exposes APIs in Scala, Java and Python.
> >
> > === Background ===
> > Spark started as a U.C. Berkeley research project, designed to efficiently
> > run machine learning algorithms on large datasets. Over time, it has
> > evolved into a general computing engine as outlined above. Spark's
> > developer community has also grown to include additional institutions,
> > such as universities, research labs, and corporations. Funding has been
> > provided by various institutions including the U.S. National Science
> > Foundation, DARPA, and a number of industry sponsors. See:
> > https://amplab.cs.berkeley.edu/sponsors/ for full details.
> >
> > === Rationale ===
> > As the number of contributors to Spark has grown, we have sought a
> > long-term home for the project, and we believe the Apache foundation would
> > be a great fit. Spark is a natural fit for the Apache foundation: Spark
> > already interoperates with several existing Apache projects (HDFS, HBase,
> > Hive, Cassandra, Avro and Flume to name a few). The Spark team is familiar
> > with the Apache process and subscribes to the Apache mission - the
> > team includes multiple Apache committers already. Finally, joining Apache
> > will help coordinate the development effort of the growing number of
> > organizations which contribute to Spark.
> >
> > == Initial Goals ==
> > The initial goals will most likely be to move the existing codebase to
> > Apache and integrate with the Apache development process. Furthermore, we
> > plan for incremental development and releases in line with the Apache
> > guidelines.
> >
> > === Current Status ===
> > == Meritocracy ==
> > The Spark project already operates on meritocratic principles. Today,
> > Spark has several developers and has accepted multiple major patches from
> > outside of U.C. Berkeley. While this process has remained mostly informal
> > (we do not have an official committer list), an implicit organization
> > exists in which individuals who contribute major components act as
> > maintainers for those modules. If accepted, the Spark project would
> > include several of these participants as committers from the outset. We
> > will work to identify all committers and PPMC members for the project and
> > to operate under the ASF meritocratic principles.
> >
> > === Community ===
> > Acceptance into the Apache foundation would bolster the already strong
> > user and developer community around Spark. That community includes dozens
> > of contributors from several institutions, a meetup group with several
> > hundred members, and an active mailing list composed of hundreds of users.
> >
> > == Core Developers ==
> > The core developers of our project are listed in our contributors and
> > initial PPMC below. Though many are at UC Berkeley, there is a
> > representative cross sampling of other organizations including Quantifind,
> > Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Tagged and Webtrends.
> >
> >
> > === Alignment ===
> > Our proposed effort aligns with several ongoing BIGDATA and U.S. National
> > priority funding interests including the NSF and its Expeditions program,
> > and the DARPA XDATA project. Our industry partners and collaborators are
> > well aligned with our code base.
> >
> > There are also a number of related Apache projects and dependencies, which
> > are covered in the Relationship with Other Apache Products section.
> >
> > == Known Risks ==
> >
> > === Orphaned Products ===
> > Given the current level of investment in Spark, the risk of the project
> > being abandoned is minimal. There are several constituents who are highly
> > incentivized to continue development. The U.C. Berkeley AMPLab relies on
> > Spark as a platform for a large number of long-term research projects.
> > Several companies have built verticalized products which are tightly
> > dependent on Spark. Other companies have made significant internal
> > infrastructure investments in Spark.
> >
> > === Inexperience with Open Source ===
> > Spark has existed as a healthy open source project for several years.
> > During that time, Matei and others have curated an open-source community
> > successfully, attracting developers from a diverse group of companies
> > including Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, and
> > Webtrends.
> >
> > === Homogeneous Developers ===
> > The initial list of committers includes developers from several
> > institutions, including Quantifind, Microsoft, Yahoo!, ClearStory Data,
> > Bizo, Intel, and Webtrends.
> >
> > === Reliance on Salaried Developers ===
> > Like most open source projects, Spark receives substantial support from
> > salaried developers. A large fraction of Spark development is supported by
> > graduate students at U.C. Berkeley in the course of research degrees -
> > this is more a "volunteer" relationship, since in most cases students
> > contribute vastly more than is necessary to immediately support research.
> > In addition, those working from within corporations often devote "after
> > hours" or spare time to the project - and these come from several
> > organizations. We will work to ensure that the project can continue to be
> > stewarded and to move forward independent of salaried developers.
> >
> >
> > === Relationship with Other Apache Products ===
> > Spark inter-operates with several existing Apache products by supporting
> > them as storage layers: Apache Cassandra, Apache HBase, and Apache Hadoop
> > (HDFS). It also uses several Apache components internally including Apache
> > Maven and several Apache Commons libraries. Finally, Shark (a higher layer
> > framework built on Spark) inter-operates with Apache Hive. We will explore
> > the relationship between Spark and Apache Gora, which also provides
> > in-memory object storage (Champion Mattmann was the Champion for Apache
> > Gora so we expect alignment and cross pollination between our efforts).
> >
> > Spark offers an alternative computation engine to Apache Hadoop
> > (MapReduce). Unlike MapReduce, Spark is designed for lower-latency and
> > interactive workloads. This makes the projects complementary: many users
> > run MapReduce and Spark side-by-side.
> >
> > === An Excessive Fascination with the Apache Brand ===
> > Spark is already a healthy and relatively well known open source project.
> > This proposal is not for the purpose of generating publicity. Rather, the
> > primary benefits to joining Apache are those outlined in the Rationale
> > section.
> >
> > === Documentation ===
> > The reader will find these websites highly relevant:
> > * Spark website: http://spark-project.org/
> > * Spark documentation: http://spark-project.org/documentation/
> > * Issue tracking: https://spark-project.atlassian.net/
> > * Codebase: https://github.com/mesos/spark
> > * User group: https://groups.google.com/group/spark-users
> >
> > == Initial Source ==
> > The Spark codebase is currently hosted on Github:
> > https://github.com/mesos/spark. This is the exact codebase that we would
> > migrate to the Apache foundation.
> >
> > == Source and Intellectual Property Submission Plan ==
> > Currently, the Spark codebase is distributed under a BSD license. The vast
> > majority of code has copyright held by the University of California. Upon
> > entering Apache, Spark will migrate to an Apache License with all
> > copyright assigned to the Apache Foundation. The University of California
> > will transfer all copyright to the Apache Foundation. In certain cases
> > where individuals hold copyright, we will have individuals sign over
> > copyright to the Apache foundation as well.
> >
> > Going forward, all commits would assign copyright directly to the Apache
> > foundation through our signed Individual Contributor License Agreements
> > for all initial committers on the project.
> >
> >
> > == External Dependencies ==
> > To the best of our knowledge, all dependencies of Spark are distributed
> > under Apache compatible licenses. Upon acceptance to the incubator, we
> > would begin a thorough analysis of all transitive dependencies to verify
> > this fact and introduce license checking into the build and release
> > process (for instance integrating Apache Rat).
> >
> > == Required Resources ==
> > === Mailing list ===
> > We will migrate the existing Spark mailing lists as follows:
> >
> > * spark-users@googlegroups --> us...@spark.incubator.apache.org
> > * spark-developers@googlegroups --> d...@spark.incubator.apache.org
> > * spark-commits are hosted on Github, so we would request
> > comm...@spark.incubator.apache.org
> >
> > The latter is to be consistent with the new PIAO naming scheme for
> > podlings.
> >
> > === Source control ===
> > The Spark team would like to use Git for source control, due to our
> > current use of Git.
> > We request a writeable Git repo for Spark, and mirroring to be set up to
> > Github through INFRA. Champion Mattmann can assist with creating INFRA
> > tickets for this.
> >
> > === Issue Tracking ===
> > Spark currently uses a hosted JIRA deployment for issue tracking. We will
> > migrate to the Apache JIRA.
> > http://issues.apache.org/jira/browse/SPARK
> >
> > == Initial Committers ==
> > * Matei Zaharia <ma...@apache.org>
> > * Ankur Dave <ankurd...@gmail.com>
> > * Tathagata Das <t...@eecs.berkeley.edu>
> > * Haoyuan Li <haoy...@cs.berkeley.edu>
> > * Josh Rosen <joshro...@cs.berkeley.edu>
> > * Reynold Xin <r...@cs.berkeley.edu>
> > * Shivaram Venkataraman <shiva...@eecs.berkeley.edu>
> > * Mosharaf Chowdhury <mosha...@cs.berkeley.edu>
> > * Charles Reiss <char...@eecs.berkeley.edu>
> > * Andy Konwinski <andykonwin...@gmail.com>
> > * Patrick Wendell <pwend...@eecs.berkeley.edu>
> > * Imran Rashid <im...@quantifind.com>
> > * Ryan LeCompte <lecom...@gmail.com>
> > * Ravi Pandya <ra...@exchange.microsoft.com>
> > * Ram Sriharsha <harsh...@yahoo-inc.com>
> > * Robert Evans <ev...@yahoo-inc.com>
> > * Mridul Muralidharan <mrid...@yahoo-inc.com>
> > * Thomas Dudziak <to...@clearstorydata.com>
> > * Mark Hamstra <m...@clearstorydata.com>
> > * Stephen Haberman <stephen.haber...@gmail.com>
> > * Jason Dai <jason....@intel.com>
> > * Shane Huang <shannie.hu...@gmail.com>
> > * Andrew Xia <xiajunl...@gmail.com>
> > * Nick Pentreath <nick.pentre...@gmail.com>
> > * Sean McNamara <sean.mcnam...@webtrends.com>
> >
> > == Affiliations ==
> > The initial committers are from nine organizations: UC Berkeley,
> > Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Mxit and
> > Webtrends.
> >
> > * Matei Zaharia (UCB)
> > * Ankur Dave (UCB)
> > * Tathagata Das (UCB)
> > * Haoyuan Li (UCB)
> > * Josh Rosen (UCB)
> > * Reynold Xin (UCB)
> > * Shivaram Venkataraman (UCB)
> > * Mosharaf Chowdhury (UCB)
> > * Charles Reiss (UCB)
> > * Andy Konwinski (UCB)
> > * Patrick Wendell (UCB)
> > * Imran Rashid (Quantifind)
> > * Ryan LeCompte (Quantifind)
> > * Ravi Pandya (Microsoft)
> > * Ram Sriharsha (Yahoo!)
> > * Robert Evans (Yahoo!)
> > * Mridul Muralidharan (Yahoo!)
> > * Thomas Dudziak (ClearStory)
> > * Mark Hamstra (ClearStory)
> > * Stephen Haberman (Bizo)
> > * Jason Dai (Intel)
> > * Shane Huang (Intel)
> > * Andrew Xia (Intel)
> > * Nick Pentreath (Mxit)
> > * Sean McNamara (Webtrends)
> >
> > == Sponsors ==
> > === Champion ===
> > * Chris Mattmann
> >
> > === Nominated Mentors ===
> > * Chris Mattmann
> > * Paul Ramirez
> > * Andrew Hart
> > * Thomas Dudziak
> > * Suresh Marru
> > * Henry Saputra
> >
> > === Sponsoring Entity ===
> > The Apache Incubator
> >
> >
> >
> >
> >
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Chris Mattmann, Ph.D.
> > Senior Computer Scientist
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 171-266B, Mailstop: 171-246
> > Email: chris.a.mattm...@nasa.gov
> > WWW:  http://sunset.usc.edu/~mattmann/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Adjunct Assistant Professor, Computer Science Department
> > University of Southern California, Los Angeles, CA 90089 USA
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
> >
>
>
>
>


-- 
https://www.cs.indiana.edu/~tgunarat/
http://www.linkedin.com/in/thilina
http://thilina.gunarathne.org
