Re: [RESULT] [VOTE] Apache Spark for the Incubator

Matei Zaharia Thu, 20 Jun 2013 03:59:47 -0700

Thanks Chris! We'll get started on all the required steps.

Matei


On Jun 20, 2013, at 4:35 AM, "Mattmann, Chris A (398J)" 
<chris.a.mattm...@jpl.nasa.gov> wrote:

> Hi Folks,
> 
> This VOTE has passed with the following tallies:
> 
> +1
> Chris Mattmann*
> Konstantin Boudnik
> Henry Saputra*
> Reynold Xin
> Pei Chen
> Roman Shaposhnik*
> Suresh Marru*
> Scott Deboy
> Ted Dunning*
> Hitesh Shah
> Paul Ramirez*
> Ralph Goers*
> Alan Cabrera*
> Thilina Gunarathne
> Marcel Offermans*
> Alex Karasulu*
> Chris Douglas*
> Andrew Hart*
> Deepal Jayasinghe
> Ashish
> Joe Brockmeier*
> Mohammad Nour El-Din*
> Arun C Murthy*
> Tim Williams*
> Arvind Prabhakar*
> Matt Franklin*
> Matei Zaharia
> Andy Konwinski
> 
> +0.99999
> 
> 
> Marvin Humphrey
> 
> * -indicates IPMC
> 
> 
> I'll go ahead and get the JIRA tickets filed for email/issue tracking/Git,
> and then work with the community to get them moving on' over. Thanks for
> VOTE'ing!
> 
> Cheers,
> Chris
> 
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
> 
> 
> 
> 
> 
> -----Original Message-----
> From: <Mattmann>, jpluser <chris.a.mattm...@jpl.nasa.gov>
> Reply-To: "general@incubator.apache.org" <general@incubator.apache.org>
> Date: Friday, June 7, 2013 10:34 PM
> To: "general@incubator.apache.org" <general@incubator.apache.org>
> Subject: [VOTE] Apache Spark for the Incubator
> 
>> Hi Folks,
>> 
>> OK discussion has died down, time to VOTE to accept Spark into the
>> Apache Incubator. I'll let the VOTE run for at least a week.
>> 
>> So far I've heard +1s from the following folks, so no need for them
>> to VOTE again unless they want to change their VOTE:
>> 
>> +1
>> 
>> Chris Mattmann*
>> Konstantin Boudnik
>> Henry Saputra*
>> Reynold Xin
>> Pei Chen
>> Roman Shaposhnik*
>> Suresh Marru*
>> 
>> * -indicates IPMC
>> 
>> [ ] +1 Accept Spark into the Apache Incubator.
>> [ ] +0 Don't care.
>> [ ] -1 Don't accept Spark into the Apache Incubator because..
>> 
>> Proposal text is below.
>> 
>> === Abstract ===
>> Spark is an open source system for large-scale data analysis on clusters.
>> 
>> === Proposal ===
>> Spark is an open source system for fast and flexible large-scale data
>> analysis. Spark provides a general purpose runtime that supports
>> low-latency execution in several forms. These include interactive
>> exploration of very large datasets, near real-time stream processing, and
>> ad-hoc SQL analytics (through higher layer extensions). Spark interfaces
>> with HDFS, HBase, Cassandra and several other storage layers, and
>> exposes APIs in Scala, Java and Python.
>> === Background ===
>> Spark started as a U.C. Berkeley research project, designed to efficiently
>> run machine learning algorithms on large datasets. Over time, it has
>> evolved into a general computing engine as outlined above. Spark's
>> developer community has also grown to include additional institutions,
>> such as universities, research labs, and corporations. Funding has been
>> provided by various institutions including the U.S. National Science
>> Foundation, DARPA, and a number of industry sponsors. See:
>> https://amplab.cs.berkeley.edu/sponsors/ for full details.
>> 
>> === Rationale ===
>> As the number of contributors to Spark has grown, we have sought a
>> long-term home for the project, and we believe the Apache foundation would
>> be a great fit. Spark is a natural fit for the Apache foundation: Spark
>> already interoperates with several existing Apache projects (HDFS, HBase,
>> Hive, Cassandra, Avro and Flume to name a few). The Spark team is familiar
>> with the Apache process and subscribes to the Apache mission - the
>> team includes multiple Apache committers already. Finally, joining Apache
>> will help coordinate the development effort of the growing number of
>> organizations which contribute to Spark.
>> 
>> == Initial Goals ==
>> The initial goals will most likely be to move the existing codebase to
>> Apache and integrate with the Apache development process. Furthermore, we
>> plan for incremental development and releases in line with the Apache
>> guidelines.
>> 
>> === Current Status ===
>> == Meritocracy ==
>> The Spark project already operates on meritocratic principles. Today,
>> Spark has several developers and has accepted multiple major patches from
>> outside of U.C. Berkeley. While this process has remained mostly informal
>> (we do not have an official committer list), an implicit organization
>> exists in which individuals who contribute major components act as
>> maintainers for those modules. If accepted, the Spark project would
>> include several of these participants as committers from the outset. We
>> will work to identify all committers and PPMC members for the project and
>> to operate under the ASF meritocratic principles.
>> 
>> === Community ===
>> Acceptance into the Apache foundation would bolster the already strong
>> user and developer community around Spark. That community includes dozens
>> of contributors from several institutions, a meetup group with several
>> hundred members, and an active mailing list composed of hundreds of users.
>> === Core Developers ===
>> The core developers of our project are listed in our contributors and
>> initial PPMC below. Though many are at UC Berkeley, there is a
>> representative cross sampling of other organizations including Quantifind,
>> Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Tagged and Webtrends.
>> 
>> 
>> === Alignment ===
>> Our proposed effort aligns with several ongoing BIGDATA and U.S. National
>> priority funding interests including the NSF and its Expeditions program,
>> and the DARPA XDATA project. Our industry partners and collaborators are
>> well aligned with our code base.
>> 
>> There are also a number of related Apache projects and dependencies, which
>> are covered in the Relationship with Other Apache Products section.
>> 
>> == Known Risks ==
>> 
>> === Orphaned Products ===
>> Given the current level of investment in Spark, the risk of the project
>> being abandoned is minimal. There are several constituents who are highly
>> incentivized to continue development. The U.C. Berkeley AMPLab relies on
>> Spark as a platform for a large number of long-term research projects.
>> Several companies have built verticalized products which are tightly
>> dependent on Spark. Other companies have devoted significant internal
>> infrastructure investment in Spark.
>> 
>> === Inexperience with Open Source ===
>> Spark has existed as a healthy open source project for several years.
>> During that time, Matei and others have curated an open-source community
>> successfully, attracting developers from a diverse group of companies
>> including Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, and
>> Webtrends. 
>> 
>> === Homogenous Developers ===
>> The initial list of committers includes developers from several
>> institutions, including Quantifind, Microsoft, Yahoo!, ClearStory Data,
>> Bizo, Intel, and Webtrends.
>> 
>> === Reliance on Salaried Developers ===
>> Like most open source projects, Spark receives substantial support from
>> salaried developers. A large fraction of Spark development is supported by
>> graduate students at U.C. Berkeley in the course of research degrees -
>> this is more a "volunteer" relationship, since in most cases students
>> contribute vastly more than is necessary to immediately support research.
>> In addition, those working from within corporations often devote "after
>> hours" or spare time to the project - and these come from several
>> organizations. We will work to ensure that the project can continue to be
>> stewarded and to move forward independently of salaried developers.
>> 
>> 
>> === Relationship with Other Apache Products ===
>> Spark inter-operates with several existing Apache products by supporting
>> them as storage layers: Apache Cassandra, Apache HBase, and Apache Hadoop
>> (HDFS). It also uses several Apache components internally including Apache
>> Maven and several Apache Commons libraries. Finally, Shark (a higher layer
>> framework built on Spark) inter-operates with Apache Hive. We will explore
>> the relationship between Spark and Apache Gora, which also provides
>> in-memory object storage (Champion Mattmann was the Champion for Apache
>> Gora so we expect alignment and cross pollination between our efforts).
>> 
>> Spark offers an alternative computation engine to Apache Hadoop
>> (MapReduce). Unlike MapReduce, Spark is designed for lower-latency and
>> interactive workloads. This makes the projects complementary: many users
>> run MapReduce and Spark side-by-side.
>> 
>> === An Excessive Fascination with the Apache Brand ===
>> Spark is already a healthy and relatively well known open source project.
>> This proposal is not for the purpose of generating publicity. Rather, the
>> primary benefits to joining Apache are those outlined in the Rationale
>> section.
>> 
>> === Documentation ===
>> The reader will find these websites highly relevant:
>> * Spark website: http://spark-project.org/
>> * Spark documentation: http://spark-project.org/documentation/
>> * Issue tracking: https://spark-project.atlassian.net/
>> * Codebase: https://github.com/mesos/spark
>> * User group: https://groups.google.com/group/spark-users
>> 
>> == Initial Source ==
>> The Spark codebase is currently hosted on Github:
>> https://github.com/mesos/spark. This is the exact codebase that we would
>> migrate to the Apache foundation.
>> == Source and Intellectual Property Submission Plan ==
>> Currently, the Spark codebase is distributed under a BSD license. The vast
>> majority of code has copyright held by the University of California. Upon
>> entering Apache, Spark will migrate to an Apache License with all
>> copyright assigned to the Apache Foundation. The University of California
>> will transfer all copyright to the Apache Foundation. In certain cases
>> where individuals hold copyright, we will have individuals sign over
>> copyright to the Apache foundation as well.
>> 
>> Going forward, all commits would assign copyright directly to the Apache
>> foundation through our signed Individual Contributor License Agreements
>> for all initial committers on the project.
>> 
>> 
>> == External Dependencies ==
>> To the best of our knowledge, all dependencies of Spark are distributed
>> under Apache compatible licenses. Upon acceptance to the incubator, we
>> would begin a thorough analysis of all transitive dependencies to verify
>> this fact and introduce license checking into the build and release
>> process (for instance integrating Apache Rat).
>> 
>> == Required Resources ==
>> === Mailing list ===
>> We will migrate the existing Spark mailing lists as follows:
>> 
>> * spark-users@googlegroups --> us...@spark.incubator.apache.org
>> * spark-developers@googlegroups --> d...@spark.incubator.apache.org
>> * spark-commits are hosted on Github, so we would request
>> comm...@spark.incubator.apache.org
>> 
>> The latter is to be consistent with the new PIAO naming scheme for
>> podlings.
>> 
>> === Source control ===
>> The Spark team would like to use Git for source control, due to our
>> current use of Git.
>> We request a writeable Git repo for Spark, and mirroring to be set up to
>> Github through INFRA. Champion Mattmann can assist with creating INFRA
>> tickets for this.
>> 
>> === Issue Tracking ===
>> Spark currently uses a hosted JIRA deployment for issue tracking. We will
>> migrate to the Apache JIRA.
>> http://issues.apache.org/jira/browse/SPARK
>> 
>> == Initial Committers ==
>> * Matei Zaharia <ma...@apache.org>
>> * Ankur Dave <ankurd...@gmail.com>
>> * Tathagata Das <t...@eecs.berkeley.edu>
>> * Haoyuan Li <haoy...@cs.berkeley.edu>
>> * Josh Rosen <joshro...@cs.berkeley.edu>
>> * Reynold Xin <r...@cs.berkeley.edu>
>> * Shivaram Venkataraman <shiva...@eecs.berkeley.edu>
>> * Mosharaf Chowdhury <mosha...@cs.berkeley.edu>
>> * Charles Reiss <char...@eecs.berkeley.edu>
>> * Andy Konwinski <andykonwin...@gmail.com>
>> * Patrick Wendell <pwend...@eecs.berkeley.edu>
>> * Imran Rashid <im...@quantifind.com>
>> * Ryan LeCompte <lecom...@gmail.com>
>> * Ravi Pandya <ra...@exchange.microsoft.com>
>> * Ram Sriharsha <harsh...@yahoo-inc.com>
>> * Robert Evans <ev...@yahoo-inc.com>
>> * Mridul Muralidharan <mrid...@yahoo-inc.com>
>> * Thomas Dudziak <to...@clearstorydata.com>
>> * Mark Hamstra <m...@clearstorydata.com>
>> * Stephen Haberman <stephen.haber...@gmail.com>
>> * Jason Dai <jason....@intel.com>
>> * Shane Huang <shannie.hu...@gmail.com>
>> * Andrew Xia <xiajunl...@gmail.com>
>> * Nick Pentreath <nick.pentre...@gmail.com>
>> * Sean McNamara <sean.mcnam...@webtrends.com>
>> 
>> == Affiliations ==
>> The initial committers are from nine organizations: UC Berkeley,
>> Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Mxit and
>> Webtrends.
>> 
>> * Matei Zaharia (UCB)
>> * Ankur Dave (UCB)
>> * Tathagata Das (UCB)
>> * Haoyuan Li (UCB)
>> * Josh Rosen (UCB)
>> * Reynold Xin (UCB)
>> * Shivaram Venkataraman (UCB)
>> * Mosharaf Chowdhury (UCB)
>> * Charles Reiss (UCB)
>> * Andy Konwinski (UCB)
>> * Patrick Wendell (UCB)
>> * Imran Rashid (Quantifind)
>> * Ryan LeCompte (Quantifind)
>> * Ravi Pandya (Microsoft)
>> * Ram Sriharsha (Yahoo!)
>> * Robert Evans (Yahoo!)
>> * Mridul Muralidharan (Yahoo!)
>> * Thomas Dudziak (ClearStory)
>> * Mark Hamstra (ClearStory)
>> * Stephen Haberman (Bizo)
>> * Jason Dai (Intel)
>> * Shane Huang (Intel)
>> * Andrew Xia (Intel)
>> * Nick Pentreath (Mxit)
>> * Sean McNamara (Webtrends)
>> 
>> == Sponsors ==
>> === Champion ===
>> * Chris Mattmann
>> 
>> === Nominated Mentors ===
>> * Chris Mattmann
>> * Paul Ramirez 
>> * Andrew Hart 
>> * Thomas Dudziak 
>> * Suresh Marru
>> * Henry Saputra
>> 
>> === Sponsoring Entity ===
>> The Apache Incubator
>> 
>> 
>> 
>> 
>> 
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattm...@nasa.gov
>> WWW:  http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>> For additional commands, e-mail: general-h...@incubator.apache.org
>> 
> 


