+1

On 6/7/13 10:34 PM, "Mattmann, Chris A (398J)" <chris.a.mattm...@jpl.nasa.gov> wrote:
>Hi Folks,
>
>OK, discussion has died down, time to VOTE to accept Spark into the
>Apache Incubator. I'll let the VOTE run for at least a week.
>
>So far I've heard +1s from the following folks, so no need for them
>to VOTE again unless they want to change their VOTE:
>
>+1
>
>Chris Mattmann*
>Konstantin Boudnik
>Henry Saputra*
>Reynold Xin
>Pei Chen
>Roman Shaposhnik*
>Suresh Marru*
>
>* - indicates IPMC
>
>[ ] +1 Accept Spark into the Apache Incubator.
>[ ] +0 Don't care.
>[ ] -1 Don't accept Spark into the Apache Incubator because...
>
>Proposal text is below.
>
>=== Abstract ===
>Spark is an open source system for large-scale data analysis on clusters.
>
>=== Proposal ===
>Spark is an open source system for fast and flexible large-scale data
>analysis. Spark provides a general purpose runtime that supports
>low-latency execution in several forms. These include interactive
>exploration of very large datasets, near real-time stream processing, and
>ad-hoc SQL analytics (through higher layer extensions). Spark interfaces
>with HDFS, HBase, Cassandra and several other storage layers, and
>exposes APIs in Scala, Java and Python.
>
>=== Background ===
>Spark started as a U.C. Berkeley research project, designed to efficiently
>run machine learning algorithms on large datasets. Over time, it has
>evolved into a general computing engine as outlined above. Spark's
>developer community has also grown to include additional institutions,
>such as universities, research labs, and corporations. Funding has been
>provided by various institutions including the U.S. National Science
>Foundation, DARPA, and a number of industry sponsors. See:
>https://amplab.cs.berkeley.edu/sponsors/ for full details.
>
>=== Rationale ===
>As the number of contributors to Spark has grown, we have sought a
>long-term home for the project, and we believe the Apache Foundation would
>be a great fit.
>Spark is a natural fit for the Apache Foundation: Spark
>already interoperates with several existing Apache projects (HDFS, HBase,
>Hive, Cassandra, Avro and Flume to name a few). The Spark team is familiar
>with the Apache process and subscribes to the Apache mission - the
>team includes multiple Apache committers already. Finally, joining Apache
>will help coordinate the development effort of the growing number of
>organizations which contribute to Spark.
>
>== Initial Goals ==
>The initial goals will most likely be to move the existing codebase to
>Apache and integrate with the Apache development process. Furthermore, we
>plan for incremental development, and releases in line with the Apache
>guidelines.
>
>=== Current Status ===
>== Meritocracy ==
>The Spark project already operates on meritocratic principles. Today,
>Spark has several developers and has accepted multiple major patches from
>outside of U.C. Berkeley. While this process has remained mostly informal
>(we do not have an official committer list), an implicit organization
>exists in which individuals who contribute major components act as
>maintainers for those modules. If accepted, the Spark project would
>include several of these participants as committers from the outset. We
>will work to identify all committers and PPMC members for the project and
>to operate under the ASF meritocratic principles.
>
>=== Community ===
>Acceptance into the Apache Foundation would bolster the already strong
>user and developer community around Spark. That community includes dozens
>of contributors from several institutions, a meetup group with several
>hundred members, and an active mailing list composed of hundreds of users.
>
>=== Core Developers ===
>The core developers of our project are listed in our contributors and
>initial PPMC below.
>Though many are at UC Berkeley, there is a
>representative cross sampling of other organizations including Quantifind,
>Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Tagged and Webtrends.
>
>=== Alignment ===
>Our proposed effort aligns with several ongoing BIGDATA and U.S. National
>priority funding interests including the NSF and its Expeditions program,
>and the DARPA XDATA project. Our industry partners and collaborators are
>well aligned with our code base.
>
>There are also a number of related Apache projects and dependencies, which
>are covered in the Relationship with Other Apache Products section.
>
>== Known Risks ==
>
>=== Orphaned Products ===
>Given the current level of investment in Spark, the risk of the project
>being abandoned is minimal. There are several constituents who are highly
>incentivized to continue development. The U.C. Berkeley AMPLab relies on
>Spark as a platform for a large number of long-term research projects.
>Several companies have built verticalized products which are tightly
>dependent on Spark. Other companies have devoted significant internal
>infrastructure investment to Spark.
>
>=== Inexperience with Open Source ===
>Spark has existed as a healthy open source project for several years.
>During that time, Matei and others have curated an open-source community
>successfully, attracting developers from a diverse group of companies
>including Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, and
>Webtrends.
>
>=== Homogeneous Developers ===
>The initial list of committers includes developers from several
>institutions, including Quantifind, Microsoft, Yahoo!, ClearStory Data,
>Bizo, Intel, and Webtrends.
>
>=== Reliance on Salaried Developers ===
>Like most open source projects, Spark receives substantial support from
>salaried developers. A large fraction of Spark development is supported by
>graduate students at U.C.
>Berkeley in the course of research degrees -
>this is more a "volunteer" relationship, since in most cases students
>contribute vastly more than is necessary to immediately support research.
>In addition, those working from within corporations often devote "after
>hours" or spare time to the project - and these come from several
>organizations. We will work to ensure that the project can continue to be
>stewarded and to move forward independent of salaried developers.
>
>=== Relationship with Other Apache Products ===
>Spark inter-operates with several existing Apache products by supporting
>them as storage layers: Apache Cassandra, Apache HBase, and Apache Hadoop
>(HDFS). It also uses several Apache components internally including Apache
>Maven and several Apache Commons libraries. Finally, Shark (a higher layer
>framework built on Spark) inter-operates with Apache Hive. We will explore
>the relationship between Spark and Apache Gora, which also provides
>in-memory object storage (Champion Mattmann was the Champion for Apache
>Gora so we expect alignment and cross pollination between our efforts).
>
>Spark offers an alternative computation engine to Apache Hadoop
>(MapReduce). Unlike MapReduce, Spark is designed for lower-latency and
>interactive workloads. This makes the projects complementary: many users
>run MapReduce and Spark side-by-side.
>
>=== An Excessive Fascination with the Apache Brand ===
>Spark is already a healthy and relatively well known open source project.
>This proposal is not for the purpose of generating publicity. Rather, the
>primary benefits to joining Apache are those outlined in the Rationale
>section.
>
>=== Documentation ===
>The reader will find these websites highly relevant:
> * Spark website: http://spark-project.org/
> * Spark documentation: http://spark-project.org/documentation/
> * Issue tracking: https://spark-project.atlassian.net/
> * Codebase: https://github.com/mesos/spark
> * User group: https://groups.google.com/group/spark-users
>
>== Initial Source ==
>The Spark codebase is currently hosted on GitHub:
>https://github.com/mesos/spark. This is the exact codebase that we would
>migrate to the Apache Foundation.
>
>== Source and Intellectual Property Submission Plan ==
>Currently, the Spark codebase is distributed under a BSD license. The vast
>majority of code has copyright held by the University of California. Upon
>entering Apache, Spark will migrate to an Apache License with all
>copyright assigned to the Apache Foundation. The University of California
>will transfer all copyright to the Apache Foundation. In certain cases
>where individuals hold copyright, we will have individuals sign over
>copyright to the Apache Foundation as well.
>
>Going forward, all commits would assign copyright directly to the Apache
>Foundation through our signed Individual Contributor License Agreements
>for all initial committers on the project.
>
>== External Dependencies ==
>To the best of our knowledge, all dependencies of Spark are distributed
>under Apache compatible licenses. Upon acceptance to the incubator, we
>would begin a thorough analysis of all transitive dependencies to verify
>this fact and introduce license checking into the build and release
>process (for instance integrating Apache Rat).
>
>== Required Resources ==
>=== Mailing list ===
>We will migrate the existing Spark mailing lists as follows:
>
> * spark-users@googlegroups --> us...@spark.incubator.apache.org
> * spark-developers@googlegroups --> d...@spark.incubator.apache.org
> * spark-commits are hosted on GitHub, so we would request
>   comm...@spark.incubator.apache.org
>
>The latter is to be consistent with the new PIAO naming scheme for
>podlings.
>
>=== Source control ===
>The Spark team would like to use Git for source control, due to our
>current use of Git.
>We request a writeable Git repo for Spark, and mirroring to be set up to
>GitHub through INFRA. Champion Mattmann can assist with creating INFRA
>tickets for this.
>
>=== Issue Tracking ===
>Spark currently uses a hosted JIRA deployment for issue tracking. We will
>migrate to the Apache JIRA.
>http://issues.apache.org/jira/browse/SPARK
>
>== Initial Committers ==
> * Matei Zaharia <ma...@apache.org>
> * Ankur Dave <ankurd...@gmail.com>
> * Tathagata Das <t...@eecs.berkeley.edu>
> * Haoyuan Li <haoy...@cs.berkeley.edu>
> * Josh Rosen <joshro...@cs.berkeley.edu>
> * Reynold Xin <r...@cs.berkeley.edu>
> * Shivaram Venkataraman <shiva...@eecs.berkeley.edu>
> * Mosharaf Chowdhury <mosha...@cs.berkeley.edu>
> * Charles Reiss <char...@eecs.berkeley.edu>
> * Andy Konwinski <andykonwin...@gmail.com>
> * Patrick Wendell <pwend...@eecs.berkeley.edu>
> * Imran Rashid <im...@quantifind.com>
> * Ryan LeCompte <lecom...@gmail.com>
> * Ravi Pandya <ra...@exchange.microsoft.com>
> * Ram Sriharsha <harsh...@yahoo-inc.com>
> * Robert Evans <ev...@yahoo-inc.com>
> * Mridul Muralidharan <mrid...@yahoo-inc.com>
> * Thomas Dudziak <to...@clearstorydata.com>
> * Mark Hamstra <m...@clearstorydata.com>
> * Stephen Haberman <stephen.haber...@gmail.com>
> * Jason Dai <jason....@intel.com>
> * Shane Huang <shannie.hu...@gmail.com>
> * Andrew Xia <xiajunl...@gmail.com>
> * Nick Pentreath <nick.pentre...@gmail.com>
> * Sean McNamara <sean.mcnam...@webtrends.com>
>
>== Affiliations ==
>The initial committers are from nine organizations: UC Berkeley,
>Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Mxit and
>Webtrends.
>
> * Matei Zaharia (UCB)
> * Ankur Dave (UCB)
> * Tathagata Das (UCB)
> * Haoyuan Li (UCB)
> * Josh Rosen (UCB)
> * Reynold Xin (UCB)
> * Shivaram Venkataraman (UCB)
> * Mosharaf Chowdhury (UCB)
> * Charles Reiss (UCB)
> * Andy Konwinski (UCB)
> * Patrick Wendell (UCB)
> * Imran Rashid (Quantifind)
> * Ryan LeCompte (Quantifind)
> * Ravi Pandya (Microsoft)
> * Ram Sriharsha (Yahoo!)
> * Robert Evans (Yahoo!)
> * Mridul Muralidharan (Yahoo!)
> * Thomas Dudziak (ClearStory)
> * Mark Hamstra (ClearStory)
> * Stephen Haberman (Bizo)
> * Jason Dai (Intel)
> * Shane Huang (Intel)
> * Andrew Xia (Intel)
> * Nick Pentreath (Mxit)
> * Sean McNamara (Webtrends)
>
>== Sponsors ==
>=== Champion ===
> * Chris Mattmann
>
>=== Nominated Mentors ===
> * Chris Mattmann
> * Paul Ramirez
> * Andrew Hart
> * Thomas Dudziak
> * Suresh Marru
> * Henry Saputra
>
>=== Sponsoring Entity ===
> The Apache Incubator
>
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Chris Mattmann, Ph.D.
>Senior Computer Scientist
>NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>Office: 171-266B, Mailstop: 171-246
>Email: chris.a.mattm...@nasa.gov
>WWW: http://sunset.usc.edu/~mattmann/
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Adjunct Assistant Professor, Computer Science Department
>University of Southern California, Los Angeles, CA 90089 USA
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++