+1 (non-binding)

Andy

On Sat, Jun 8, 2013 at 12:36 AM, Matei Zaharia <ma...@eecs.berkeley.edu> wrote:

> +1 (non-binding)
>
> Matei
>
> On Jun 8, 2013, at 12:25 AM, Hitesh Shah <hit...@hortonworks.com> wrote:
>
> > +1 (non-binding)
> >
> > -- Hitesh
> >
> > On Jun 7, 2013, at 10:34 PM, Mattmann, Chris A (398J) wrote:
> >
> >> Hi Folks,
> >>
> >> OK, discussion has died down, so it's time to VOTE to accept Spark into the Apache Incubator. I'll let the VOTE run for at least a week.
> >>
> >> So far I've heard +1s from the following folks, so there is no need for them to VOTE again unless they want to change their VOTE:
> >>
> >> +1
> >>
> >> Chris Mattmann*
> >> Konstantin Boudnik
> >> Henry Saputra*
> >> Reynold Xin
> >> Pei Chen
> >> Roman Shaposhnik*
> >> Suresh Marru*
> >>
> >> * indicates IPMC
> >>
> >> [ ] +1 Accept Spark into the Apache Incubator.
> >> [ ] +0 Don't care.
> >> [ ] -1 Don't accept Spark into the Apache Incubator because...
> >>
> >> Proposal text is below.
> >>
> >> === Abstract ===
> >> Spark is an open source system for large-scale data analysis on clusters.
> >>
> >> === Proposal ===
> >> Spark is an open source system for fast and flexible large-scale data analysis. Spark provides a general-purpose runtime that supports low-latency execution in several forms. These include interactive exploration of very large datasets, near real-time stream processing, and ad-hoc SQL analytics (through higher-layer extensions). Spark interfaces with HDFS, HBase, Cassandra and several other storage layers, and exposes APIs in Scala, Java and Python.
> >>
> >> === Background ===
> >> Spark started as a U.C. Berkeley research project designed to run machine learning algorithms efficiently on large datasets. Over time, it has evolved into the general computing engine outlined above. Spark's developer community has also grown to include additional institutions, such as universities, research labs, and corporations. Funding has been provided by various institutions, including the U.S. National Science Foundation, DARPA, and a number of industry sponsors. See https://amplab.cs.berkeley.edu/sponsors/ for full details.
> >>
> >> === Rationale ===
> >> As the number of contributors to Spark has grown, we have sought a long-term home for the project, and we believe the Apache foundation would be a great home for it. Spark is a natural fit for the Apache foundation: Spark already interoperates with several existing Apache projects (HDFS, HBase, Hive, Cassandra, Avro and Flume, to name a few). The Spark team is familiar with the Apache process and subscribes to the Apache mission; the team already includes multiple Apache committers. Finally, joining Apache will help coordinate the development effort of the growing number of organizations that contribute to Spark.
> >>
> >> == Initial Goals ==
> >> The initial goals will most likely be to move the existing codebase to Apache and integrate with the Apache development process. Furthermore, we plan to develop incrementally and to make releases in line with the Apache guidelines.
> >>
> >> === Current Status ===
> >> == Meritocracy ==
> >> The Spark project already operates on meritocratic principles. Today, Spark has several developers and has accepted multiple major patches from outside of U.C. Berkeley. While this process has remained mostly informal (we do not have an official committer list), an implicit organization exists in which individuals who contribute major components act as maintainers for those modules. If accepted, the Spark project would include several of these participants as committers from the outset. We will work to identify all committers and PPMC members for the project and to operate under the ASF meritocratic principles.
> >>
> >> === Community ===
> >> Acceptance into the Apache foundation would bolster the already strong user and developer community around Spark. That community includes dozens of contributors from several institutions, a meetup group with several hundred members, and an active mailing list composed of hundreds of users.
> >>
> >> === Core Developers ===
> >> The core developers of our project are listed in our contributors and initial PPMC below. Though many are at UC Berkeley, there is a representative cross-section of other organizations, including Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Tagged and Webtrends.
> >>
> >> === Alignment ===
> >> Our proposed effort aligns with several ongoing BIGDATA and U.S. national priority funding interests, including the NSF and its Expeditions program, and the DARPA XDATA project. Our industry partners and collaborators are well aligned with our code base.
> >>
> >> There are also a number of related Apache projects and dependencies, which are described in the Relationship with Other Apache Products section.
> >>
> >> == Known Risks ==
> >>
> >> === Orphaned Products ===
> >> Given the current level of investment in Spark, the risk of the project being abandoned is minimal. Several constituents are highly incentivized to continue development. The U.C. Berkeley AMPLab relies on Spark as a platform for a large number of long-term research projects. Several companies have built vertical products that depend tightly on Spark. Other companies have made significant internal infrastructure investments in Spark.
> >>
> >> === Inexperience with Open Source ===
> >> Spark has existed as a healthy open source project for several years. During that time, Matei and others have successfully curated an open source community, attracting developers from a diverse group of companies including Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, and Webtrends.
> >>
> >> === Homogenous Developers ===
> >> The initial list of committers includes developers from several institutions, including Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, and Webtrends.
> >>
> >> === Reliance on Salaried Developers ===
> >> Like most open source projects, Spark receives substantial support from salaried developers. A large fraction of Spark development is carried out by graduate students at U.C. Berkeley in the course of their research degrees. This is more of a "volunteer" relationship, since in most cases students contribute far more than is needed to support their immediate research. In addition, contributors working within corporations often devote "after hours" or spare time to the project, and these contributors come from several organizations. We will work to ensure that the project can continue to be stewarded and to move forward independently of salaried developers.
> >>
> >> === Relationship with Other Apache Products ===
> >> Spark interoperates with several existing Apache products by supporting them as storage layers: Apache Cassandra, Apache HBase, and Apache Hadoop (HDFS). It also uses several Apache components internally, including Apache Maven and several Apache Commons libraries. Finally, Shark (a higher-layer framework built on Spark) interoperates with Apache Hive. We will explore the relationship between Spark and Apache Gora, which also provides in-memory object storage (Champion Mattmann was the Champion for Apache Gora, so we expect alignment and cross-pollination between our efforts).
> >>
> >> Spark offers an alternative computation engine to Apache Hadoop (MapReduce). Unlike MapReduce, Spark is designed for lower-latency and interactive workloads. This makes the projects complementary: many users run MapReduce and Spark side by side.
> >>
> >> === An Excessive Fascination with the Apache Brand ===
> >> Spark is already a healthy and relatively well-known open source project. This proposal is not for the purpose of generating publicity. Rather, the primary benefits of joining Apache are those outlined in the Rationale section.
> >>
> >> === Documentation ===
> >> The reader will find these websites highly relevant:
> >> * Spark website: http://spark-project.org/
> >> * Spark documentation: http://spark-project.org/documentation/
> >> * Issue tracking: https://spark-project.atlassian.net/
> >> * Codebase: https://github.com/mesos/spark
> >> * User group: https://groups.google.com/group/spark-users
> >>
> >> == Initial Source ==
> >> The Spark codebase is currently hosted on GitHub: https://github.com/mesos/spark. This is the exact codebase that we would migrate to the Apache foundation.
> >>
> >> == Source and Intellectual Property Submission Plan ==
> >> Currently, the Spark codebase is distributed under a BSD license. The copyright for the vast majority of the code is held by the University of California. Upon entering Apache, Spark will migrate to the Apache License, with all copyright assigned to the Apache Foundation. The University of California will transfer all copyright to the Apache Foundation. In cases where individuals hold copyright, we will have those individuals sign over copyright to the Apache Foundation as well.
> >>
> >> Going forward, all commits would assign copyright directly to the Apache foundation through our signed Individual Contributor License Agreements for all initial committers on the project.
> >>
> >> == External Dependencies ==
> >> To the best of our knowledge, all dependencies of Spark are distributed under Apache-compatible licenses. Upon acceptance to the incubator, we would begin a thorough analysis of all transitive dependencies to verify this and would introduce license checking into the build and release process (for instance, by integrating Apache Rat).
> >>
> >> == Required Resources ==
> >> === Mailing list ===
> >> We will migrate the existing Spark mailing lists as follows:
> >>
> >> * spark-users@googlegroups --> us...@spark.incubator.apache.org
> >> * spark-developers@googlegroups --> d...@spark.incubator.apache.org
> >> * spark-commits: commits are currently hosted on GitHub, so we would request comm...@spark.incubator.apache.org
> >>
> >> The latter is to be consistent with the new PIAO naming scheme for podlings.
> >>
> >> === Source control ===
> >> The Spark team would like to use Git for source control, since the project already uses Git.
> >> We request a writable Git repo for Spark, with mirroring to GitHub set up through INFRA. Champion Mattmann can assist with creating INFRA tickets for this.
> >>
> >> === Issue Tracking ===
> >> Spark currently uses a hosted JIRA deployment for issue tracking. We will migrate to the Apache JIRA:
> >> http://issues.apache.org/jira/browse/SPARK
> >>
> >> == Initial Committers ==
> >> * Matei Zaharia <ma...@apache.org>
> >> * Ankur Dave <ankurd...@gmail.com>
> >> * Tathagata Das <t...@eecs.berkeley.edu>
> >> * Haoyuan Li <haoy...@cs.berkeley.edu>
> >> * Josh Rosen <joshro...@cs.berkeley.edu>
> >> * Reynold Xin <r...@cs.berkeley.edu>
> >> * Shivaram Venkataraman <shiva...@eecs.berkeley.edu>
> >> * Mosharaf Chowdhury <mosha...@cs.berkeley.edu>
> >> * Charles Reiss <char...@eecs.berkeley.edu>
> >> * Andy Konwinski <andykonwin...@gmail.com>
> >> * Patrick Wendell <pwend...@eecs.berkeley.edu>
> >> * Imran Rashid <im...@quantifind.com>
> >> * Ryan LeCompte <lecom...@gmail.com>
> >> * Ravi Pandya <ra...@exchange.microsoft.com>
> >> * Ram Sriharsha <harsh...@yahoo-inc.com>
> >> * Robert Evans <ev...@yahoo-inc.com>
> >> * Mridul Muralidharan <mrid...@yahoo-inc.com>
> >> * Thomas Dudziak <to...@clearstorydata.com>
> >> * Mark Hamstra <m...@clearstorydata.com>
> >> * Stephen Haberman <stephen.haber...@gmail.com>
> >> * Jason Dai <jason....@intel.com>
> >> * Shane Huang <shannie.hu...@gmail.com>
> >> * Andrew Xia <xiajunl...@gmail.com>
> >> * Nick Pentreath <nick.pentre...@gmail.com>
> >> * Sean McNamara <sean.mcnam...@webtrends.com>
> >>
> >> == Affiliations ==
> >> The initial committers are from nine organizations: UC Berkeley, Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Mxit and Webtrends.
> >>
> >> * Matei Zaharia (UCB)
> >> * Ankur Dave (UCB)
> >> * Tathagata Das (UCB)
> >> * Haoyuan Li (UCB)
> >> * Josh Rosen (UCB)
> >> * Reynold Xin (UCB)
> >> * Shivaram Venkataraman (UCB)
> >> * Mosharaf Chowdhury (UCB)
> >> * Charles Reiss (UCB)
> >> * Andy Konwinski (UCB)
> >> * Patrick Wendell (UCB)
> >> * Imran Rashid (Quantifind)
> >> * Ryan LeCompte (Quantifind)
> >> * Ravi Pandya (Microsoft)
> >> * Ram Sriharsha (Yahoo!)
> >> * Robert Evans (Yahoo!)
> >> * Mridul Muralidharan (Yahoo!)
> >> * Thomas Dudziak (ClearStory)
> >> * Mark Hamstra (ClearStory)
> >> * Stephen Haberman (Bizo)
> >> * Jason Dai (Intel)
> >> * Shane Huang (Intel)
> >> * Andrew Xia (Intel)
> >> * Nick Pentreath (Mxit)
> >> * Sean McNamara (Webtrends)
> >>
> >> == Sponsors ==
> >> === Champion ===
> >> * Chris Mattmann
> >>
> >> === Nominated Mentors ===
> >> * Chris Mattmann
> >> * Paul Ramirez
> >> * Andrew Hart
> >> * Thomas Dudziak
> >> * Suresh Marru
> >> * Henry Saputra
> >>
> >> === Sponsoring Entity ===
> >> The Apache Incubator
> >>
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> Chris Mattmann, Ph.D.
> >> Senior Computer Scientist
> >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >> Office: 171-266B, Mailstop: 171-246
> >> Email: chris.a.mattm...@nasa.gov
> >> WWW: http://sunset.usc.edu/~mattmann/
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> Adjunct Assistant Professor, Computer Science Department
> >> University of Southern California, Los Angeles, CA 90089 USA
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> >> For additional commands, e-mail: general-h...@incubator.apache.org