+1 (non-binding). This is great news!
thanks,
Thilina

On Sat, Jun 8, 2013 at 10:50 PM, Alan Cabrera <l...@toolazydogs.com> wrote:

> +1 binding
>
>
> Regards,
> Alan
>
> On Jun 7, 2013, at 10:34 PM, "Mattmann, Chris A (398J)" <chris.a.mattm...@jpl.nasa.gov> wrote:
>
> > Hi Folks,
> >
> > OK, discussion has died down, so it's time to VOTE to accept Spark into the Apache Incubator. I'll let the VOTE run for at least a week.
> >
> > So far I've heard +1s from the following folks, so there is no need for them to VOTE again unless they want to change their VOTE:
> >
> > +1
> >
> > Chris Mattmann*
> > Konstantin Boudnik
> > Henry Saputra*
> > Reynold Xin
> > Pei Chen
> > Roman Shaposhnik*
> > Suresh Marru*
> >
> > * indicates IPMC
> >
> > [ ] +1 Accept Spark into the Apache Incubator.
> > [ ] +0 Don't care.
> > [ ] -1 Don't accept Spark into the Apache Incubator because...
> >
> > Proposal text is below.
> >
> > === Abstract ===
> > Spark is an open source system for large-scale data analysis on clusters.
> >
> > === Proposal ===
> > Spark is an open source system for fast and flexible large-scale data analysis. Spark provides a general-purpose runtime that supports low-latency execution in several forms. These include interactive exploration of very large datasets, near real-time stream processing, and ad-hoc SQL analytics (through higher-layer extensions). Spark interfaces with HDFS, HBase, Cassandra and several other storage layers, and exposes APIs in Scala, Java and Python.
> >
> > === Background ===
> > Spark started as a U.C. Berkeley research project, designed to efficiently run machine learning algorithms on large datasets. Over time, it has evolved into a general computing engine as outlined above. Spark's developer community has also grown to include additional institutions, such as universities, research labs, and corporations. Funding has been provided by various institutions including the U.S. National Science Foundation, DARPA, and a number of industry sponsors. See https://amplab.cs.berkeley.edu/sponsors/ for full details.
> >
> > === Rationale ===
> > As the number of contributors to Spark has grown, we have sought a long-term home for the project, and we believe the Apache foundation would be a great fit: Spark already interoperates with several existing Apache projects (HDFS, HBase, Hive, Cassandra, Avro and Flume, to name a few). The Spark team is familiar with the Apache process and subscribes to the Apache mission; the team already includes multiple Apache committers. Finally, joining Apache will help coordinate the development effort of the growing number of organizations that contribute to Spark.
> >
> > == Initial Goals ==
> > The initial goals will most likely be to move the existing codebase to Apache and integrate with the Apache development process. Furthermore, we plan for incremental development and releases in line with the Apache guidelines.
> >
> > == Current Status ==
> > === Meritocracy ===
> > The Spark project already operates on meritocratic principles. Today, Spark has several developers and has accepted multiple major patches from outside of U.C. Berkeley. While this process has remained mostly informal (we do not have an official committer list), an implicit organization exists in which individuals who contribute major components act as maintainers for those modules. If accepted, the Spark project would include several of these participants as committers from the outset. We will work to identify all committers and PPMC members for the project and to operate under the ASF meritocratic principles.
> >
> > === Community ===
> > Acceptance into the Apache foundation would bolster the already strong user and developer community around Spark.
> > That community includes dozens of contributors from several institutions, a meetup group with several hundred members, and an active mailing list composed of hundreds of users.
> >
> > === Core Developers ===
> > The core developers of our project are listed in the contributor and initial PPMC lists below. Though many are at UC Berkeley, there is a representative cross-section of other organizations, including Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Tagged and Webtrends.
> >
> > === Alignment ===
> > Our proposed effort aligns with several ongoing BIGDATA and U.S. national priority funding interests, including the NSF and its Expeditions program, and the DARPA XDATA project. Our industry partners and collaborators are well aligned with our code base.
> >
> > There are also a number of related Apache projects and dependencies, which are described in the Relationship with Other Apache Products section.
> >
> > == Known Risks ==
> >
> > === Orphaned Products ===
> > Given the current level of investment in Spark, the risk of the project being abandoned is minimal. There are several constituents who are highly incentivized to continue development. The U.C. Berkeley AMPLab relies on Spark as a platform for a large number of long-term research projects. Several companies have built verticalized products that are tightly dependent on Spark. Other companies have made significant internal infrastructure investments in Spark.
> >
> > === Inexperience with Open Source ===
> > Spark has existed as a healthy open source project for several years. During that time, Matei and others have successfully curated an open source community, attracting developers from a diverse group of companies including Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, and Webtrends.
> >
> > === Homogeneous Developers ===
> > The initial list of committers includes developers from several institutions, including Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, and Webtrends.
> >
> > === Reliance on Salaried Developers ===
> > Like most open source projects, Spark receives substantial support from salaried developers. A large fraction of Spark development is supported by graduate students at U.C. Berkeley in the course of their research degrees. This is more of a "volunteer" relationship, since in most cases students contribute far more than is necessary to immediately support their research. In addition, those working within corporations often devote "after hours" or spare time to the project, and these contributors come from several organizations. We will work to ensure that the project can continue to be stewarded and move forward independently of salaried developers.
> >
> > === Relationship with Other Apache Products ===
> > Spark interoperates with several existing Apache products by supporting them as storage layers: Apache Cassandra, Apache HBase, and Apache Hadoop (HDFS). It also uses several Apache components internally, including Apache Maven and several Apache Commons libraries. Finally, Shark (a higher-layer framework built on Spark) interoperates with Apache Hive. We will explore the relationship between Spark and Apache Gora, which also provides in-memory object storage. Champion Mattmann was the Champion for Apache Gora, so we expect alignment and cross-pollination between our efforts.
> >
> > Spark offers an alternative computation engine to Apache Hadoop (MapReduce). Unlike MapReduce, Spark is designed for lower-latency and interactive workloads. This makes the projects complementary: many users run MapReduce and Spark side by side.
> >
> > === An Excessive Fascination with the Apache Brand ===
> > Spark is already a healthy and relatively well-known open source project. This proposal is not for the purpose of generating publicity. Rather, the primary benefits of joining Apache are those outlined in the Rationale section.
> >
> > === Documentation ===
> > The reader will find these websites highly relevant:
> > * Spark website: http://spark-project.org/
> > * Spark documentation: http://spark-project.org/documentation/
> > * Issue tracking: https://spark-project.atlassian.net/
> > * Codebase: https://github.com/mesos/spark
> > * User group: https://groups.google.com/group/spark-users
> >
> > == Initial Source ==
> > The Spark codebase is currently hosted on GitHub: https://github.com/mesos/spark. This is the exact codebase that we would migrate to the Apache foundation.
> >
> > == Source and Intellectual Property Submission Plan ==
> > Currently, the Spark codebase is distributed under a BSD license. The copyright for the vast majority of the code is held by the University of California. Upon entering Apache, Spark will migrate to the Apache License with all copyright assigned to the Apache Foundation. The University of California will transfer all copyright to the Apache Foundation. In certain cases where individuals hold copyright, we will have those individuals sign over copyright to the Apache Foundation as well.
> >
> > Going forward, all commits would assign copyright directly to the Apache foundation through our signed Individual Contributor License Agreements for all initial committers on the project.
> >
> > == External Dependencies ==
> > To the best of our knowledge, all dependencies of Spark are distributed under Apache-compatible licenses. Upon acceptance into the Incubator, we would begin a thorough analysis of all transitive dependencies to verify this and introduce license checking into the build and release process (for instance, by integrating Apache Rat).
> >
> > == Required Resources ==
> > === Mailing list ===
> > We will migrate the existing Spark mailing lists as follows:
> >
> > * spark-users@googlegroups --> us...@spark.incubator.apache.org
> > * spark-developers@googlegroups --> d...@spark.incubator.apache.org
> > * Spark commits are currently hosted on GitHub, so we would request comm...@spark.incubator.apache.org
> >
> > The latter is to be consistent with the new PIAO naming scheme for podlings.
> >
> > === Source control ===
> > The Spark team would like to use Git for source control, since we already use Git. We request a writeable Git repo for Spark, with mirroring to GitHub set up through INFRA. Champion Mattmann can assist with creating INFRA tickets for this.
> >
> > === Issue Tracking ===
> > Spark currently uses a hosted JIRA deployment for issue tracking. We will migrate to the Apache JIRA.
> > http://issues.apache.org/jira/browse/SPARK
> >
> > == Initial Committers ==
> > * Matei Zaharia <ma...@apache.org>
> > * Ankur Dave <ankurd...@gmail.com>
> > * Tathagata Das <t...@eecs.berkeley.edu>
> > * Haoyuan Li <haoy...@cs.berkeley.edu>
> > * Josh Rosen <joshro...@cs.berkeley.edu>
> > * Reynold Xin <r...@cs.berkeley.edu>
> > * Shivaram Venkataraman <shiva...@eecs.berkeley.edu>
> > * Mosharaf Chowdhury <mosha...@cs.berkeley.edu>
> > * Charles Reiss <char...@eecs.berkeley.edu>
> > * Andy Konwinski <andykonwin...@gmail.com>
> > * Patrick Wendell <pwend...@eecs.berkeley.edu>
> > * Imran Rashid <im...@quantifind.com>
> > * Ryan LeCompte <lecom...@gmail.com>
> > * Ravi Pandya <ra...@exchange.microsoft.com>
> > * Ram Sriharsha <harsh...@yahoo-inc.com>
> > * Robert Evans <ev...@yahoo-inc.com>
> > * Mridul Muralidharan <mrid...@yahoo-inc.com>
> > * Thomas Dudziak <to...@clearstorydata.com>
> > * Mark Hamstra <m...@clearstorydata.com>
> > * Stephen Haberman <stephen.haber...@gmail.com>
> > * Jason Dai <jason....@intel.com>
> > * Shane Huang <shannie.hu...@gmail.com>
> > * Andrew Xia <xiajunl...@gmail.com>
> > * Nick Pentreath <nick.pentre...@gmail.com>
> > * Sean McNamara <sean.mcnam...@webtrends.com>
> >
> > == Affiliations ==
> > The initial committers are from nine organizations: UC Berkeley, Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Mxit and Webtrends.
> >
> > * Matei Zaharia (UCB)
> > * Ankur Dave (UCB)
> > * Tathagata Das (UCB)
> > * Haoyuan Li (UCB)
> > * Josh Rosen (UCB)
> > * Reynold Xin (UCB)
> > * Shivaram Venkataraman (UCB)
> > * Mosharaf Chowdhury (UCB)
> > * Charles Reiss (UCB)
> > * Andy Konwinski (UCB)
> > * Patrick Wendell (UCB)
> > * Imran Rashid (Quantifind)
> > * Ryan LeCompte (Quantifind)
> > * Ravi Pandya (Microsoft)
> > * Ram Sriharsha (Yahoo!)
> > * Robert Evans (Yahoo!)
> > * Mridul Muralidharan (Yahoo!)
> > * Thomas Dudziak (ClearStory)
> > * Mark Hamstra (ClearStory)
> > * Stephen Haberman (Bizo)
> > * Jason Dai (Intel)
> > * Shane Huang (Intel)
> > * Andrew Xia (Intel)
> > * Nick Pentreath (Mxit)
> > * Sean McNamara (Webtrends)
> >
> > == Sponsors ==
> > === Champion ===
> > * Chris Mattmann
> >
> > === Nominated Mentors ===
> > * Chris Mattmann
> > * Paul Ramirez
> > * Andrew Hart
> > * Thomas Dudziak
> > * Suresh Marru
> > * Henry Saputra
> >
> > === Sponsoring Entity ===
> > The Apache Incubator
> >
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Chris Mattmann, Ph.D.
> > Senior Computer Scientist
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 171-266B, Mailstop: 171-246
> > Email: chris.a.mattm...@nasa.gov
> > WWW: http://sunset.usc.edu/~mattmann/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Adjunct Assistant Professor, Computer Science Department
> > University of Southern California, Los Angeles, CA 90089 USA
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
>

--
https://www.cs.indiana.edu/~tgunarat/
http://www.linkedin.com/in/thilina
http://thilina.gunarathne.org