+1
On Fri, Apr 11, 2014 at 9:19 AM, Andrew Purtell <apurt...@apache.org> wrote:
> +1
>
>
> On Thu, Apr 10, 2014 at 10:42 AM, Alan Gates <ga...@hortonworks.com> wrote:
>
> > Based on the results of the discussion thread (
> > http://mail-archives.apache.org/mod_mbox/incubator-general/201403.mbox/%3CCE562EE9-968C-420E-A719-8C08CDAC99F8%40hortonworks.com%3E
> > in particular, notice the discussion on the name change), I would like
> > to call a vote on accepting Stratosphere into the incubator.
> >
> > [ ] +1 Accept Stratosphere into the Incubator
> > [ ] +0 Indifferent to the acceptance of Stratosphere
> > [ ] -1 Do not accept Stratosphere because ...
> >
> > The vote will be open until Monday April 14 18:00 UTC.
> >
> > https://wiki.apache.org/incubator/StratosphereProposal
> >
> > = Stratosphere =
> > == Abstract ==
> > Stratosphere is an open source system for parallel data analysis.
> > Stratosphere deeply integrates MapReduce and database technologies to
> > provide expressive and optimizable programming interfaces and at the
> > same time efficient and scalable execution.
> >
> > == Proposal ==
> > Stratosphere is an open source system for expressive, declarative, fast,
> > and efficient data analysis. Stratosphere combines the scalability and
> > programming flexibility of distributed MapReduce-like platforms with the
> > efficiency, out-of-core execution, and query optimization capabilities
> > found in parallel databases.
> >
> > == Background ==
> > There is currently a need for general-purpose cluster computing
> > platforms that are compatible with the Hadoop ecosystem, are more
> > efficient, easier to use, and can support more applications than Hadoop
> > MapReduce, but are not restricted to a specific data model and language
> > (such as the relational model and a variant of SQL). Stratosphere
> > fulfils these needs.
> >
> > Stratosphere exposes expressive APIs in Java and Scala (conceptually
> > similar to Spark, Cascading, Scalding) that allow arbitrary user-defined
> > functions in the same language and data model that the program is
> > written in. Stratosphere programs pass through a cost-based optimizer
> > that finds the best execution path for these programs depending on the
> > data and cluster characteristics. The design and implementation of
> > Stratosphere are based on research that generalizes query optimizers in
> > relational databases. Stratosphere has a distributed runtime that is
> > architected upon the principles of parallel databases, providing true
> > pipelining (a basis for stream processing) and efficient out-of-core
> > algorithms for grouping, sorting, joining, and aggregating data.
> > Stratosphere provides first-class support for iterative algorithms via a
> > built-in iterate operator, covering Machine Learning and graph analysis
> > use cases. It achieves performance similar to Apache Giraph without
> > being a specialized graph processing system.
> >
> > Stratosphere has undergone three major releases (v0.1, v0.2, v0.4) and
> > some minor ones.
> >
> > == Rationale ==
> > Stratosphere started out in 2008 as a research project by the Technical
> > University of Berlin, the Humboldt University of Berlin, and the Hasso
> > Plattner Institute, and has received subsequent funding from the German
> > Research Council, the European Institute of Innovation and Technology,
> > the European Commission, and industry.
> >
> > The traction of Stratosphere has by far exceeded our initial
> > expectations, and we are therefore seeking an organizational long-term
> > home for Stratosphere beyond the University walls that will house and
> > further encourage contributors from companies and other organizations
> > that are interested in Stratosphere. We believe that the Apache Software
> > Foundation is the ideal home for Stratosphere.
> > Stratosphere integrates with several existing Apache projects, such as
> > HDFS, YARN, HBase, and Avro. The team is familiar with the Apache
> > processes and fully subscribes to the Apache mission. One of the
> > proposing members is a long-time Apache contributor and PMC member.
> >
> > == Initial Goals ==
> >  * Move the existing codebase to Apache
> >  * Integrate with the Apache development process
> >  * Ensure all dependencies are compliant with Apache License version 2.0
> >  * Incremental development and releases per Apache guidelines
> >
> > == Current Status ==
> > === Meritocracy ===
> > Stratosphere operated on meritocratic principles from the get go. The
> > initial project proposal submitted to the German Research Council in
> > 2008 stated that all code developed in the project will be released as
> > open source under the Apache 2 license. Currently, all the discussions
> > pertaining to Stratosphere development are public on
> > [[https://github.com/stratosphere/stratosphere|GitHub]] and our
> > [[https://groups.google.com/forum/#!forum/stratosphere-dev|mailing list]].
> > The current incubation proposal includes the major code contributors to
> > Stratosphere. Several additional people have worked on the Stratosphere
> > codebase for research prototypes and industry use cases and would be
> > interested in becoming committers. We are starting with a small
> > committer group and we plan to add additional committers following an
> > open merit-based decision process during the incubation phase.
> >
> > === Community ===
> > Currently, the core of Stratosphere is developed at TU Berlin, mainly by
> > the committers listed in this proposal. Additional people from several
> > Universities and companies in Europe are working with Stratosphere and
> > are interested in becoming committers to the project.
> >
> > Over the years, Stratosphere has been adopted as a platform for research
> > and teaching in several Universities (TU Berlin, HU Berlin, HPI, RWTH,
> > Inria, KTH, U. Trento, UCSD, and others), and it is currently witnessing
> > its first industrial installations. We are seeing a rapidly growing
> > interest in Stratosphere by both startups and large companies, as well
> > as a growing community (our first
> > [[http://stratosphere.eu/events/2013/summit.html|Stratosphere Summit]]
> > in November 2013 attracted over 80 participants). Stratosphere was
> > recently accepted as a mentoring organization in Google Summer of Code
> > 2014.
> >
> > We believe that acceptance in the Apache Software Foundation will
> > consolidate the current community under one organizational umbrella, and
> > most importantly accelerate the growth of the community.
> >
> > === Core developers ===
> > The core developers of the system are Stephan Ewen, Fabian Hueske,
> > Daniel Warneke, Robert Metzger, Ufuk Celebi, and Aljoscha Krettek, who
> > are all committers in the current proposal.
> >
> > === Alignment ===
> > Stratosphere is compatible with, and related to, several Apache
> > projects. Stratosphere re-uses parts of Apache Hadoop, in particular
> > HDFS and YARN, as well as Apache HBase and Apache Avro. Stratosphere is
> > a very good compilation target for query languages such as Apache Hive
> > and Apache Pig.
> >
> > == Known Risks ==
> > === Orphaned Products ===
> > There is strong interest in Stratosphere by several companies and
> > organizations, and there is currently a long-term commitment to fund
> > salaried developers for Stratosphere by public and private organizations
> > in Europe.
> >
> > === Inexperience with Open Source ===
> > Sebastian Schelter is a committer and PMC member of Apache Mahout and
> > Apache Giraph, member of the Apache Software Foundation, member of the
> > Incubator PMC and project mentor for Apache Drill.
> > Sebastian, along with our mentors, will guide the rest of the
> > committers, who have experience with releasing software as open source
> > but little experience in participating in an open source project besides
> > Stratosphere itself.
> >
> > In mid-2013 Stratosphere transitioned from an "open source project with
> > publicly accessible source code" to an open source project that puts the
> > community first. We moved from a University-hosted git repository to
> > GitHub, where we discuss all issues publicly. This also includes release
> > planning (via GitHub's milestone feature) and code reviews. We also
> > moved our build system to the publicly available Travis-CI. The mailing
> > lists are hosted with Google Groups, and we use the public Maven
> > repository infrastructure of Sonatype. The source code of the
> > www.stratosphere.eu website is publicly available and is meant to be
> > changed by external contributors (for example for documentation
> > purposes).
> >
> > === Homogeneous Developers ===
> > Most committers in this proposal belong to the same institution (TU
> > Berlin). The engagement of these committers goes well beyond the
> > necessary development to support research, and all committers work on
> > Stratosphere in their free time. Several people from other institutions
> > are working on and are familiar with the Stratosphere codebase. We will
> > work to attract them as future committers during the incubation phase,
> > following a merit-based approach.
> >
> > === Reliance on Salaried Developers ===
> > Currently, Stratosphere receives support from salaried developers, in
> > particular from graduate students at TU Berlin that are funded by the
> > German Research Council, the European Institute of Technology, and the
> > European Commission. These students work in their free time on
> > Stratosphere in addition to their employment.
> >
> > We expect that Stratosphere development will occur on both salaried and
> > volunteer time.
> > We will recruit additional committers, including non-salaried
> > developers, and we will work to ensure that the project will move
> > forward independently of salaried developers.
> >
> > === Relationship with Other Apache Products ===
> > Stratosphere interfaces with several existing Apache projects: Apache
> > HBase for storage, Apache Hadoop (HDFS for storage, YARN for resource
> > management, and Stratosphere contains a generic wrapper for Hadoop
> > MapReduce input formats), and Apache Avro (for serialization).
> > Stratosphere uses Apache Maven and Apache Commons libraries internally.
> > Stratosphere can be a great compilation target for Apache Pig and Apache
> > Hive, although such functionality is not yet implemented.
> >
> > Stratosphere is also related to several projects undergoing incubation
> > in the Apache Incubation project, such as Tez, Drill, and Spark
> > (graduated). While all these projects target sufficiently different
> > spaces and have different architectures, it would be interesting to
> > explore code reuse possibilities. For example, we are currently basing
> > our design for compiling SQL to Stratosphere on the Optiq library, also
> > used by Apache Drill.
> >
> > === An Excessive Fascination with the Apache Brand ===
> > We believe that the Apache brand will help us attract contributors to
> > Stratosphere, by giving us a well-defined, transparent development
> > process under a known brand. At the same time, Stratosphere already has
> > a healthy community and current funding guarantees the further codebase
> > development and growth of the project for the next 3-5 years. The reason
> > for this proposal is not to gain publicity, but to further strengthen
> > the longevity of the project as explained in the Rationale section.
> >
> > == Documentation ==
> >  * [[https://stratosphere.eu|Project website]]
> >  * [[http://stratosphere.eu/docs/0.4/|Documentation]]
> >  * [[https://github.com/stratosphere/stratosphere|Codebase]]
> >  * [[https://groups.google.com/forum/#!forum/stratosphere-dev|Mailing list]]
> >
> > == Initial Source ==
> > Stratosphere is hosted on
> > [[https://github.com/stratosphere/stratosphere|GitHub]]. This is the
> > codebase that we will migrate to the Apache Foundation. The code was
> > previously hosted on TU Berlin's own git infrastructure. It has always
> > been Apache 2.0 licensed.
> >
> > === Source and Intellectual Property Submission Plan ===
> > All initial and past committers will sign a CLA with the ASF while the
> > incubator proposal for Stratosphere is being discussed. All
> > organizations that have employed Stratosphere contributors in the past
> > will sign an SGA. Current contributors will sign a CCLA. All major
> > contributors are still active in the project.
> >
> > === External Dependencies ===
> > All critical dependencies are, to the extent of our knowledge, from
> > other Apache projects. These include Apache Hadoop (for YARN and HDFS)
> > and some libraries (log4j, commons codec, junit and more). Our web
> > frontend uses some MIT-licensed JavaScript libraries.
> >
> > == Required Resources ==
> > === Mailing list ===
> > We will migrate our mailing lists to the following:
> >
> >  * us...@stratosphere.incubator.apache.org
> >  * d...@stratosphere.incubator.apache.org
> >  * priv...@stratosphere.incubator.apache.org
> >  * comm...@stratosphere.incubator.apache.org
> >
> > === Source control ===
> > We would like to use Git for source control and enable GitHub mirroring
> > functionality, where code reviews on GitHub are automatically forwarded
> > to the developer mailing list.
> > (See also:
> > https://blogs.apache.org/infra/entry/improved_integration_between_apache_and
> > )
> >
> > === Issue tracking ===
> > We are currently using GitHub for issue tracking. We request an
> > Apache-hosted JIRA, and we will import existing issues there.
> >
> > == Initial committers ==
> >  * Stephan Ewen - stephan.e...@tu-berlin.de
> >  * Fabian Hueske - fabian.hue...@tu-berlin.de
> >  * Daniel Warneke - warn...@posteo.de
> >  * Robert Metzger - metrob...@gmail.com
> >  * Ufuk Celebi - u.cel...@fu-berlin.de
> >  * Aljoscha Krettek - aljoscha.kret...@gmail.com
> >  * Kostas Tzoumas - kostas.tzou...@tu-berlin.de
> >  * Sebastian Schelter - s...@apache.org
> >
> > === Affiliations ===
> >  * Stephan Ewen (TU Berlin)
> >  * Fabian Hueske (TU Berlin)
> >  * Daniel Warneke (Amadeus IT Group)
> >  * Robert Metzger (TU Berlin)
> >  * Ufuk Celebi (FU Berlin)
> >  * Aljoscha Krettek (TU Berlin)
> >  * Kostas Tzoumas (TU Berlin)
> >  * Sebastian Schelter (TU Berlin)
> >
> > == Sponsors ==
> > === Champion ===
> > Alan Gates ( ga...@apache.org )
> >
> > === Nominated Mentors ===
> >  * Sean Owen ( sro...@apache.org ) (Note: Sean is an Apache member but
> >    not currently on the IPMC; he will need to request IPMC membership)
> >  * Ted Dunning ( tdunn...@apache.org )
> >  * Owen O'Malley ( omal...@apache.org )
> >  * Henry Saputra ( hsapu...@apache.org )
> >  * Ashutosh Chauhan ( hashut...@apache.org )
> >
> > === Sponsoring Entity ===
> > The Apache Incubator
> >
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)