Hi Guys,

The proposal looks great, and I would love to sign up as a
Mentor if you guys still have space for one.


- Henry


On Sun, Mar 30, 2014 at 12:14 AM, Alan Gates <ga...@hortonworks.com> wrote:
> I would like to propose Stratosphere as an Apache Incubator project.  I have 
> posted the proposal to https://wiki.apache.org/incubator/StratosphereProposal 
> and posted the text of the proposal below.
>
> Alan.
>
> = Stratosphere =
>
> == Abstract ==
> Stratosphere is an open source system for parallel data analysis. 
> Stratosphere deeply integrates MapReduce and database technologies to provide 
> expressive and optimizable programming interfaces and at the same time 
> efficient and scalable execution.
>
> == Proposal ==
> Stratosphere is an open source system for expressive, declarative, fast, and 
> efficient data analysis. Stratosphere combines the scalability and 
> programming flexibility of distributed MapReduce-like platforms with the 
> efficiency, out-of-core execution, and query optimization capabilities found 
> in parallel databases.
>
> == Background ==
> There is currently a need for general-purpose cluster computing platforms 
> that are compatible with the Hadoop ecosystem, are more efficient, easier to 
> use, and can support more applications than Hadoop MapReduce, but are not 
> restricted to a specific data model and language (such as the relational 
> model and a variant of SQL). Stratosphere fulfils these needs.
>
> Stratosphere exposes expressive APIs in Java and Scala (conceptually similar 
> to Spark, Cascading, Scalding) that allow arbitrary user-defined functions in 
> the same language and data model that the program is written in. Stratosphere 
> programs pass through a cost-based optimizer that finds the best execution 
> path for these programs depending on the data and cluster characteristics. 
> The design and implementation of Stratosphere is based on research that 
> generalizes query optimizers in relational databases. Stratosphere has a 
> distributed runtime that is architected upon the principles of parallel 
> databases, providing true pipelining (a basis for stream processing) and 
> efficient out-of-core algorithms for grouping, sorting, joining, and 
> aggregating data. Stratosphere provides first-class support for iterative 
> algorithms via a built-in iterate operator, covering Machine Learning and 
> graph analysis use cases. It achieves performance similar to Apache Giraph 
> without being a specialized graph processing system.
>
> Stratosphere has had three major releases (v0.1, v0.2, v0.4) and several 
> minor ones.
>
> == Rationale ==
> Stratosphere started out in 2008 as a research project by the Technical 
> University of Berlin, the Humboldt University of Berlin, and the Hasso 
> Plattner Institute, and has received subsequent funding from the German 
> Research Council, the European Institute of Innovation and Technology, the 
> European Commission, and industry.
>
> Interest in Stratosphere has far exceeded our initial expectations, 
> and we are therefore seeking an organizational long-term home for 
> Stratosphere beyond the University walls that will house and further 
> encourage contributors from companies and other organizations that are 
> interested in Stratosphere. We believe that the Apache Software Foundation is 
> the ideal home for Stratosphere. Stratosphere integrates with several 
> existing Apache projects, such as HDFS, YARN, HBase, and Avro. The team is 
> familiar with the Apache processes and fully subscribes to the Apache 
> mission. One of the proposing members is a long-time Apache contributor and 
> PMC member.
>
> == Initial Goals ==
>  * Move the existing codebase to Apache
>  * Integrate with the Apache development process
>  * Ensure all dependencies are compliant with Apache License version 2.0
>  * Incremental development and releases per Apache guidelines
>
>
> == Current Status ==
> === Meritocracy ===
> Stratosphere has operated on meritocratic principles from the get-go. The initial 
> project proposal submitted to the German Research Council
> in 2008 stated that all code developed in the project would be released as 
> open source under the Apache 2 license. Currently, all the
> discussions pertaining to Stratosphere development are public on 
> [[https://github.com/stratosphere/stratosphere|GitHub]] and our 
> [[https://groups.google.com/forum/#!forum/stratosphere-dev|mailing list]]. 
> The current incubation proposal includes the major code contributors to 
> Stratosphere. Several additional people have worked on the Stratosphere 
> codebase for research prototypes and industry use cases and would be 
> interested in becoming committers. We are starting with a small committer 
> group and we plan to add additional committers following an open merit-based 
> decision process during the incubation phase.
>
> === Community ===
> Currently, the core of Stratosphere is developed at TU Berlin, mainly by the 
> committers listed in this proposal. Additional people from several 
> Universities and companies in Europe are working with Stratosphere and are 
> interested in becoming committers to the project.
>
> Over the years, Stratosphere has been adopted as a platform for research 
> and teaching in several Universities (TU Berlin, HU Berlin, HPI, RWTH, Inria, 
> KTH, U. Trento, UCSD, and others), and it is currently witnessing its first 
> industrial installations. We are seeing a rapidly growing interest in 
> Stratosphere by both startups and large companies, as well as a growing 
> community (our first 
> [[http://stratosphere.eu/events/2013/summit.html|Stratosphere Summit]] in 
> November 2013 attracted over 80 participants). Stratosphere was recently 
> accepted as a mentoring organization in Google Summer of Code 2014.
>
> We believe that acceptance in the Apache Software Foundation will consolidate 
> the current community under one organizational umbrella, and most importantly 
> accelerate the growth of the community.
>
> === Core developers ===
> The core developers of the system are Stephan Ewen, Fabian Hueske, Daniel 
> Warneke, Robert Metzger, Ufuk Celebi, and Aljoscha Krettek, who are all 
> committers in the current proposal.
>
> === Alignment ===
> Stratosphere is compatible with, and related to, several Apache projects. 
> Stratosphere re-uses parts of Apache Hadoop, in particular HDFS and YARN, as 
> well as Apache HBase and Apache Avro. Stratosphere is a very good compilation 
> target for query languages such as Apache Hive and Apache Pig.
>
> == Known Risks ==
> === Orphaned Products ===
> There is strong interest in Stratosphere by several companies and 
> organizations, and there is currently a long-term commitment to fund salaried 
> developers for Stratosphere by public and private organizations in Europe.
>
> === Inexperience with Open Source ===
> Sebastian Schelter is a committer and PMC member of Apache Mahout and Apache 
> Giraph, member of the Apache Software Foundation, member of the Incubator PMC 
> and project mentor for Apache Drill. Sebastian, along with our mentors, will 
> guide the rest of the committers, who have experience with releasing software 
> as open source but little experience in participating in an open source 
> project other than Stratosphere itself.
>
> In mid-2013 Stratosphere transitioned from an “open source project with 
> publicly accessible source code” to an open source project that puts the 
> community first. We moved from a University-hosted git repository to GitHub, 
> where we discuss all issues publicly. This also includes release planning 
> (via GitHub’s milestone feature) and code reviews. We also moved our build 
> system to the publicly available Travis-CI. The mailing lists are hosted with 
> Google Groups, and we use the public Maven repository infrastructure of Sonatype. 
> The source code of the www.stratosphere.eu website is publicly available and 
> is meant to be changed by external contributors (for example for 
> documentation purposes).
>
> === Homogeneous Developers ===
> Most committers in this proposal belong to the same institution (TU Berlin). 
> The engagement of these committers goes well beyond the necessary development 
> to support research, and all committers work on Stratosphere in their free 
> time. Several people from other institutions are working on and are familiar 
> with the Stratosphere codebase. We will work to attract them as future 
> committers during the incubation phase, following a merit-based approach.
>
> === Reliance on Salaried Developers ===
> Currently, Stratosphere receives support from salaried developers, in 
> particular from graduate students at TU Berlin that are funded by the German 
> Research Council, the European Institute of Innovation and Technology, and the 
> European Commission. These students work on Stratosphere in their free time, 
> in addition to their employment.
>
> We expect that Stratosphere development will occur on both salaried and 
> volunteer time. We will recruit additional committers, including non-salaried 
> developers, and we will work to ensure that the project will move forward 
> independently of salaried developers.
>
> === Relationship with Other Apache Products ===
> Stratosphere interfaces with several existing Apache projects: Apache HBase 
> for storage, Apache Hadoop (HDFS for storage, YARN for resource management, 
> and a generic Stratosphere wrapper for Hadoop MapReduce input 
> formats), and Apache Avro (for serialization). Stratosphere uses Apache Maven 
> and Apache Commons libraries internally. Stratosphere can be a great 
> compilation target for Apache Pig and Apache Hive, although such 
> functionality is not yet implemented.
>
> Stratosphere is also related to several projects undergoing incubation in 
> the Apache Incubator, such as Tez, Drill, and Spark (graduated). 
> While all these projects target sufficiently different spaces and have 
> different architectures, it would be interesting to explore code reuse 
> possibilities. For example, we are currently basing our design for compiling 
> SQL to Stratosphere on the Optiq library, also used by Apache Drill.
>
> === An Excessive Fascination with the Apache Brand ===
> We believe that the Apache brand will help us attract contributors to 
> Stratosphere by giving us a well-defined, transparent development process 
> under a known brand. At the same time, Stratosphere already has a healthy 
> community and current funding guarantees the further codebase development and 
> growth of the project for the next 3-5 years. The reason for this proposal is 
> not to gain publicity, but to further strengthen the longevity of the project 
> as explained in the Rationale section.
>
> == Documentation ==
>  * [[https://stratosphere.eu|Project website]]
>  * [[http://stratosphere.eu/docs/0.4/|Documentation]]
>  * [[https://github.com/stratosphere/stratosphere|Codebase]]
>  * [[https://groups.google.com/forum/#!forum/stratosphere-dev|Mailing list]]
>
> == Initial Source ==
> Stratosphere is hosted on 
> [[https://github.com/stratosphere/stratosphere|GitHub]]. This is the 
> codebase that we will migrate to the Apache Software Foundation. The code was 
> previously hosted on TU Berlin’s own git infrastructure. It has always been 
> Apache 2.0 licensed.
>
> === Source and Intellectual Property Submission Plan ===
> All initial and past committers will sign a CLA with the ASF while the 
> incubator proposal for Stratosphere is being discussed. All organizations 
> that have employed Stratosphere contributors in the past will sign an SGA. 
> Current contributors will sign a CCLA. All major contributors are still 
> active in the project.
>
> === External Dependencies ===
> All critical dependencies are, to the extent of our knowledge, from other 
> Apache projects. These include Apache Hadoop (for YARN and HDFS) and some 
> libraries (log4j, Commons Codec, JUnit, and more). Our web frontend uses some 
> MIT-licensed JavaScript libraries.
>
> == Required Resources ==
>
> === Mailing list ===
> We will migrate our mailing lists to the following:
>  * us...@stratosphere.incubator.apache.org
>  * d...@stratosphere.incubator.apache.org
>  * priv...@stratosphere.incubator.apache.org
>  * comm...@stratosphere.incubator.apache.org
>
> === Source control ===
> We would like to use Git for source control and enable GitHub mirroring 
> functionality, where code reviews on GitHub are automatically
> forwarded to the developer mailing list. (See also: 
> [[https://blogs.apache.org/infra/entry/improved_integration_between_apache_and]])
>
>
> === Issue tracking ===
> We are currently using GitHub for issue tracking. We request an Apache-hosted 
> JIRA, and we will import existing issues there.
>
>
> == Initial committers ==
>  * Stephan Ewen - stephan.e...@tu-berlin.de
>  * Fabian Hueske - fabian.hue...@tu-berlin.de
>  * Daniel Warneke - warn...@posteo.de
>  * Robert Metzger - metrob...@gmail.com
>  * Ufuk Celebi - u.cel...@fu-berlin.de
>  * Aljoscha Krettek - aljoscha.kret...@gmail.com
>  * Kostas Tzoumas - kostas.tzou...@tu-berlin.de
>  * Sebastian Schelter - s...@apache.org
>
> === Affiliations ===
>  * Stephan Ewen (TU Berlin)
>  * Fabian Hueske (TU Berlin)
>  * Daniel Warneke (Amadeus IT Group)
>  * Robert Metzger (TU Berlin)
>  * Ufuk Celebi (FU Berlin)
>  * Aljoscha Krettek (TU Berlin)
>  * Kostas Tzoumas (TU Berlin)
>  * Sebastian Schelter (TU Berlin)
>
> == Sponsors ==
> === Champion ===
> Alan Gates (ga...@apache.org)
>
> === Nominated Mentors ===
>  * Sean Owen (sro...@apache.org) (Note: Sean is an Apache member but not 
> currently on the IPMC; he will need to request IPMC membership)
>  * Ted Dunning (tdunn...@apache.org)
>  * Owen O'Malley (omal...@apache.org)
>
> === Sponsoring Entity ===
> The Apache Incubator
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
