Thanks Sebastian, I always love to see a project from an academic setting materialize as an Apache project.

On Monday, April 7, 2014, Sebastian Schelter <s...@apache.org> wrote:
> You're very welcome to join as a mentor, Henry!
>
> On 04/06/2014 07:34 PM, Henry Saputra wrote:
>
> Hi Guys,
>
> The proposal looks great and I would love to help to sign up as a
> Mentor if you guys still have space for one.
>
> - Henry
>
> On Sun, Mar 30, 2014 at 12:14 AM, Alan Gates <ga...@hortonworks.com> wrote:
>
> I would like to propose Stratosphere as an Apache Incubator project. I
> have posted the proposal to
> https://wiki.apache.org/incubator/StratosphereProposal
> and posted the text of the proposal below.
>
> Alan.
>
> = Stratosphere =
>
> == Abstract ==
> Stratosphere is an open source system for parallel data analysis.
> Stratosphere deeply integrates MapReduce and database technologies to
> provide expressive and optimizable programming interfaces and at the same
> time efficient and scalable execution.
>
> == Proposal ==
> Stratosphere is an open source system for expressive, declarative, fast,
> and efficient data analysis. Stratosphere combines the scalability and
> programming flexibility of distributed MapReduce-like platforms with the
> efficiency, out-of-core execution, and query optimization capabilities
> found in parallel databases.
>
> == Background ==
> There is currently a need for general-purpose cluster computing platforms
> that are compatible with the Hadoop ecosystem, are more efficient, easier
> to use, and can support more applications than Hadoop MapReduce, but are
> not restricted to a specific data model and language (such as the
> relational model and a variant of SQL). Stratosphere fulfils these needs.
>
> Stratosphere exposes expressive APIs in Java and Scala (conceptually
> similar to Spark, Cascading, Scalding) that allow arbitrary user-defined
> functions in the same language and data model that the program is written
> in.
> Stratosphere programs pass through a cost-based optimizer that finds
> the best execution path for these programs depending on the data and
> cluster characteristics. The design and implementation of Stratosphere is
> based on research that generalizes query optimizers in relational
> databases. Stratosphere has a distributed runtime that is architected upon
> the principles of parallel databases, providing true pipelining (a basis
> for stream processing) and efficient out-of-core algorithms for grouping,
> sorting, joining, and aggregating data. Stratosphere provides first-class
> support for iterative algorithms via a built-in iterate operator, covering
> Machine Learning and graph analysis use cases. It achieves performance
> similar to Apache Giraph without being a specialized graph processing
> system.
>
> Stratosphere has undergone three major releases (v0.1, v0.2, v0.4) and
> some minor ones.
>
> == Rationale ==
> Stratosphere started out in 2008 as a research project by the Technical
> University of Berlin, the Humboldt University of Berlin, and the Hasso
> Plattner Institute, and has received subsequent funding from the German
> Research Council, the European Institute of Innovation and Technology, the
> European Commission, and industry.
>
> The traction of Stratosphere has by far exceeded our initial expectations,
> and we are therefore seeking an organizational long-term home for
> Stratosphere beyond the University walls that will house and further
> encourage contributors from companies and other organizations that are
> interested in Stratosphere. We believe that the Apache Software Foundation
> is the ideal home for Stratosphere. Stratosphere integrates with several
> existing Apache projects, such as HDFS, YARN, HBase, and Avro. The team is
> familiar with the Apache processes and fully subscribes to the Apache
> mission. One of the proposing members is a long-time Apache contributor and
> PMC member.
>
> == Initial Goals ==
> * Move the existing codebase to Apache
> * Integrate with the Apache development process
> * Ensure all dependencies are compliant with Apache License version 2.0
> * Incremental development and releases per Apache guidelines
>
> == Current Status ==
> === Meritocracy ===
> Stratosphere operated on meritocratic principles from the get go. The
> initial project proposal submitted to the German Research Council
> in 2008 stated that all code developed in the project will be released as
> open source under the Apache 2 license. Currently, all the
> discussions pertaining to Stratosphere development are public on
> [[https://github.com/stratosphere/stratosphere|GitHub]] and our
> [[<https://groups.google.com/forum/#!forum/stratosphere-dev%7Cmailing>
>