+1
On Sat, May 10, 2014 at 2:03 AM, Ashutosh Chauhan <hashut...@apache.org>wrote: > Based on the results of the discussion thread ( > > http://mail-archives.apache.org/mod_mbox/incubator-general/201404.mbox/%3CCA%2BFBdFQA4TghLRdh9GgDKaMtKLQHxE_QZV%3DoZ7HfiDSA_jyqwg%40mail.gmail.com%3E > ), I would like to call a vote on accepting Optiq into the incubator. > > [ ] +1 Accept Optiq into the Incubator > [ ] +0 Indifferent to the acceptance of Stratosphere > [ ] -1 Do not accept Optiq because ... > > The vote will be open until Tuesday May 13 18:00 UTC. > > https://wiki.apache.org/incubator/OptiqProposal > > = Optiq = > == Abstract == > > Optiq is a framework that allows efficient translation of queries involving > heterogeneous and federated data. > > == Proposal == > > Optiq is a highly customizable engine for parsing and planning queries on > data in a wide variety of formats. It allows database-like access, and in > particular a SQL interface and advanced query optimization, for data not > residing in a traditional database. > > == Background == > > Databases were traditionally engineered in a monolithic stack, providing a > data storage format, data processing algorithms, query parser, query > planner, built-in functions, metadata repository and connectivity layer. > They innovate in some areas but rarely in all. > > Modern data management systems are decomposing that stack into separate > components, separating data, processing engine, metadata, and query > language support. They are highly heterogeneous, with data in multiple > locations and formats, caching and redundant data, different workloads, and > processing occurring in different engines. > > Query planning (sometimes called query optimization) has always been a key > function of a DBMS, because it allows the implementors to introduce new > query-processing algorithms, and allows data administrators to re-organize > the data without affecting applications built on that data. In a > componentized system, the query planner integrates the components (data > formats, engines, algorithms) without introducing unncessary coupling or > performance tradeoffs. > > But building a query planner is hard; many systems muddle along without a > planner, and indeed a SQL interface, until the demand from their customers > is overwhelming. > > There is an opportunity to make this process more efficient by creating a > re-usable framework. > > == Rationale == > > Optiq allows database-like access, and in particular a SQL interface and > advanced query optimization, for data not residing in a traditional > database. It is complementary to many current Hadoop and NoSQL systems, > which have innovative and performant storage and runtime systems but lack a > SQL interface and intelligent query translation. > > Optiq is already in use by several projects, including Apache Drill, Apache > Hive and Cascading Lingual, and commercial products. > > Optiq's architecture consists of: > > An extensible relational algebra. > * SPIs (service-provider interfaces) for metadata (schemas and tables), > planner rules, statistics, cost-estimates, user-defined functions. > * Built-in sets of rules for logical transformations and common > data-sources. > * Two query planning engines driven by rules, statistics, etc. One engine > is cost-based, the other rule-based. > * Optional SQL parser, validator and translator to relational algebra. > * Optional JDBC driver. > > == Initial Goals == > > The initial goals are be to move the existing codebase to Apache and > integrate with the Apache development process. Once this is accomplished, > we plan for incremental development and releases that follow the Apache > guidelines. > > As we move the code into the org.apache namespace, we will restructure > components as necessary to allow clients to use just the components of > Optiq that they need. > > A version 1.0 release, including pre-built binaries, will foster wider > adoption. > > == Current Status == > > Optiq has had over a dozen minor releases over the last 18 months. Its core > SQL parser and validator, and its planning engine and core rules, are > mature and robust and are the basis for several production systems; but > other components and SPIs are still undergoing rapid evolution. > > === Meritocracy === > > We plan to invest in supporting a meritocracy. We will discuss the > requirements in an open forum. We encourage the companies and projects > using Optiq to discuss their requirements in an open forum and to > participate in development. We will encourage and monitor community > participation so that privileges can be extended to those that contribute. > > Optiq's pluggable architecture encourages developers to contribute > extensions such as adapters for data sources, new planning rules, and > better statistics and cost-estimation functions. We look forward to > fostering a rich ecosystem of extensions. > > === Community === > > Building a data management system requires a high degree of technical > skill, and correspondingly, the community of developers directly using > Optiq is potentially fairly small, albeit highly technical and engaged. But > we also expect engagement from members of the communities of projects that > use Optiq, such as Drill and Hive. And we intend to structure Optiq so that > it can be used for lighter weight applications, such as providing a SQL and > JDBC interface to a NoSQL system. > > === Core Developers === > > The developers on the initial committers list are all experienced open > source developers, and are actively using Optiq in their projects. > > * Julian Hyde is lead developer of Mondrian, an open source OLAP engine, > and an Apache Drill committer. > * Chris Wensel is lead developer of Cascading, and of Lingual, the SQL > interface to Cascading built using Optiq. > * Jacques Nadeau is lead developer of Apache Drill, which uses Optiq. > > In addition, there are several regular contributors whom we hope will > graduate to committers during the incubation process. > > We realize that additional employer diversity is needed, and we will work > aggressively to recruit developers from additional companies. > > === Alignment === > > Apache, and in particular the ecosystem surrounding Hadoop, contains > several projects for building data management systems that leverage each > other's capabilities. Optiq is a natural fit for that ecosystem, and will > help foster projects meeting new challenges. > > Optiq is already used by Apache Hive and Apache Drill; Optiq embeds Apache > Spark as an optional engine; we are in discussion with Apache Phoenix about > integrating JDBC and query planning. > > == Known Risks == > > === Orphaned Products === > > Optiq is already a key component in three independent projects, each backed > by a different company, so the risk of being orphaned is relatively low. We > plan to mitigate this risk by recruiting additional committers, and > promoting Optiq's adoption as a framework by other projects. > > === Inexperience with Open Source === > > The initial committers are all Apache members, some of whom have several > years in the Apache Hadoop community. The founder of the project, Julian > Hyde, has been a founder and key developer in open source projects for over > ten years. > > === Homogenous Developers === > > The initial committers are employed by a number of companies, including > Concurrent, Hortonworks, MapR Technologies and Salesforce.com. We are > committed to recruiting additional committers from outside these companies. > > === Reliance on Salaried Developers === > > Like most open source projects, Optiq receives substantial support from > salaried developers. This is to be expected given that it is a highly > technical framework. However, they are all passionate about the project, > and we are confident that the project will continue even if no salaried > developers contribute to the project. As a framework, the project > encourages the involvement of members of other projects, and of academic > researchers. We are committed to recruiting additional committers including > non-salaried developers. > > === Relationships with Other Apache Products === > > As mentioned in the Alignment section, Optiq is being used by Apache Hive > and Apache Drill, and has adapters for Apache Phoenix and Apache Spark. > Optiq often operates on data in a Hadoop environment, so collaboration with > other Hadoop projects is desirable and highly likely. > > Unsurprisingly there is some overlap in capabilities between Optiq and > other Apache projects. Several projects that are databases or database-like > have query-planning capabilities. These include Hive, Drill, Phoenix, > Spark, Apache Derby, Apache Pig, Apache Jena and Apache Tajo. Optiq's query > planner is extensible at run time, and does not have a preferred runtime > engine on which to execute compiled queries. These capabilities, and the > large corpus of pre-built rules, are what allow Optiq to be embedded in > other projects. > > Several other Apache projects access third-party data sources, including > Hive, Pig, Drill, Spark and Apache MetaModel. Optiq allows users to > optimize access to third-party data sources by writing rules to push > processing down to the data source, and provide a cost model to choose the > optimal location for processing. That said, maintaining a library of > adapters is burdensome, and so it would make sense to collaborate with > other projects on adapter libraries, and re-use libraries where possible. > > Optiq supports several front ends for submitting queries. The most popular > is SQL, with driver connectivity via JDBC (and ODBC planned). Other Apache > projects with a SQL parser include Hive, Spark, Phoenix, Derby, Tajo. Drill > uses Optiq's parser and JDBC stack; both Phoenix and Drill have expressed > interest in collaborating on JDBC and ODBC. Optiq's Linq4j API is similar > to the fluent query-builder APIs in Spark and MetaModel. Use of a front end > is not required; for instance, Hive integrates with Optiq by directly > building a graph of RelNode objects. > > === An Excessive Fascination with the Apache Brand === > > Optiq solves a real problem, as evidenced by its take-up by other projects. > This proposal is not for the purpose of generating publicity. Rather, the > primary benefits to joining Apache are those outlined in the Rationale > section. > > == Documentation == > > Additional documentation for Optiq may be found on its github site: > > * [[https://github.com/julianhyde/optiq/blob/master/README.md|Overview]] > * [[ > https://github.com/julianhyde/optiq-csv/blob/master/TUTORIAL.md|Tutorial]] > * [[https://github.com/julianhyde/optiq/blob/master/HOWTO.md|HOWTO]] > * [[ > https://github.com/julianhyde/optiq/blob/master/REFERENCE.md|Referenceguide > ]] > > ==== Presentation ==== > > *[[ > > https://github.com/julianhyde/share/blob/master/slides/optiq-richrelevance-2013.pdf?raw=true| > SQL on Big Data using Optiq]] > == Initial Source == > > The initial code codebase resides in three projects, all hosted on github: > > * https://github.com/julianhyde/optiq > * https://github.com/julianhyde/optiq-csv > * https://github.com/julianhyde/linq4j > > === Source and Intellectual Property Submission Plan === > > The initial codebase is already distributed under the Apache 2.0 License. > The owners of the IP have indicated willingness to sign the SGA. > > === External Dependencies === > > Optiq and Linq4j have the following external dependencies. > > * Java 1.6, 1.7 or 1.8 > * Apache Maven, Commons > * JavaCC (BSD license) > * Sqlline 1.1.6 (BSD license) > * Junit 4.11 (EPL) > * Janino (BSD license) > * Guava (Apache 2.0 license) > * Eigenbase-resgen, eigenbase-xom, eigenbase-properties (Apache 2.0 > license) > > Some of Optiq's adapters (optiq-csv, optiq-mongodb, optiq-spark, > optiq-splunk) are currently developed alongside core Optiq, and have the > following additional dependencies: > > * Open CSV 2.3 (Apache 2.0 license) > * Apache Incubator Spark > * Mongo Java driver (Apache 2.0 license) > Upon acceptance to the incubator, we would begin a thorough analysis of all > transitive dependencies to verify this information and introduce license > checking into the build and release process by integrating with Apache Rat. > > === Cryptography === > > Optiq will eventually support encryption on the wire. This is not one of > the initial goals, and we do not expect Optiq to be a controlled export > item due to the use of encryption. > > == Required Resources == > > === Mailing Lists === > > * priv...@optiq.incubator.apache.org > * d...@optiq.incubator.apache.org (will be migrated from > optiq-...@googlegroups.com) > * comm...@optiq.incubator.apache.org > > === Source control === > > The Optiq team would like to use git for source control, due to our current > use of git/github. We request a writeable git repo git:// > git.apache.org/incubator-optiq, and mirroring to be set up to github > through INFRA. > > === Issue Tracking === > > Optiq currently uses the github issue tracking system associated with its > github repo: https://github.com/julianhyde/optiq/issues. We will migrate > to > the Apache JIRA: http://issues.apache.org/jira/browse/OPTIQ. > > == Initial Committers == > > * Julian Hyde (jhyde at apache dot org) > * Jacques Nadeau (jacques at apache dot org) > * James R. Taylor (jamestaylor at apache dot org) > * Chris Wensel (cwensel at apache dot org) > > === Affiliations === > > The initial committers are employees of Concurrent, Hortonworks, MapR and > Salesforce.com. > > * Julian Hyde (Hortonworks) > * Jacques Nadeau (MapR Technologies) > * James R. Taylor (Salesforce.com) > * Chris Wensel (Concurrent) > > == Sponsors == > > === Champion === > > * Ashutosh Chauhan (hashutosh at apache dot org) > > === Nominated Mentors === > > * Ted Dunning (tdunning at apache dot org) - Chief Application Architect > at MapR Technologies; committer for Lucene, Mahout and ZooKeeper. > * Alan Gates (gates at apache dot org) - Architect at Hortonworks; > committer for Pig, Hive and others. > * Steven Noels (stevenn at apache dot org) - Chief Technical Officer at > NGDATA; committer for Cocoon and Forrest, mentor for Phoenix. > > === Sponsoring Entity === > > The Apache Incubator. > -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)