Now that the discussion is settling down, I will start a vote thread shortly.
On Mon, May 5, 2014 at 3:22 PM, Ashutosh Chauhan <hashut...@apache.org> wrote:

> Thanks everyone for great feedback. With Julian's help I have updated the
> section "Relationships with Other Apache projects" so that folks can get a
> sense of where Optiq stands w.r.t. other projects going on at ASF.
>
> Thanks,
> Ashutosh
>
>
> On Fri, May 2, 2014 at 11:23 AM, Henry Saputra <henry.sapu...@gmail.com> wrote:
>
>> Ah sorry, I did not mean "asking to update", I meant "proposing to
>> update".
>>
>> Thanks,
>>
>> - Henry
>>
>> On Fri, May 2, 2014 at 11:20 AM, Henry Saputra <henry.sapu...@gmail.com> wrote:
>> > Hi Ashutosh,
>> >
>> > Since there was a question/comment about the relationship with Apache
>> > MetaModel, I am asking to update the proposal to include this
>> > discussion in either "Relationships with Other Apache Products" or
>> > "Alignment" section before going for a VOTE.
>> >
>> > Apache Slider did the same thing with relation to Apache Twill and
>> > Apache Helix projects.
>> >
>> > Thanks,
>> >
>> > - Henry
>> >
>> > On Wed, Apr 30, 2014 at 3:21 PM, Ashutosh Chauhan <hashut...@apache.org> wrote:
>> >> I would like to propose Optiq as an Apache Incubator project. I have
>> >> posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and
>> >> posted the text of the proposal below.
>> >>
>> >> Ashutosh.
>> >>
>> >> = Optiq =
>> >> == Abstract ==
>> >>
>> >> Optiq is a framework that allows efficient translation of queries
>> >> involving heterogeneous and federated data.
>> >>
>> >> == Proposal ==
>> >>
>> >> Optiq is a highly customizable engine for parsing and planning queries
>> >> on data in a wide variety of formats. It allows database-like access,
>> >> and in particular a SQL interface and advanced query optimization, for
>> >> data not residing in a traditional database.
>> >>
>> >> == Background ==
>> >>
>> >> Databases were traditionally engineered in a monolithic stack,
>> >> providing a data storage format, data processing algorithms, query
>> >> parser, query planner, built-in functions, metadata repository and
>> >> connectivity layer. They innovate in some areas but rarely in all.
>> >>
>> >> Modern data management systems are decomposing that stack into separate
>> >> components, separating data, processing engine, metadata, and query
>> >> language support. They are highly heterogeneous, with data in multiple
>> >> locations and formats, caching and redundant data, different workloads,
>> >> and processing occurring in different engines.
>> >>
>> >> Query planning (sometimes called query optimization) has always been a
>> >> key function of a DBMS, because it allows the implementors to introduce
>> >> new query-processing algorithms, and allows data administrators to
>> >> re-organize the data without affecting applications built on that data.
>> >> In a componentized system, the query planner integrates the components
>> >> (data formats, engines, algorithms) without introducing unnecessary
>> >> coupling or performance tradeoffs.
>> >>
>> >> But building a query planner is hard; many systems muddle along without
>> >> a planner, and indeed a SQL interface, until the demand from their
>> >> customers is overwhelming.
>> >>
>> >> There is an opportunity to make this process more efficient by creating
>> >> a re-usable framework.
>> >>
>> >> == Rationale ==
>> >>
>> >> Optiq allows database-like access, and in particular a SQL interface
>> >> and advanced query optimization, for data not residing in a traditional
>> >> database. It is complementary to many current Hadoop and NoSQL systems,
>> >> which have innovative and performant storage and runtime systems but
>> >> lack a SQL interface and intelligent query translation.
>> >>
>> >> Optiq is already in use by several projects, including Apache Drill,
>> >> Apache Hive and Cascading Lingual, and commercial products.
>> >>
>> >> Optiq's architecture consists of:
>> >>
>> >> * An extensible relational algebra.
>> >> * SPIs (service-provider interfaces) for metadata (schemas and tables),
>> >> planner rules, statistics, cost-estimates, user-defined functions.
>> >> * Built-in sets of rules for logical transformations and common
>> >> data-sources.
>> >> * Two query planning engines driven by rules, statistics, etc. One
>> >> engine is cost-based, the other rule-based.
>> >> * Optional SQL parser, validator and translator to relational algebra.
>> >> * Optional JDBC driver.
>> >>
>> >> == Initial Goals ==
>> >>
>> >> The initial goals are to move the existing codebase to Apache and
>> >> integrate with the Apache development process. Once this is
>> >> accomplished, we plan for incremental development and releases that
>> >> follow the Apache guidelines.
>> >>
>> >> As we move the code into the org.apache namespace, we will restructure
>> >> components as necessary to allow clients to use just the components of
>> >> Optiq that they need.
>> >>
>> >> A version 1.0 release, including pre-built binaries, will foster wider
>> >> adoption.
>> >>
>> >> == Current Status ==
>> >>
>> >> Optiq has had over a dozen minor releases over the last 18 months. Its
>> >> core SQL parser and validator, and its planning engine and core rules,
>> >> are mature and robust and are the basis for several production systems;
>> >> but other components and SPIs are still undergoing rapid evolution.
>> >>
>> >> === Meritocracy ===
>> >>
>> >> We plan to invest in supporting a meritocracy. We will discuss the
>> >> requirements in an open forum. We encourage the companies and projects
>> >> using Optiq to discuss their requirements in an open forum and to
>> >> participate in development.
>> >> We will encourage and monitor community participation so that
>> >> privileges can be extended to those who contribute.
>> >>
>> >> Optiq's pluggable architecture encourages developers to contribute
>> >> extensions such as adapters for data sources, new planning rules, and
>> >> better statistics and cost-estimation functions. We look forward to
>> >> fostering a rich ecosystem of extensions.
>> >>
>> >> === Community ===
>> >>
>> >> Building a data management system requires a high degree of technical
>> >> skill, and correspondingly, the community of developers directly using
>> >> Optiq is potentially fairly small, albeit highly technical and engaged.
>> >> But we also expect engagement from members of the communities of
>> >> projects that use Optiq, such as Drill and Hive. And we intend to
>> >> structure Optiq so that it can be used for lighter weight applications,
>> >> such as providing a SQL and JDBC interface to a NoSQL system.
>> >>
>> >> === Core Developers ===
>> >>
>> >> The developers on the initial committers list are all experienced open
>> >> source developers, and are actively using Optiq in their projects.
>> >>
>> >> * Julian Hyde is lead developer of Mondrian, an open source OLAP
>> >> engine, and an Apache Drill committer.
>> >> * Chris Wensel is lead developer of Cascading, and of Lingual, the SQL
>> >> interface to Cascading built using Optiq.
>> >> * Jacques Nadeau is lead developer of Apache Drill, which uses Optiq.
>> >>
>> >> In addition, there are several regular contributors who we hope will
>> >> graduate to committers during the incubation process.
>> >>
>> >> We realize that additional employer diversity is needed, and we will
>> >> work aggressively to recruit developers from additional companies.
>> >>
>> >> === Alignment ===
>> >>
>> >> Apache, and in particular the ecosystem surrounding Hadoop, contains
>> >> several projects for building data management systems that leverage
>> >> each other's capabilities. Optiq is a natural fit for that ecosystem,
>> >> and will help foster projects meeting new challenges.
>> >>
>> >> Optiq is already used by Apache Hive and Apache Drill; Optiq embeds
>> >> Apache Spark as an optional engine; we are in discussion with Apache
>> >> Phoenix about integrating JDBC and query planning.
>> >>
>> >> == Known Risks ==
>> >>
>> >> === Orphaned Products ===
>> >>
>> >> Optiq is already a key component in three independent projects, each
>> >> backed by a different company, so the risk of being orphaned is
>> >> relatively low. We plan to mitigate this risk by recruiting additional
>> >> committers, and promoting Optiq's adoption as a framework by other
>> >> projects.
>> >>
>> >> === Inexperience with Open Source ===
>> >>
>> >> The initial committers are all Apache members, some of whom have
>> >> several years in the Apache Hadoop community. The founder of the
>> >> project, Julian Hyde, has been a founder and key developer in open
>> >> source projects for over ten years.
>> >>
>> >> === Homogenous Developers ===
>> >>
>> >> The initial committers are employed by a number of companies, including
>> >> Concurrent, Hortonworks, MapR Technologies and Salesforce.com. We are
>> >> committed to recruiting additional committers from outside these
>> >> companies.
>> >>
>> >> === Reliance on Salaried Developers ===
>> >>
>> >> Like most open source projects, Optiq receives substantial support from
>> >> salaried developers. This is to be expected given that it is a highly
>> >> technical framework. However, they are all passionate about the
>> >> project, and we are confident that the project will continue even if no
>> >> salaried developers contribute to the project.
>> >> As a framework, the project encourages the involvement of members of
>> >> other projects, and of academic researchers. We are committed to
>> >> recruiting additional committers including non-salaried developers.
>> >>
>> >> === Relationships with Other Apache Products ===
>> >>
>> >> As mentioned in the Alignment section, Optiq is being used by Apache
>> >> Hive and Apache Drill, and has adapters for Apache Phoenix and Apache
>> >> Spark. Optiq often operates on data in a Hadoop environment, so
>> >> collaboration with other Hadoop projects is desirable and highly likely.
>> >>
>> >> === An Excessive Fascination with the Apache Brand ===
>> >>
>> >> Optiq solves a real problem, as evidenced by its take-up by other
>> >> projects. This proposal is not for the purpose of generating publicity.
>> >> Rather, the primary benefits to joining Apache are those outlined in
>> >> the Rationale section.
>> >>
>> >> == Documentation ==
>> >>
>> >> Additional documentation for Optiq may be found on its github site:
>> >>
>> >> * [[https://github.com/julianhyde/optiq/blob/master/README.md|Overview]]
>> >> * [[https://github.com/julianhyde/optiq-csv/blob/master/TUTORIAL.md|Tutorial]]
>> >> * [[https://github.com/julianhyde/optiq/blob/master/HOWTO.md|HOWTO]]
>> >> * [[https://github.com/julianhyde/optiq/blob/master/REFERENCE.md|Reference guide]]
>> >>
>> >> === Presentation: ===
>> >>
>> >> * [[https://github.com/julianhyde/share/blob/master/slides/optiq-richrelevance-2013.pdf?raw=true|SQL on Big Data using Optiq]]
>> >>
>> >> == Initial Source ==
>> >>
>> >> The initial codebase resides in three projects, all hosted on github:
>> >>
>> >> * https://github.com/julianhyde/optiq
>> >> * https://github.com/julianhyde/optiq-csv
>> >> * https://github.com/julianhyde/linq4j
>> >>
>> >> === Source and Intellectual Property Submission Plan ===
>> >>
>> >> The initial codebase is already distributed under the Apache 2.0
>> >> License. The owners of the IP have indicated willingness to sign the
>> >> SGA.
>> >>
>> >> === External Dependencies ===
>> >>
>> >> Optiq and Linq4j have the following external dependencies.
>> >>
>> >> * Java 1.6, 1.7 or 1.8
>> >> * Apache Maven, Commons
>> >> * JavaCC (BSD license)
>> >> * Sqlline 1.1.6 (BSD license)
>> >> * Junit 4.11 (EPL)
>> >> * Janino (BSD license)
>> >> * Guava (Apache 2.0 license)
>> >> * Eigenbase-resgen, eigenbase-xom, eigenbase-properties (Apache 2.0
>> >> license)
>> >>
>> >> Some of Optiq's adapters (optiq-csv, optiq-mongodb, optiq-spark,
>> >> optiq-splunk) are currently developed alongside core Optiq, and have
>> >> the following additional dependencies:
>> >>
>> >> * Open CSV 2.3 (Apache 2.0 license)
>> >> * Apache Incubator Spark
>> >> * Mongo Java driver (Apache 2.0 license)
>> >>
>> >> Upon acceptance to the incubator, we would begin a thorough analysis of
>> >> all transitive dependencies to verify this information and introduce
>> >> license checking into the build and release process by integrating with
>> >> Apache Rat.
>> >>
>> >> === Cryptography ===
>> >>
>> >> Optiq will eventually support encryption on the wire. This is not one
>> >> of the initial goals, and we do not expect Optiq to be a controlled
>> >> export item due to the use of encryption.
>> >>
>> >> == Required Resources ==
>> >>
>> >> === Mailing Lists ===
>> >>
>> >> * priv...@optiq.incubator.apache.org
>> >> * d...@optiq.incubator.apache.org (will be migrated from
>> >> optiq-...@googlegroups.com)
>> >> * comm...@optiq.incubator.apache.org
>> >>
>> >> === Source control ===
>> >>
>> >> The Optiq team would like to use git for source control, due to our
>> >> current use of git/github. We request a writeable git repo
>> >> git://git.apache.org/incubator-optiq, and mirroring to be set up to
>> >> github through INFRA.
>> >>
>> >> === Issue Tracking ===
>> >>
>> >> Optiq currently uses the github issue tracking system associated with
>> >> its github repo: https://github.com/julianhyde/optiq/issues. We will
>> >> migrate to the Apache JIRA: http://issues.apache.org/jira/browse/OPTIQ.
>> >>
>> >> == Initial Committers ==
>> >>
>> >> * Julian Hyde (jhyde at apache dot org)
>> >> * Jacques Nadeau (jacques at apache dot org)
>> >> * James R. Taylor (jamestaylor at apache dot org)
>> >> * Chris Wensel (cwensel at apache dot org)
>> >>
>> >> === Affiliations ===
>> >>
>> >> The initial committers are employees of Concurrent, Hortonworks, MapR
>> >> and Salesforce.com.
>> >>
>> >> * Julian Hyde (Hortonworks)
>> >> * Jacques Nadeau (MapR Technologies)
>> >> * James R. Taylor (Salesforce.com)
>> >> * Chris Wensel (Concurrent)
>> >>
>> >> == Sponsors ==
>> >>
>> >> === Champion ===
>> >>
>> >> * Ashutosh Chauhan (hashutosh at apache dot org)
>> >>
>> >> === Nominated Mentors ===
>> >>
>> >> * Ted Dunning (tdunning at apache dot org) - Chief Application
>> >> Architect at MapR Technologies; committer for Lucene, Mahout and
>> >> ZooKeeper.
>> >> * Alan Gates (gates at apache dot org) - Architect at Hortonworks;
>> >> committer for Pig, Hive and others.
>> >> * Steven Noels (stevenn at apache dot org) - Chief Technical Officer at
>> >> NGDATA; committer for Cocoon and Forrest, mentor for Phoenix.
>> >>
>> >> === Sponsoring Entity ===
>> >>
>> >> The Apache Incubator.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>> For additional commands, e-mail: general-h...@incubator.apache.org
>>
>>
>