+1 (non-binding)

2014-05-12 17:11 GMT+02:00 Alan Gates <ga...@hortonworks.com>:

> +1.
>
> Alan.
>
> On May 9, 2014, at 11:03 AM, Ashutosh Chauhan <hashut...@apache.org>
> wrote:
>
> > Based on the results of the discussion thread (
> >
> http://mail-archives.apache.org/mod_mbox/incubator-general/201404.mbox/%3CCA%2BFBdFQA4TghLRdh9GgDKaMtKLQHxE_QZV%3DoZ7HfiDSA_jyqwg%40mail.gmail.com%3E
> > ),  I would like to call a vote on accepting Optiq into the incubator.
> >
> > [ ] +1 Accept Optiq into the Incubator
> > [ ] +0 Indifferent to the acceptance of Stratosphere
> > [ ] -1 Do not accept Optiq because ...
> >
> > The vote will be open until Tuesday May 13 18:00 UTC.
> >
> > https://wiki.apache.org/incubator/OptiqProposal
> >
> > = Optiq =
> > == Abstract ==
> >
> > Optiq is a framework that allows efficient translation of queries
> involving
> > heterogeneous and federated data.
> >
> > == Proposal ==
> >
> > Optiq is a highly customizable engine for parsing and planning queries on
> > data in a wide variety of formats. It allows database-like access, and in
> > particular a SQL interface and advanced query optimization, for data not
> > residing in a traditional database.
> >
> > == Background ==
> >
> > Databases were traditionally engineered in a monolithic stack, providing
> a
> > data storage format, data processing algorithms, query parser, query
> > planner, built-in functions, metadata repository and connectivity layer.
> > They innovate in some areas but rarely in all.
> >
> > Modern data management systems are decomposing that stack into separate
> > components, separating data, processing engine, metadata, and query
> > language support. They are highly heterogeneous, with data in multiple
> > locations and formats, caching and redundant data, different workloads,
> and
> > processing occurring in different engines.
> >
> > Query planning (sometimes called query optimization) has always been a
> key
> > function of a DBMS, because it allows the implementors to introduce new
> > query-processing algorithms, and allows data administrators to
> re-organize
> > the data without affecting applications built on that data. In a
> > componentized system, the query planner integrates the components (data
> > formats, engines, algorithms) without introducing unncessary coupling or
> > performance tradeoffs.
> >
> > But building a query planner is hard; many systems muddle along without a
> > planner, and indeed a SQL interface, until the demand from their
> customers
> > is overwhelming.
> >
> > There is an opportunity to make this process more efficient by creating a
> > re-usable framework.
> >
> > == Rationale ==
> >
> > Optiq allows database-like access, and in particular a SQL interface and
> > advanced query optimization, for data not residing in a traditional
> > database. It is complementary to many current Hadoop and NoSQL systems,
> > which have innovative and performant storage and runtime systems but
> lack a
> > SQL interface and intelligent query translation.
> >
> > Optiq is already in use by several projects, including Apache Drill,
> Apache
> > Hive and Cascading Lingual, and commercial products.
> >
> > Optiq's architecture consists of:
> >
> > An extensible relational algebra.
> > * SPIs (service-provider interfaces) for metadata (schemas and tables),
> > planner rules, statistics, cost-estimates, user-defined functions.
> > * Built-in sets of rules for logical transformations and common
> > data-sources.
> > * Two query planning engines driven by rules, statistics, etc. One engine
> > is cost-based, the other rule-based.
> > * Optional SQL parser, validator and translator to relational algebra.
> > * Optional JDBC driver.
> >
> > == Initial Goals ==
> >
> > The initial goals are be to move the existing codebase to Apache and
> > integrate with the Apache development process. Once this is accomplished,
> > we plan for incremental development and releases that follow the Apache
> > guidelines.
> >
> > As we move the code into the org.apache namespace, we will restructure
> > components as necessary to allow clients to use just the components of
> > Optiq that they need.
> >
> > A version 1.0 release, including pre-built binaries, will foster wider
> > adoption.
> >
> > == Current Status ==
> >
> > Optiq has had over a dozen minor releases over the last 18 months. Its
> core
> > SQL parser and validator, and its planning engine and core rules, are
> > mature and robust and are the basis for several production systems; but
> > other components and SPIs are still undergoing rapid evolution.
> >
> > === Meritocracy ===
> >
> > We plan to invest in supporting a meritocracy. We will discuss the
> > requirements in an open forum. We encourage the companies and projects
> > using Optiq to discuss their requirements in an open forum and to
> > participate in development. We will encourage and monitor community
> > participation so that privileges can be extended to those that
> contribute.
> >
> > Optiq's pluggable architecture encourages developers to contribute
> > extensions such as adapters for data sources, new planning rules, and
> > better statistics and cost-estimation functions. We look forward to
> > fostering a rich ecosystem of extensions.
> >
> > === Community ===
> >
> > Building a data management system requires a high degree of technical
> > skill, and correspondingly, the community of developers directly using
> > Optiq is potentially fairly small, albeit highly technical and engaged.
> But
> > we also expect engagement from members of the communities of projects
> that
> > use Optiq, such as Drill and Hive. And we intend to structure Optiq so
> that
> > it can be used for lighter weight applications, such as providing a SQL
> and
> > JDBC interface to a NoSQL system.
> >
> > === Core Developers ===
> >
> > The developers on the initial committers list are all experienced open
> > source developers, and are actively using Optiq in their projects.
> >
> > * Julian Hyde is lead developer of Mondrian, an open source OLAP engine,
> > and an Apache Drill committer.
> > * Chris Wensel is lead developer of Cascading, and of Lingual, the SQL
> > interface to Cascading built using Optiq.
> > * Jacques Nadeau is lead developer of Apache Drill, which uses Optiq.
> >
> > In addition, there are several regular contributors whom we hope will
> > graduate to committers during the incubation process.
> >
> > We realize that additional employer diversity is needed, and we will work
> > aggressively to recruit developers from additional companies.
> >
> > === Alignment ===
> >
> > Apache, and in particular the ecosystem surrounding Hadoop, contains
> > several projects for building data management systems that leverage each
> > other's capabilities. Optiq is a natural fit for that ecosystem, and will
> > help foster projects meeting new challenges.
> >
> > Optiq is already used by Apache Hive and Apache Drill; Optiq embeds
> Apache
> > Spark as an optional engine; we are in discussion with Apache Phoenix
> about
> > integrating JDBC and query planning.
> >
> > == Known Risks ==
> >
> > === Orphaned Products ===
> >
> > Optiq is already a key component in three independent projects, each
> backed
> > by a different company, so the risk of being orphaned is relatively low.
> We
> > plan to mitigate this risk by recruiting additional committers, and
> > promoting Optiq's adoption as a framework by other projects.
> >
> > === Inexperience with Open Source ===
> >
> > The initial committers are all Apache members, some of whom have several
> > years in the Apache Hadoop community. The founder of the project, Julian
> > Hyde, has been a founder and key developer in open source projects for
> over
> > ten years.
> >
> > === Homogenous Developers ===
> >
> > The initial committers are employed by a number of companies, including
> > Concurrent, Hortonworks, MapR Technologies and Salesforce.com. We are
> > committed to recruiting additional committers from outside these
> companies.
> >
> > === Reliance on Salaried Developers ===
> >
> > Like most open source projects, Optiq receives substantial support from
> > salaried developers. This is to be expected given that it is a highly
> > technical framework. However, they are all passionate about the project,
> > and we are confident that the project will continue even if no salaried
> > developers contribute to the project. As a framework, the project
> > encourages the involvement of members of other projects, and of academic
> > researchers. We are committed to recruiting additional committers
> including
> > non-salaried developers.
> >
> > === Relationships with Other Apache Products ===
> >
> > As mentioned in the Alignment section, Optiq is being used by Apache Hive
> > and Apache Drill, and has adapters for Apache Phoenix and Apache Spark.
> > Optiq often operates on data in a Hadoop environment, so collaboration
> with
> > other Hadoop projects is desirable and highly likely.
> >
> > Unsurprisingly there is some overlap in capabilities between Optiq and
> > other Apache projects. Several projects that are databases or
> database-like
> > have query-planning capabilities. These include Hive, Drill, Phoenix,
> > Spark, Apache Derby, Apache Pig, Apache Jena and Apache Tajo. Optiq's
> query
> > planner is extensible at run time, and does not have a preferred runtime
> > engine on which to execute compiled queries. These capabilities, and the
> > large corpus of pre-built rules, are what allow Optiq to be embedded in
> > other projects.
> >
> > Several other Apache projects access third-party data sources, including
> > Hive, Pig, Drill, Spark and Apache MetaModel. Optiq allows users to
> > optimize access to third-party data sources by writing rules to push
> > processing down to the data source, and provide a cost model to choose
> the
> > optimal location for processing. That said, maintaining a library of
> > adapters is burdensome, and so it would make sense to collaborate with
> > other projects on adapter libraries, and re-use libraries where possible.
> >
> > Optiq supports several front ends for submitting queries. The most
> popular
> > is SQL, with driver connectivity via JDBC (and ODBC planned). Other
> Apache
> > projects with a SQL parser include Hive, Spark, Phoenix, Derby, Tajo.
> Drill
> > uses Optiq's parser and JDBC stack; both Phoenix and Drill have expressed
> > interest in collaborating on JDBC and ODBC. Optiq's Linq4j API is similar
> > to the fluent query-builder APIs in Spark and MetaModel. Use of a front
> end
> > is not required; for instance, Hive integrates with Optiq by directly
> > building a graph of RelNode objects.
> >
> > === An Excessive Fascination with the Apache Brand ===
> >
> > Optiq solves a real problem, as evidenced by its take-up by other
> projects.
> > This proposal is not for the purpose of generating publicity. Rather, the
> > primary benefits to joining Apache are those outlined in the Rationale
> > section.
> >
> > == Documentation ==
> >
> > Additional documentation for Optiq may be found on its github site:
> >
> > * [[https://github.com/julianhyde/optiq/blob/master/README.md|Overview]]
> > * [[
> > https://github.com/julianhyde/optiq-csv/blob/master/TUTORIAL.md|Tutorial
> ]]
> > * [[https://github.com/julianhyde/optiq/blob/master/HOWTO.md|HOWTO]]
> > * [[
> https://github.com/julianhyde/optiq/blob/master/REFERENCE.md|Referenceguide
> ]]
> >
> > ==== Presentation ====
> >
> > *[[
> >
> https://github.com/julianhyde/share/blob/master/slides/optiq-richrelevance-2013.pdf?raw=true|
> > SQL on Big Data using Optiq]]
> > == Initial Source ==
> >
> > The initial code codebase resides in three projects, all hosted on
> github:
> >
> > * https://github.com/julianhyde/optiq
> > * https://github.com/julianhyde/optiq-csv
> > * https://github.com/julianhyde/linq4j
> >
> > === Source and Intellectual Property Submission Plan ===
> >
> > The initial codebase is already distributed under the Apache 2.0 License.
> > The owners of the IP have indicated willingness to sign the SGA.
> >
> > === External Dependencies ===
> >
> > Optiq and Linq4j have the following external dependencies.
> >
> > * Java 1.6, 1.7 or 1.8
> > * Apache Maven, Commons
> > * JavaCC (BSD license)
> > * Sqlline 1.1.6 (BSD license)
> > * Junit 4.11 (EPL)
> > * Janino (BSD license)
> > * Guava (Apache 2.0 license)
> > * Eigenbase-resgen, eigenbase-xom, eigenbase-properties (Apache 2.0
> > license)
> >
> > Some of Optiq's adapters (optiq-csv, optiq-mongodb, optiq-spark,
> > optiq-splunk) are currently developed alongside core Optiq, and have the
> > following additional dependencies:
> >
> > * Open CSV 2.3 (Apache 2.0 license)
> > * Apache Incubator Spark
> > * Mongo Java driver (Apache 2.0 license)
> > Upon acceptance to the incubator, we would begin a thorough analysis of
> all
> > transitive dependencies to verify this information and introduce license
> > checking into the build and release process by integrating with Apache
> Rat.
> >
> > === Cryptography ===
> >
> > Optiq will eventually support encryption on the wire. This is not one of
> > the initial goals, and we do not expect Optiq to be a controlled export
> > item due to the use of encryption.
> >
> > == Required Resources ==
> >
> > === Mailing Lists ===
> >
> > * priv...@optiq.incubator.apache.org
> > * d...@optiq.incubator.apache.org (will be migrated from
> > optiq-...@googlegroups.com)
> > * comm...@optiq.incubator.apache.org
> >
> > === Source control ===
> >
> > The Optiq team would like to use git for source control, due to our
> current
> > use of git/github. We request a writeable git repo git://
> > git.apache.org/incubator-optiq, and mirroring to be set up to github
> > through INFRA.
> >
> > === Issue Tracking ===
> >
> > Optiq currently uses the github issue tracking system associated with its
> > github repo: https://github.com/julianhyde/optiq/issues. We will
> migrate to
> > the Apache JIRA: http://issues.apache.org/jira/browse/OPTIQ.
> >
> > == Initial Committers ==
> >
> > * Julian Hyde (jhyde at apache dot org)
> > * Jacques Nadeau (jacques at apache dot org)
> > * James R. Taylor (jamestaylor at apache dot org)
> > * Chris Wensel (cwensel at apache dot org)
> >
> > === Affiliations ===
> >
> > The initial committers are employees of Concurrent, Hortonworks, MapR and
> > Salesforce.com.
> >
> > * Julian Hyde (Hortonworks)
> > * Jacques Nadeau (MapR Technologies)
> > * James R. Taylor (Salesforce.com)
> > * Chris Wensel (Concurrent)
> >
> > == Sponsors ==
> >
> > === Champion ===
> >
> > * Ashutosh Chauhan (hashutosh at apache dot org)
> >
> > === Nominated Mentors ===
> >
> > * Ted Dunning (tdunning at apache dot org) - Chief Application Architect
> > at MapR Technologies; committer for Lucene, Mahout and ZooKeeper.
> > * Alan Gates (gates at apache dot org) - Architect at Hortonworks;
> > committer for Pig, Hive and others.
> > * Steven Noels (stevenn at apache dot org) - Chief Technical Officer at
> > NGDATA; committer for Cocoon and Forrest, mentor for Phoenix.
> >
> > === Sponsoring Entity ===
> >
> > The Apache Incubator.
>
>
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>

Reply via email to