It is indeed very specific to HBase use, I suppose. Would it be more beneficial to make it a sub-project of HBase, so that it gets full support from the HBase community?
On Wed, Nov 13, 2013 at 12:43 PM, James Taylor <jtay...@salesforce.com> wrote:
> Hi All,
>
> We're pleased to share a draft ASF incubation proposal for Phoenix, a
> SQL layer over HBase, initially developed at Salesforce.com and
> subsequently open sourced on github
> (https://github.com/forcedotcom/phoenix). Instead of using Map-reduce
> to process queries, it compiles SQL directly into native HBase
> calls. The complete proposal can be found here:
> https://wiki.apache.org/incubator/PhoenixProposal, and is also pasted
> below.
>
> Your feedback is greatly appreciated.
>
> James
>
> == Abstract ==
> Phoenix is an open source SQL query engine for Apache HBase, a NoSQL
> data store. It is accessed as a JDBC driver and enables querying and
> managing HBase tables using SQL.
>
> == Proposal ==
> Phoenix is an open source SQL skin over HBase delivered as a
> client-embedded JDBC driver targeting low latency queries over HBase
> data. Phoenix takes your SQL query, compiles it into a series of HBase
> scans, and orchestrates the running of those scans to produce regular
> JDBC result sets. The table metadata is stored in an HBase table and
> versioned, such that snapshot queries over prior versions will
> automatically use the correct schema. Direct use of the HBase API,
> along with coprocessors and custom filters, results in performance on
> the order of milliseconds for small queries, or seconds for tens of
> millions of rows. Phoenix interfaces with both Pig and Map-reduce for
> the input and output of data.
>
> == Background ==
> Phoenix initially started as an internal project at Salesforce.com to
> efficiently analyze big data stored in HBase. It was open sourced on
> Github about a year ago in Jan 2013. Over time Phoenix, together with
> HBase as the storage tier, has begun to evolve into a general SQL
> database with support for metadata management, secondary indexes,
> joins, query optimization, and multi-tenancy.
> This is expected to
> continue as Phoenix implements a cost-based query optimizer and
> potentially transaction support, and surfaces new HBase security
> features such as encryption and cell-level security. Phoenix's
> developer community has also grown to include additional companies
> such as Intel, who have contributed join support to Phoenix, as well
> as Hortonworks, who are in the process of porting Phoenix to the 0.96
> release of HBase.
>
> == Rationale ==
> As usage and the number of contributors to Phoenix have grown, we have
> sought a long-term home for the project, and we believe the Apache
> foundation would be a great fit. Joining Apache would ensure that
> tried and true processes and procedures are in place for the growing
> number of organizations interested in contributing to Phoenix. Phoenix
> is also a good fit for the Apache foundation: Phoenix already
> interoperates with several existing Apache projects (HBase, Hadoop,
> Pig). The Phoenix team is familiar with the Apache process and
> believes in the Apache mission - the team already includes multiple
> Apache committers.
>
> == Initial Goals ==
> The initial goals will be to move the existing codebase to Apache and
> integrate with the Apache development process. Once this is
> accomplished, we plan for incremental development and releases that
> follow the Apache guidelines.
>
> == Current Status ==
> Phoenix has undergone two major and three minor releases (1.0, 1.1,
> 1.2, 2.0, and 2.1) as well as many patch releases. Phoenix is being
> used in production by Salesforce.com as well as at other
> organizations. The Phoenix codebase is currently hosted at github.com,
> which will form the basis of the Apache git repository.
>
> === Meritocracy ===
> The Phoenix project already operates on meritocratic principles.
> Phoenix has several developers from various organizations outside of
> Salesforce.com who have contributed major new features.
> While this
> process has remained mostly informal, as we do not have an official
> committer list, an implicit organization exists in which individuals
> who contribute major components act as maintainers for those modules.
> If accepted, the Phoenix project would include several of these
> participants as initial committers. We will work to identify all
> committers and PPMC members for the project and to operate under the
> ASF meritocratic principles.
>
> === Community ===
> Acceptance into the Apache foundation would bolster the already strong
> user and developer community around Phoenix. That community includes
> many contributors from various other companies, and an active mailing
> list composed of hundreds of users.
>
> === Core Developers ===
> The core developers of our project are listed in our contributors and
> initial PPMC below. Though many are employed at Salesforce.com, there
> is a representative cross sampling of other organizations including
> Intel, Hortonworks, Cloudera, and Twitter.
>
> === Alignment ===
> Our proposed Phoenix effort aligns closely with Apache HBase. The
> HBase project perimeter is denoted by simple byte-array based
> Create, Read, Update, Delete and Scan APIs with no current plans to
> extend beyond these bounds. Phoenix complements this with a higher
> level API in SQL with which many are already familiar. At first
> glance, it may seem that Phoenix should just be folded into HBase as a
> new module. However, the focus of the two projects will be quite
> different, especially as Phoenix matures. With secondary indexing and
> joins just having been introduced into Phoenix, the next big frontier
> will be to implement a cost-based query optimizer. This is the
> heart-and-soul of most relational databases and can take a
> lifetime to get right.
>
> HBase is focused on being a scalable data store agnostic to types and
> schema. Phoenix would layer typing and relational facilities on top
> of this scalable store.
> By keeping Apache HBase and Phoenix separate,
> both may evolve independently and at different rates. Though the focus
> of the two projects is different, the relationship between them is
> very positive and mutually beneficial. New features in HBase will be
> leveraged in Phoenix as it makes sense to surface these in a SQL
> paradigm. In addition, Phoenix may drive new features in HBase, as
> evidenced by the new type system recently introduced into HBase. This
> will enable better interoperability between Apache Hive, standalone
> HBase use cases, and Phoenix by defining a standard serialization
> format.
>
> Other projects exist that perform SQL over HBase data (such as Apache
> Hive); however, these products do not provide the same low latency
> query capabilities as Phoenix. Instead, they are more oriented around
> maximizing throughput for batched operations. Phoenix opens the door
> to a completely new set of use cases for Apache HBase that demand a
> more interactive user experience.
>
> There are also a number of related Apache projects and dependencies
> that are mentioned in the Relationship with Other Apache Products
> section.
>
> == Known Risks ==
> === Orphaned Products ===
> Given the current level of investment in Phoenix, the risk of the
> project being abandoned is minimal. All current and planned HBase use
> cases at Salesforce.com go through Phoenix. In addition, both Intel
> and Hortonworks plan to include Phoenix in their distributions. Other
> companies have devoted significant internal infrastructure investment
> in Phoenix.
>
> === Inexperience with Open Source ===
> Phoenix has existed as a healthy open source project for almost a
> year. During that time, James, Mujtaba, and others have successfully
> fostered an open-source community, attracting users and developers
> from a diverse group of companies including Intel, Intuit, Bloomberg,
> Tagged, and Hortonworks.
> Although neither is a committer on other
> Apache projects, both James and Mujtaba have experience working with
> and contributing to other Apache projects.
>
> === Homogenous Developers ===
> The initial list of committers includes developers from several
> institutions, including Salesforce, Intel, Hortonworks, and Twitter.
>
> === Reliance on Salaried Developers ===
> Like most open source projects, Phoenix receives substantial support
> from salaried developers. A large fraction of Phoenix development is
> supported by Salesforce.com. In addition, those working from within
> corporations and universities often devote “after hours” or spare time
> to the project. We will continue our efforts to ensure stewardship of
> the project to be independent of salaried developers.
>
> === Relationship with Other Apache Products ===
> Although Phoenix provides a higher level abstraction than Apache HBase
> by hiding its client APIs, Phoenix relies on Apache HBase for both
> storing and retrieving data. It also inter-operates with Apache HBase
> by allowing existing data, not created by Phoenix, to be queried. In
> addition, both Apache Pig and Hadoop are supported for data input and
> output. Finally, Phoenix is included in and installable through
> Apache Bigtop, and the build and test suite are run through Apache
> Maven.
>
> Phoenix offers an alternative query engine to Apache Hadoop
> (MapReduce). Unlike MapReduce, Phoenix is designed for lower-latency,
> OLTP, and interactive workloads. This makes the projects complementary
> as users may run MapReduce and Phoenix side-by-side.
>
> We plan to increase the interoperability between Phoenix, Apache Hive,
> and standalone Apache HBase usage by standardizing on a new type
> system that has been introduced in the current major release of HBase.
> With all these products adopting this new serialization format,
> interoperability between them will take a big step forward.
>
> In addition, we plan to explore providing lower level APIs for other
> products such as Apache Drill to plug into when querying HBase data so
> that they get the performance benefits of using Phoenix.
>
> === An Excessive Fascination with the Apache Brand ===
> Phoenix is already a healthy and relatively well known open source
> project. This proposal is not for the purpose of generating publicity.
> Rather, the primary benefits to joining Apache are those outlined in
> the Rationale section.
>
> === Documentation ===
> Additional documentation on Phoenix may be found on its github website:
> * Phoenix overview:
> https://github.com/forcedotcom/phoenix/blob/master/README.md
> * Phoenix wiki: https://github.com/forcedotcom/phoenix/wiki
> * Phoenix road map: https://github.com/forcedotcom/phoenix/wiki#roadmap
> * Phoenix issue tracking:
> https://github.com/forcedotcom/phoenix/issues?direction=desc&sort=updated&state=open
> * Phoenix codebase: https://github.com/forcedotcom/phoenix
> * Phoenix SQL language reference: http://forcedotcom.github.io/phoenix/
> * Phoenix performance:
> https://github.com/forcedotcom/phoenix/wiki/Performance#phoenix-vs-related-products
> * User group: https://groups.google.com/group/phoenix-hbase-user
>
> == Initial Source ==
> The Phoenix codebase is currently hosted on Github:
> https://github.com/forcedotcom/phoenix.
>
> === Source and Intellectual Property Submission Plan ===
> Currently, the Phoenix codebase is distributed under a BSD license.
> Upon entering Apache, the Phoenix license will be migrated to the
> Apache 2.0 License.
>
> == External Dependencies ==
> Beyond relying on Apache HBase, Phoenix has the following external
> dependencies:
> * ANTLR 3.5 (BSD license: http://www.antlr3.org/license.html)
> * Sqlline 1.1.2 (BSD license:
> https://github.com/julianhyde/sqlline/blob/master/LICENSE)
> * Open CSV 2.3 (Apache 2.0 license)
>
> Upon acceptance to the incubator, we would begin a thorough analysis
> of all transitive dependencies to verify this information and
> introduce license checking into the build and release process by
> integrating with Apache Rat.
>
> == Required Resources ==
> === Mailing list ===
> We will migrate the existing Phoenix mailing lists as follows:
>
> * phoenix-hbase-u...@googlegroups.com --> us...@phoenix.incubator.apache.org
> * phoenix-hbase-...@googlegroups.com --> d...@phoenix.incubator.apache.org
> * priv...@phoenix.incubator.apache.org for IPMC members
> * comm...@phoenix.incubator.apache.org
>
> The latter is to be consistent with the new PIAO naming scheme for podlings.
>
> === Source control ===
> The Phoenix team would like to use Git for source control, due to our
> current use of Git.
> We request a writeable Git repo for Phoenix, and mirroring to be set
> up to Github through INFRA.
>
> === Issue Tracking ===
> Phoenix currently uses the github issue tracking system associated
> with its github repo:
> https://github.com/forcedotcom/phoenix/issues?direction=desc&sort=updated&state=open.
> We will migrate to the Apache JIRA:
> http://issues.apache.org/jira/browse/PHOENIX
>
> === Other Resources ===
> * Jenkins/Hudson for builds and test running.
> * Wiki for documentation purposes
> * Blog to improve project dissemination
>
> == Initial Committers ==
> * James Taylor <jtaylor at salesforce dot com>
> * Mujtaba Chohan <mchohan at salesforce dot com>
> * Jesse Yates <jyates at apache dot org>
> * Eli Levine <elevine at salesforce dot com>
> * Simon Toens <stoens at salesforce dot com>
> * Maryann Xue <wei.xue at intel dot com>
> * Anoop Sam John <anoopsamjohn at apache dot org>
> * Ramkrishna S Vasudevan <ramkrishna at apache dot org>
> * Jeffrey Zhong <jeffreyz at apache dot org>
> * Nick Dimiduk <ndimiduk at apache dot org>
> * Tony Huang <thuang at twitter dot com>
>
> == Affiliations ==
> The initial committers are from four organizations: Salesforce.com,
> Intel, Hortonworks, and Twitter.
>
> * James Taylor (Salesforce.com)
> * Mujtaba Chohan (Salesforce.com)
> * Jesse Yates (Salesforce.com)
> * Eli Levine (Salesforce.com)
> * Simon Toens (Salesforce.com)
> * Maryann Xue (Intel)
> * Anoop Sam John (Intel)
> * Ramkrishna S Vasudevan (Intel)
> * Jeffrey Zhong (Hortonworks)
> * Nick Dimiduk (Hortonworks)
> * Tony Huang (Twitter)
>
> == Sponsors ==
> === Champion ===
> * Michael Stack
>
> === Nominated Mentors ===
> * Michael Stack
> * Lars Hofhansl
> * Andrew Purtell
> * Devaraj Das
> * Enis Soztutar
>
> === Sponsoring Entity ===
> The Apache Incubator
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org