+1 (non-binding) for incubation of Phoenix. This is already a very useful 
project for HBase users, and it will be good to see it driven by a larger 
community.

It may also be good to get a legal opinion about the name, and/or check for 
an alternative. There may be some existing software products with this name 
(e.g. http://www.opwglobal.com/Product.aspx?pid=342).

Regards
Priyank Rastogi

-----Original Message-----
From: James Taylor [mailto:jtay...@salesforce.com] 
Sent: 14 November 2013 02:14
To: general@incubator.apache.org
Subject: [PROPOSAL] Phoenix for Incubation

Hi All,

We're pleased to share a draft ASF incubation proposal for Phoenix, a SQL layer 
over HBase, initially developed at Salesforce.com and subsequently open sourced 
on github (https://github.com/forcedotcom/phoenix). Instead of using Map-reduce 
to process queries, it compiles SQL directly into native HBase calls. The 
complete proposal can be found here:
https://wiki.apache.org/incubator/PhoenixProposal, and is also pasted below.

Your feedback is greatly appreciated.

James

== Abstract ==
Phoenix is an open source SQL query engine for Apache HBase, a NoSQL data 
store. It is accessed as a JDBC driver and enables querying and managing HBase 
tables using SQL.
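For readers less familiar with JDBC, the following is a minimal sketch of what "accessed as a JDBC driver" means for client code. It assumes the Phoenix client jar is on the classpath (so `DriverManager` can find the driver) and that the connection URL is `jdbc:phoenix:` followed by the ZooKeeper quorum of the HBase cluster. The table `WEB_STAT`, its `HOST` column, and the helper names are hypothetical, chosen only for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PhoenixQuerySketch {
    // Hypothetical helper: builds a Phoenix JDBC URL from a ZooKeeper quorum string
    // (assumed form: "jdbc:phoenix:" + quorum).
    static String phoenixUrl(String zkQuorum) {
        return "jdbc:phoenix:" + zkQuorum;
    }

    // Runs a plain SQL aggregate through standard JDBC; WEB_STAT and HOST are hypothetical.
    static long countRowsForHost(Connection conn, String host) throws SQLException {
        String sql = "SELECT COUNT(*) FROM WEB_STAT WHERE HOST = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, host);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    throw new SQLException("COUNT(*) returned no row");
                }
                return rs.getLong(1);
            }
        }
    }

    public static void main(String[] args) {
        String url = phoenixUrl(args.length > 0 ? args[0] : "localhost");
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("rows for EU: " + countRowsForHost(conn, "EU"));
        } catch (SQLException e) {
            // Without the driver on the classpath or a reachable cluster,
            // DriverManager.getConnection throws an SQLException here.
            System.err.println("Could not query Phoenix: " + e.getMessage());
        }
    }
}
```

Nothing in this snippet is Phoenix-specific beyond the URL; that is the point of a JDBC driver, since existing SQL tooling can talk to HBase data through the standard `java.sql` interfaces.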

== Proposal ==
Phoenix is an open source SQL skin over HBase delivered as a client-embedded 
JDBC driver targeting low latency queries over HBase data. Phoenix takes your 
SQL query, compiles it into a series of HBase scans, and orchestrates the 
running of those scans to produce regular JDBC result sets. The table metadata 
is stored in an HBase table and versioned, such that snapshot queries over 
prior versions will automatically use the correct schema. Direct use of the 
HBase API, along with coprocessors and custom filters, results in performance 
on the order of milliseconds for small queries, or seconds for tens of millions 
of rows. Phoenix interfaces with both Pig and Map-reduce for the input and 
output of data.
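The proposal does not describe Phoenix's query compiler, so the following is only a simplified illustration of one general technique behind "compiling SQL into HBase scans": turning a prefix predicate on the leading row-key column (e.g. `WHERE HOST LIKE 'EU%'`) into a half-open [start, stop) scan range. It assumes row keys compare lexicographically as unsigned bytes, as HBase orders them. The class and method names are hypothetical and the code does not use the HBase client API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class KeyRangeSketch {
    // Compares row keys lexicographically as unsigned bytes (HBase's row-key order).
    static int compareRowKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    }

    // Smallest key greater than every key starting with prefix, i.e. the exclusive
    // stop row of a prefix scan. Returns null when no such key exists (empty or
    // all-0xFF prefix), meaning "scan to the end of the table".
    static byte[] stopRowForPrefix(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++; // 0x7F wraps to 0x80, which is still larger as an unsigned byte
                return stop;
            }
        }
        return null;
    }

    // True if key lies in [start, stop); a null stop means the range is unbounded above.
    static boolean inRange(byte[] key, byte[] start, byte[] stop) {
        return compareRowKeys(key, start) >= 0
                && (stop == null || compareRowKeys(key, stop) < 0);
    }

    public static void main(String[] args) {
        // Hypothetical predicate: WHERE HOST LIKE 'EU%' on a table whose row key leads with HOST.
        byte[] start = "EU".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(start);
        System.out.println(new String(stop, StandardCharsets.UTF_8)); // EV
        for (String host : new String[] {"EU", "EU-West", "EV", "NA"}) {
            byte[] key = host.getBytes(StandardCharsets.UTF_8);
            System.out.println(host + " scanned: " + inRange(key, start, stop));
        }
    }
}
```

Bounding the scan this way means only the matching slice of the table is read, rather than filtering every row, which is one reason a SQL layer that issues scans directly can answer selective queries with low latency.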

== Background ==
Phoenix initially started as an internal project at Salesforce.com to 
efficiently analyze big data stored in HBase. It was open sourced on Github 
about a year ago in Jan 2013. Over time Phoenix, together with HBase as the 
storage tier, has begun to evolve into a general SQL database with support for 
metadata management, secondary indexes, joins, query optimization, and 
multi-tenancy. This is expected to continue as Phoenix implements a cost-based 
query optimizer and potentially transaction support, and surfaces new HBase 
security features such as encryption and cell-level security. Phoenix's 
developer community has also grown to include additional companies such as 
Intel, who have contributed join support to Phoenix, as well as Hortonworks, 
who are in the process of porting Phoenix to the 0.96 release of HBase.

== Rationale ==
As usage and the number of contributors to Phoenix have grown, we have sought 
a long-term home for the project, and we believe the Apache foundation 
would be a great fit. Joining Apache would ensure that tried-and-true processes 
and procedures are in place for the growing number of organizations interested 
in contributing to Phoenix. Phoenix is also a good fit for the Apache 
foundation: it already interoperates with several existing Apache projects 
(HBase, Hadoop, Pig). The Phoenix team is familiar with the Apache process and 
believes in the Apache mission; the team already includes multiple Apache 
committers.

== Initial Goals ==
The initial goals will be to move the existing codebase to Apache and integrate 
with the Apache development process. Once this is accomplished, we plan for 
incremental development and releases that follow the Apache guidelines.

== Current Status ==
Phoenix has undergone two major and three minor releases (1.0, 1.1, 1.2, 2.0, 
and 2.1) as well as many patch releases. Phoenix is being used in production by 
Salesforce.com as well as at other organizations. The Phoenix codebase is 
currently hosted at github.com, which will form the basis of the Apache git 
repository.

=== Meritocracy ===
The Phoenix project already operates on meritocratic principles.
Phoenix has several developers from various organizations outside of 
Salesforce.com who have contributed major new features. While this process has 
remained mostly informal, as we do not have an official committer list, an 
implicit organization exists in which individuals who contribute major 
components act as maintainers for those modules.
If accepted, the Phoenix project would include several of these participants as 
initial committers. We will work to identify all committers and PPMC members 
for the project and to operate under the ASF meritocratic principles.

=== Community ===
Acceptance into the Apache foundation would bolster the already strong user and 
developer community around Phoenix. That community includes many contributors 
from various other companies, and an active mailing list composed of hundreds 
of users.

=== Core Developers ===
The core developers of our project are listed in our contributors and initial 
PPMC below. Though many are employed at Salesforce.com, there is a 
representative cross sampling of other organizations including Intel, 
Hortonworks, Cloudera, and Twitter.

=== Alignment ===
Our proposed Phoenix effort aligns closely with Apache HBase. The HBase project 
perimeter is defined by simple byte-array-based Create, Read, Update, Delete, 
and Scan APIs, with no current plans to extend beyond these bounds. Phoenix 
complements this with a higher level API in SQL with which many are already 
familiar. At first glance, it may seem that Phoenix should just be folded into 
HBase as a new module. However, the focus of the two projects will be quite 
different, especially as Phoenix matures. With secondary indexing and joins 
just having been introduced into Phoenix, the next big frontier will be to 
implement a cost-based query optimizer. This is the heart and soul of most 
relational databases and can take a lifetime to get right.

HBase is focused on being a scalable data store agnostic to types and schema.  
Phoenix would layer typing and relational facilities on top of this scalable 
store. By keeping Apache HBase and Phoenix separate, both may evolve 
independently and at different rates. Though the focus of the two projects is 
different, the relationship between them is very positive and mutually 
beneficial. New features in HBase will be leveraged in Phoenix as it makes 
sense to surface these in a SQL paradigm. In addition, Phoenix may drive new 
features in HBase, as evidenced by the new type system recently introduced into 
HBase. By defining a standard serialization format, this will enable better 
interoperability among Apache Hive, standalone HBase use cases, and Phoenix.
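To illustrate why a shared, type-aware serialization format matters on top of a byte-array store, here is a sketch of one common order-preserving encoding for signed 32-bit integers: big-endian bytes with the sign bit flipped. It is an assumed example of the general idea, not a description of the exact format of the new HBase type system, and the class name is hypothetical.

```java
import java.nio.ByteBuffer;

public class SortableIntCodec {
    // Big-endian 4-byte encoding with the sign bit flipped, so that comparing the
    // bytes as unsigned values gives the same order as comparing the ints.
    static byte[] encode(int value) {
        return ByteBuffer.allocate(4).putInt(value ^ Integer.MIN_VALUE).array();
    }

    // Reverses encode(): read the big-endian int and flip the sign bit back.
    static int decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt() ^ Integer.MIN_VALUE;
    }

    // Lexicographic comparison of unsigned bytes, as a byte-array store orders keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        int[] values = {-5, -1, 0, 1, 42};
        for (int i = 0; i + 1 < values.length; i++) {
            boolean ordered = compareUnsigned(encode(values[i]), encode(values[i + 1])) < 0;
            System.out.println(values[i] + " < " + values[i + 1] + " in byte order: " + ordered);
        }
    }
}
```

Plain two's-complement bytes would sort -1 (0xFFFFFFFF) after 0 (0x00000000); flipping the sign bit fixes that. If Hive, standalone HBase clients, and Phoenix all agree on encodings like this, each can read and range-scan the others' data correctly.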

Other projects exist that perform SQL over HBase data (such as Apache Hive); 
however, these products do not provide the same low latency query capabilities 
as Phoenix. Instead, they are more oriented around maximizing throughput for 
batched operations. Phoenix opens the door to a completely new set of use cases 
for Apache HBase that demand a more interactive user experience.

There are also a number of related Apache projects and dependencies that are 
mentioned in the Relationship with Other Apache Products section.

== Known Risks ==
=== Orphaned Products ===
Given the current level of investment in Phoenix, the risk of the project 
being abandoned is minimal. All current and planned HBase use cases at 
Salesforce.com go through Phoenix. In addition, both Intel and Hortonworks plan 
to include Phoenix in their distributions. Other companies have made 
significant internal infrastructure investments in Phoenix.

=== Inexperience with Open Source ===
Phoenix has existed as a healthy open source project for almost a year. During 
that time, James, Mujtaba, and others have successfully fostered an open-source 
community, attracting users and developers from a diverse group of companies 
including Intel, Intuit, Bloomberg, Tagged, and Hortonworks. Although neither 
is a committer on another Apache project, both James and Mujtaba have experience 
working with and contributing to other Apache projects.

=== Homogeneous Developers ===
The initial list of committers includes developers from several institutions, 
including Salesforce, Intel, Hortonworks, and Twitter.

=== Reliance on Salaried Developers ===
Like most open source projects, Phoenix receives substantial support from 
salaried developers. A large fraction of Phoenix development is supported by 
Salesforce.com. In addition, those working from within corporations and 
universities often devote "after hours" or spare time to the project. We will 
continue our efforts to ensure that stewardship of the project remains 
independent of salaried developers.

=== Relationship with Other Apache Products ===
Although Phoenix provides a higher level abstraction than Apache HBase by 
hiding its client APIs, Phoenix relies on Apache HBase for both storing and 
retrieving data. It also interoperates with Apache HBase by allowing existing 
data, not created by Phoenix, to be queried. In addition, both Apache Pig and 
Hadoop are supported for data input and output. Finally, Phoenix is included 
and installable through Apache Bigtop, and its build and test suite are run 
through Apache Maven.

Phoenix offers an alternative query engine to Apache Hadoop (MapReduce). Unlike 
MapReduce, Phoenix is designed for lower-latency, OLTP, and interactive 
workloads. This makes the projects complementary, as users may run MapReduce and 
Phoenix side-by-side.

We plan to increase the interoperability between Phoenix, Apache Hive, and 
standalone Apache HBase usage by standardizing on a new type system that has 
been introduced in the current major release of HBase.
If all of these products adopt this new serialization format, interoperability 
between them will take a big step forward.

In addition, we plan to explore providing lower level APIs for other products 
such as Apache Drill to plug into when querying HBase data so that they get the 
performance benefits of using Phoenix.

=== An Excessive Fascination with the Apache Brand ===
Phoenix is already a healthy and relatively well known open source project. 
This proposal is not for the purpose of generating publicity.
Rather, the primary benefits to joining Apache are those outlined in the 
Rationale section.

=== Documentation ===
Additional documentation on Phoenix may be found on its github website:
 * Phoenix overview:
https://github.com/forcedotcom/phoenix/blob/master/README.md
 * Phoenix wiki: https://github.com/forcedotcom/phoenix/wiki
 * Phoenix road map: https://github.com/forcedotcom/phoenix/wiki#roadmap
 * Phoenix issue tracking:
https://github.com/forcedotcom/phoenix/issues?direction=desc&sort=updated&state=open
 * Phoenix codebase: https://github.com/forcedotcom/phoenix
 * Phoenix SQL language reference: http://forcedotcom.github.io/phoenix/
 * Phoenix performance:
https://github.com/forcedotcom/phoenix/wiki/Performance#phoenix-vs-related-products
 * User group: https://groups.google.com/group/phoenix-hbase-user

== Initial Source ==
The Phoenix codebase is currently hosted on Github:
https://github.com/forcedotcom/phoenix.

=== Source and Intellectual Property Submission Plan ===
Currently, the Phoenix codebase is distributed under a BSD license.
Upon entering Apache, the Phoenix license will be migrated to the Apache 2.0 
License.

== External Dependencies ==
Beyond relying on Apache HBase, Phoenix has the following external dependencies:
 * ANTLR 3.5 (BSD license: http://www.antlr3.org/license.html)
 * Sqlline 1.1.2 (BSD license:
https://github.com/julianhyde/sqlline/blob/master/LICENSE)
 * Open CSV 2.3 (Apache 2.0 license)

Upon acceptance to the incubator, we would begin a thorough analysis of all 
transitive dependencies to verify this information and introduce license 
checking into the build and release process by integrating with Apache Rat.

== Required Resources ==
=== Mailing list ===
We will migrate the existing Phoenix mailing lists as follows:

 * phoenix-hbase-u...@googlegroups.com --> us...@phoenix.incubator.apache.org
 * phoenix-hbase-...@googlegroups.com --> d...@phoenix.incubator.apache.org
 * priv...@phoenix.incubator.apache.org for IPMC members
 * comm...@phoenix.incubator.apache.org

The latter is to be consistent with the new PIAO naming scheme for podlings.

=== Source control ===
The Phoenix team would like to use Git for source control, due to our current 
use of Git.
We request a writeable Git repo for Phoenix, and mirroring to be set up to 
Github through INFRA.

=== Issue Tracking ===
Phoenix currently uses the github issue tracking system associated with its 
github repo:
https://github.com/forcedotcom/phoenix/issues?direction=desc&sort=updated&state=open.
We will migrate to the Apache JIRA:
http://issues.apache.org/jira/browse/PHOENIX

=== Other Resources ===
 * Jenkins/Hudson for builds and test running.
 * Wiki for documentation purposes
 * Blog to improve project dissemination

== Initial Committers ==
 * James Taylor <jtaylor at salesforce dot com>
 * Mujtaba Chohan <mchohan at salesforce dot com>
 * Jesse Yates <jyates at apache dot org>
 * Eli Levine <elevine at salesforce dot com>
 * Simon Toens <stoens at salesforce dot com>
 * Maryann Xue <wei.xue at intel dot com>
 * Anoop Sam John <anoopsamjohn at apache dot org>
 * Ramkrishna S Vasudevan <ramkrishna at apache dot org>
 * Jeffrey Zhong <jeffreyz at apache dot org>
 * Nick Dimiduk <ndimiduk at apache dot org>
 * Tony Huang <thuang at twitter dot com>

== Affiliations ==
The initial committers are from four organizations: Salesforce.com, Intel, 
Hortonworks, and Twitter.

 * James Taylor (Salesforce.com)
 * Mujtaba Chohan (Salesforce.com)
 * Jesse Yates (Salesforce.com)
 * Eli Levine (Salesforce.com)
 * Simon Toens (Salesforce.com)
 * Maryann Xue (Intel)
 * Anoop Sam John (Intel)
 * Ramkrishna S Vasudevan (Intel)
 * Jeffrey Zhong (Hortonworks)
 * Nick Dimiduk (Hortonworks)
 * Tony Huang (Twitter)

== Sponsors ==
=== Champion ===
 * Michael Stack

=== Nominated Mentors ===
 * Michael Stack
 * Lars Hofhansl
 * Andrew Purtell
 * Devaraj Das
 * Enis Soztutar

=== Sponsoring Entity ===
 The Apache Incubator

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org

