Have you looked at HAWQ from Pivotal?
> On Jan 30, 2015, at 4:27 AM, Samuel Marks <samuelma...@gmail.com> wrote:
>
> Since Hadoop came out, there have been various commercial and/or open-source
> attempts to expose some compatibility with SQL. Obviously by posting here I
> am not expecting an unbiased answer.
>
> I am seeking an SQL-on-Hadoop offering which provides low-latency querying
> and supports the most common CRUD operations [the basics!], along these
> lines: CREATE TABLE, INSERT INTO, SELECT * FROM, UPDATE Table SET C1=2 WHERE,
> DELETE FROM, and DROP TABLE. Transactional support would be nice also, but
> is not a must-have.
>
> Essentially I want a full replacement for the more traditional RDBMS, one
> which can scale from 1 node to a serious Hadoop cluster.
>
> Python is my language of choice for interfacing; however, there does seem to
> be a Python JDBC wrapper.
>
> Here is what I've found thus far:
>
> Apache Hive (SQL-like, with interactive SQL thanks to the Stinger initiative)
> Apache Drill (ANSI SQL support)
> Apache Spark (Spark SQL, queries only, add data via Hive, RDD or Parquet)
> Apache Phoenix (built atop Apache HBase, lacks full transaction support,
> relational operators and some built-in functions)
> Cloudera Impala (significant HiveQL support, some SQL language support, no
> support for indexes on its tables, importantly missing DELETE, UPDATE and
> INTERSECT, amongst others)
> Presto from Facebook (can query Hive, Cassandra, relational DBs, etc. Doesn't
> seem to be designed for low-latency responses across small clusters, or to
> support UPDATE operations. It is optimized for data warehousing or analytics¹)
> SQL-Hadoop via MapR community edition (seems to be a packaging of Hive, HP
> Vertica, SparkSQL, Drill and a native ODBC wrapper)
> Apache Kylin from eBay (provides an SQL interface and multi-dimensional
> analysis [OLAP], "… offers ANSI SQL on Hadoop and supports most ANSI SQL
> query functions". It depends on HDFS, MapReduce, Hive and HBase, and seems
> targeted at very large data-sets, though it maintains low query latency)
> Apache Tajo (ANSI/ISO SQL standard compliance with JDBC driver support
> [benchmarks against Hive and Impala])
> Cascading's Lingual² ("Lingual provides JDBC Drivers, a SQL command shell,
> and a catalog manager for publishing files [or any resource] as schemas and
> tables.")
>
> Which, from this list or elsewhere, would you recommend, and why?
>
> Thanks for all suggestions,
>
> Samuel Marks
> http://linkedin.com/in/samuelmarks