Re: Which [open-source] SQL engine atop Hadoop?

Siddharth Tiwari Fri, 30 Jan 2015 04:36:36 -0800

Have you looked at HAWQ from Pivotal?

Sent from my iPhone


> On Jan 30, 2015, at 4:27 AM, Samuel Marks <samuelma...@gmail.com> wrote:
> 
> Since Hadoop came out, there have been various commercial and/or open-source 
> attempts to expose some compatibility with SQL. Obviously by posting here I 
> am not expecting an unbiased answer.
> I am seeking an SQL-on-Hadoop offering that provides low-latency querying and 
> supports the most common CRUD operations, including [the basics!]: CREATE 
> TABLE, INSERT INTO, SELECT * FROM, UPDATE Table SET C1=2 WHERE, DELETE FROM, 
> and DROP TABLE. Transactional support would also be nice, but is not a 
> must-have.
> 
> Essentially I want a full replacement for the more traditional RDBMS, one 
> which can scale from 1 node to a serious Hadoop cluster.
> 
> Python is my language of choice for interfacing; however, there does seem to 
> be a Python JDBC wrapper.
> 
> Here is what I've found thus far:
> 
> Apache Hive (SQL-like, with interactive SQL thanks to the Stinger initiative)
> Apache Drill (ANSI SQL support)
> Apache Spark (Spark SQL, queries only, add data via Hive, RDD or Parquet)
> Apache Phoenix (built atop Apache HBase, lacks full transaction support, 
> relational operators and some built-in functions)
> Cloudera Impala (significant HiveQL support, some SQL language support, no 
> support for indexes on its tables, importantly missing DELETE, UPDATE and 
> INTERSECT, amongst others)
> Presto from Facebook (can query Hive, Cassandra, relational DBs, etc. Doesn't 
> seem to be designed for low-latency responses across small clusters, or 
> support UPDATE operations. It is optimized for data warehousing or analytics¹)
> SQL-Hadoop via MapR community edition (seems to be a packaging of Hive, HP 
> Vertica, SparkSQL, Drill and a native ODBC wrapper)
> Apache Kylin from eBay (provides an SQL interface and multi-dimensional 
> analysis [OLAP], "… offers ANSI SQL on Hadoop and supports most ANSI SQL 
> query functions". It depends on HDFS, MapReduce, Hive and HBase; and seems 
> targeted at very large data-sets though maintains low query latency)
> Apache Tajo (ANSI/ISO SQL standard compliance with JDBC driver support 
> [benchmarks against Hive and Impala])
> Cascading's Lingual² ("Lingual provides JDBC Drivers, a SQL command shell, 
> and a catalog manager for publishing files [or any resource] as schemas and 
> tables.")
> Which—from this list or elsewhere—would you recommend, and why?
> 
> Thanks for all suggestions,
> 
> Samuel Marks
> http://linkedin.com/in/samuelmarks
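
The CRUD baseline in the quoted message can be turned into a small smoke test to run against each candidate engine. What follows is only a rough sketch, not a benchmark: it assumes the engine is reachable through a Python DB-API 2.0 connection, the table and column names (t, id, c1) are made up for illustration, and sqlite3 is used purely as a local stand-in so the script runs without a cluster. sqlite3 is not one of the engines discussed in this thread.

```python
import sqlite3

# The six statements from the quoted wish list, in the order given there.
# Table and column names (t, id, c1) are hypothetical.
CRUD_SMOKE_TEST = [
    "CREATE TABLE t (id INTEGER PRIMARY KEY, c1 INTEGER)",
    "INSERT INTO t (id, c1) VALUES (1, 1)",
    "SELECT * FROM t",
    "UPDATE t SET c1 = 2 WHERE id = 1",
    "DELETE FROM t WHERE id = 1",
    "DROP TABLE t",
]


def run_crud_smoke_test(conn):
    """Run each statement on a DB-API 2.0 connection.

    Returns a dict mapping each statement to "ok" or to the error text,
    so unsupported operations show up without aborting the run.
    """
    results = {}
    cur = conn.cursor()
    for stmt in CRUD_SMOKE_TEST:
        try:
            cur.execute(stmt)
            if stmt.startswith("SELECT"):
                cur.fetchall()  # drain the result set before the next statement
            results[stmt] = "ok"
        except Exception as exc:  # driver-specific error classes vary
            results[stmt] = f"failed: {exc}"
    conn.commit()
    cur.close()
    return results


if __name__ == "__main__":
    # Stand-in connection (assumption): swap in a connection from whichever
    # DB-API driver the engine under evaluation provides.
    conn = sqlite3.connect(":memory:")
    for stmt, status in run_crud_smoke_test(conn).items():
        print(f"{status}\t{stmt}")
    conn.close()
```

An engine that lacks UPDATE or DELETE would be expected to report a failure on those two statements while the rest pass, though the exact error text and the accepted CREATE TABLE syntax will differ from driver to driver.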
