Felix,

My goal is to use the Spark SQL JDBC Thriftserver to access HBase tables using just SQL. I have been able to CREATE tables with the statement below in the past:

CREATE TABLE <table-name>
USING org.apache.spark.sql.jdbc
OPTIONS (
  url "jdbc:postgresql://<hostname>:<port>/dm?user=<username>&password=<password>",
  dbtable "dim.dimension_acamp"
);

After doing this, I can access the PostgreSQL table through the Spark SQL JDBC Thriftserver with plain SQL statements (SELECT, UPDATE, INSERT, etc.). I want to do the same with HBase tables.
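Ideally, the HBase equivalent would look something like the statement below. To be clear, this is only a sketch of what I am after, not something I have working: it assumes an HBase data source for Spark (for example the Hortonworks SHC connector) is on the Thriftserver classpath, and the data source package name and catalog mapping are lifted from SHC's README, so the details may well be wrong or vary by version:

CREATE TABLE <table-name>
USING org.apache.spark.sql.execution.datasources.hbase
OPTIONS (
  catalog '{"table":{"namespace":"default","name":"<hbase-table>"},"rowkey":"key","columns":{"key":{"cf":"rowkey","col":"key","type":"string"},"value":{"cf":"cf1","col":"value","type":"string"}}}'
);

If a statement like that worked, everything downstream of the Thriftserver would stay pure SQL, exactly as it does for PostgreSQL today.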
We did try the same thing with Hive and HiveServer2, but the response times are just too long.

Thanks,
Ben

> On Oct 8, 2016, at 10:53 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>
> Ben,
>
> I'm not sure I'm following completely.
>
> Is your goal to use Spark to create or access tables in HBase? If so, the
> link below and several packages out there support that by providing an
> HBase data source for Spark. There are some examples of what the Spark
> code looks like at that link as well. On that note, you should also be
> able to use the HBase data source from a pure SQL (Spark SQL) query, which
> should work in the case of the Spark SQL JDBC Thrift Server (with USING,
> http://spark.apache.org/docs/latest/sql-programming-guide.html#tab_sql_10).
>
>
> _____________________________
> From: Benjamin Kim <bbuil...@gmail.com>
> Sent: Saturday, October 8, 2016 10:40 AM
> Subject: Re: Spark SQL Thriftserver with HBase
> To: Felix Cheung <felixcheun...@hotmail.com>
> Cc: <user@spark.apache.org>
>
>
> Felix,
>
> The only alternative I see is to create the equivalent of a stored
> procedure (a UDF, in database terms) that runs Spark Scala code
> underneath. That way, I can use the Spark SQL JDBC Thriftserver to execute
> it with SQL, passing in the keys and values I want to UPSERT. I wonder if
> this is possible, since I cannot CREATE a wrapper table on top of an HBase
> table in Spark SQL.
>
> What do you think? Is this the right approach?
>
> Thanks,
> Ben
>
> On Oct 8, 2016, at 10:33 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>
> HBase has released support for Spark:
> http://hbase.apache.org/book.html#spark
>
> And if you search, you should find several alternative approaches.
>
>
> On Fri, Oct 7, 2016 at 7:56 AM -0700, "Benjamin Kim" <bbuil...@gmail.com> wrote:
>
> Does anyone know if Spark can work with HBase tables using Spark SQL? I
> know that in Hive we are able to create tables on top of an underlying
> HBase table that can be accessed using MapReduce jobs. Can the same be
> done using HiveContext or SQLContext? We are trying to set up a way to GET
> and POST data to and from an HBase table using the Spark SQL JDBC
> Thriftserver from our RESTful API endpoints and/or HTTP web farms. If we
> can get this to work, then we can load-balance the Thriftservers. In
> addition, this will give us a way to abstract the data storage layer away
> from the presentation-layer code; there is a chance that we will swap out
> the storage technology in the future. We are currently experimenting with
> Kudu.
>
> Thanks,
> Ben
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
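P.S. To make the UDF idea from my earlier message concrete, below is a rough, untested sketch of what I have in mind. Everything here is hypothetical: the function name (hbase_upsert), the table name, and the column family are placeholders, and it assumes the Thriftserver is started programmatically with HiveThriftServer2.startWithContext so the registered function is visible to JDBC sessions (Spark 1.x-style code; in 2.x the context types differ):

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

object HBaseUpsertServer {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hbase-upsert-udf"))
    val hiveContext = new HiveContext(sc)

    // Register the "stored procedure": a UDF that writes one cell to HBase.
    // An HBase Put is already insert-or-overwrite, i.e. an upsert.
    hiveContext.udf.register("hbase_upsert",
      (rowKey: String, qualifier: String, value: String) => {
        // Opens a connection per call -- acceptable for a proof of
        // concept only; a shared connection would be needed in practice.
        val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
        try {
          val table = conn.getTable(TableName.valueOf("dim_table"))
          try {
            val put = new Put(Bytes.toBytes(rowKey))
            put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes(qualifier),
              Bytes.toBytes(value))
            table.put(put)
          } finally table.close()
        } finally conn.close()
        "OK" // a UDF must return a value
      })

    // Expose this context (and its UDF) over JDBC.
    HiveThriftServer2.startWithContext(hiveContext)
  }
}

A JDBC client could then, in theory, run something like SELECT hbase_upsert('row1', 'col1', 'some value'); though I have not verified that a UDF registered this way is visible to every Thriftserver session.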