In the upcoming 1.4.0 release, SPARK-3468 should give you a better clue.

Cheers

On Fri, May 1, 2015 at 12:30 PM, Siddharth Ubale <
siddharth.ub...@syncoms.com> wrote:

> Hi,
>
> Thanks for the reply.
>
> The HBase CLI takes less than 500 ms for the same query.
>
> I am running a simple query, i.e. "Select * from Customers where
> c_id='123123'".
>
> Why would the same query, which takes 500 ms in the HBase CLI, take
> around 8 seconds via Spark SQL?
>
> I am unable to understand this.
>
>
>  Thanks,
>
> Siddharth
>
>  ------------------------------
> *From:* ayan guha <guha.a...@gmail.com>
> *Sent:* 01 May 2015 04:38
> *To:* Ted Yu
> *Cc:* user@spark.apache.org; Siddharth Ubale; matei.zaha...@gmail.com;
> Prakash Hosalli; Amit Kumar
> *Subject:* Re: real time Query engine Spark-SQL on Hbase
>
>
> And if I may ask, how long does it take in the HBase CLI? I would not
> expect Spark to improve on HBase's performance. At best, Spark will push
> the filter down to HBase, so I would try to minimise any additional
> overhead, such as bringing data into Spark.
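To make the pushdown point concrete, here is a toy Java model. It is not Spark or HBase code, and the class and method names are hypothetical. It only counts how many rows cross from the storage layer to the engine for a filter like c_id='123123' on a table of about 350,000 rows, with and without the filter being pushed down, assuming c_id values are unique.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model: a List<String> stands in for the HBase table (one c_id per row)
// and the methods below stand in for the Spark side. Nothing here is a real
// Spark or HBase API; only the number of transferred rows is modelled.
public class PushdownSketch {

    // Result of one simulated query.
    static final class Outcome {
        final int rowsShipped; // rows moved from storage into the engine
        final int rowsMatched; // rows that satisfy the filter

        Outcome(int rowsShipped, int rowsMatched) {
            this.rowsShipped = rowsShipped;
            this.rowsMatched = rowsMatched;
        }
    }

    // Builds a simulated table whose row i has c_id "i" (c_id assumed unique).
    static List<String> table(int rows) {
        List<String> ids = new ArrayList<>(rows);
        for (int i = 0; i < rows; i++) {
            ids.add(Integer.toString(i));
        }
        return ids;
    }

    // No pushdown: storage returns every row, the engine filters afterwards.
    static Outcome withoutPushdown(List<String> table, Predicate<String> filter) {
        List<String> shipped = new ArrayList<>(table);
        int matched = 0;
        for (String id : shipped) {
            if (filter.test(id)) {
                matched++;
            }
        }
        return new Outcome(shipped.size(), matched);
    }

    // Pushdown: storage applies the filter first (still scanning every row,
    // since there is no index), so only matching rows are transferred.
    static Outcome withPushdown(List<String> table, Predicate<String> filter) {
        List<String> shipped = new ArrayList<>();
        for (String id : table) {
            if (filter.test(id)) {
                shipped.add(id);
            }
        }
        return new Outcome(shipped.size(), shipped.size());
    }

    public static void main(String[] args) {
        List<String> customers = table(350_000);
        Predicate<String> byId = id -> id.equals("123123");
        Outcome plain = withoutPushdown(customers, byId);
        Outcome pushed = withPushdown(customers, byId);
        System.out.println("without pushdown: shipped " + plain.rowsShipped
                + ", matched " + plain.rowsMatched);
        System.out.println("with pushdown:    shipped " + pushed.rowsShipped
                + ", matched " + pushed.rowsMatched);
    }
}
```

Running main reports 350000 shipped rows without pushdown versus 1 with pushdown, and both find the single match. In this model the storage side still examines every row even with pushdown, because there is no index; what pushdown saves is the transfer and the engine-side work.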
> On 1 May 2015 00:56, "Ted Yu" <yuzhih...@gmail.com> wrote:
>
>> bq. a single query on one filter criteria
>>
>>  Can you tell us more about your filter? How selective is it?
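Selectivity here means the fraction of rows the filter keeps. A minimal sketch of the arithmetic, assuming (the poster does not say so) that c_id is unique, and using the roughly 350,000 rows mentioned later in the thread:

```java
// Selectivity = rows kept by the filter / total rows in the table.
public class Selectivity {

    static double selectivity(long matchingRows, long totalRows) {
        if (totalRows <= 0) {
            throw new IllegalArgumentException("totalRows must be positive");
        }
        if (matchingRows < 0 || matchingRows > totalRows) {
            throw new IllegalArgumentException("matchingRows must be in [0, totalRows]");
        }
        return (double) matchingRows / totalRows;
    }

    public static void main(String[] args) {
        // Assumption: c_id is unique, so c_id='123123' matches at most one row.
        long totalRows = 350_000;
        double s = selectivity(1, totalRows);
        System.out.printf("selectivity = %.2e (1 row out of %d)%n", s, totalRows);
    }
}
```

A value this close to zero means the filter discards almost every row it reads, so the work spent reading the other rows is what separates a fast lookup from a slow one.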
>>
>>  Which HBase release are you using?
>>
>>  Cheers
>>
>> On Thu, Apr 30, 2015 at 7:23 AM, Siddharth Ubale <
>> siddharth.ub...@syncoms.com> wrote:
>>
>>>  Hi,
>>>
>>>
>>>
>>> I want to use Spark as a query engine on HBase with sub-second latency.
>>>
>>>
>>>
>>> I am using Spark 1.3 and followed the steps below on an HBase table
>>> with around 350,000 (3.5 lakh) rows:
>>>
>>>
>>>
>>> 1. Mapped the DataFrame to the HBase table. RDDCustomers maps to the
>>>    HBase table and is used to create the DataFrame:
>>>
>>>        DataFrame schemaCustomers = sqlInstance
>>>                .createDataFrame(SparkContextImpl.getRddCustomers(),
>>>                        Customers.class);
>>>
>>> 2. Registered a temp table, i.e.
>>>
>>>        schemaCustomers.registerTempTable("customers");
>>>
>>> 3. Ran the query on the DataFrame using the SQLContext instance.
>>>
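One factor that may contribute to the 7-8 seconds: Spark evaluates RDDs lazily, so unless the data is cached, each query against the temp table re-runs the RDD lineage, which in the setup above would mean re-reading the HBase table through getRddCustomers(). The Java sketch below is only an analogy of that behaviour built on the JDK's Supplier; loadAllRows and the row format are hypothetical stand-ins. In Spark 1.3 the corresponding options would be DataFrame.cache() or SQLContext.cacheTable("customers"), at the cost of answers that can go stale relative to HBase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Analogy only (no Spark classes): a lazily evaluated source is re-read on
// every query unless its result is kept in memory. loadCount stands in for
// the number of full reads of the HBase table behind the RDD.
public class RecomputeVsCache {

    static int loadCount = 0;

    // Hypothetical stand-in for reading every customer row from HBase.
    static List<String> loadAllRows(int rows) {
        loadCount++;
        List<String> ids = new ArrayList<>(rows);
        for (int i = 0; i < rows; i++) {
            ids.add(Integer.toString(i));
        }
        return ids;
    }

    // Memoizing wrapper: the first get() loads the data, later calls reuse it.
    static <T> Supplier<T> cached(Supplier<T> source) {
        return new Supplier<T>() {
            private T value;

            @Override
            public T get() {
                if (value == null) {
                    value = source.get();
                }
                return value;
            }
        };
    }

    // One "query": filter the table for a single c_id and count the matches.
    static long runQuery(Supplier<List<String>> table, String cId) {
        return table.get().stream().filter(cId::equals).count();
    }

    public static void main(String[] args) {
        Supplier<List<String>> uncached = () -> loadAllRows(350_000);
        for (int q = 0; q < 3; q++) {
            runQuery(uncached, "123123");
        }
        System.out.println("full reads without caching: " + loadCount);

        loadCount = 0;
        Supplier<List<String>> withCache = cached(uncached);
        for (int q = 0; q < 3; q++) {
            runQuery(withCache, "123123");
        }
        System.out.println("full reads with caching:    " + loadCount);
    }
}
```

With three queries, the uncached source is loaded three times and the cached one once.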
>>>
>>>
>>> What I am observing is that a single query on one filter criterion takes
>>> 7-8 seconds, and the time increases as I increase the number of rows in
>>> the HBase table. Also, there was one time when I got a query response
>>> within 1-2 seconds, which seems like strange behavior.
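Query time growing with table size is consistent with a full scan. As far as I can tell, a DataFrame created from an existing RDD gives Spark SQL nothing to push the filter into, so every query tests every row, whereas HBase keeps rows sorted by row key and can look a key up directly. The toy comparison below counts work per query for both cases. It assumes c_id is the row key (the thread does not say), and java.util.TreeMap is only a loose stand-in for HBase's key-sorted storage.

```java
import java.util.Comparator;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

// Toy comparison of work per query as the table grows. Assumes c_id is the
// row key; TreeMap is only a loose stand-in for HBase's key-sorted storage.
public class ScanVsKeyLookup {

    // Full scan: with nothing pushed down, every row is tested against the
    // filter. Returns {rows examined, rows matched}.
    static long[] fullScan(int rows, int wantedKey) {
        long examined = 0;
        long matched = 0;
        for (int key = 0; key < rows; key++) {
            examined++;
            if (key == wantedKey) {
                matched++;
            }
        }
        return new long[] {examined, matched};
    }

    // Sorted-key lookup: counts comparator calls made by one TreeMap.get.
    static long keyLookupComparisons(int rows, int wantedKey) {
        AtomicLong comparisons = new AtomicLong();
        Comparator<Integer> counting = (a, b) -> {
            comparisons.incrementAndGet();
            return Integer.compare(a, b);
        };
        TreeMap<Integer, Boolean> table = new TreeMap<>(counting);
        for (int key = 0; key < rows; key++) {
            table.put(key, Boolean.TRUE);
        }
        comparisons.set(0); // ignore the cost of building the map
        table.get(wantedKey);
        return comparisons.get();
    }

    public static void main(String[] args) {
        int wanted = 123123; // the c_id from the thread, used as an int key
        for (int rows : new int[] {175_000, 350_000, 700_000}) {
            System.out.printf("%,9d rows: scan examines %,9d rows, key lookup makes %d comparisons%n",
                    rows, fullScan(rows, wanted)[0], keyLookupComparisons(rows, wanted));
        }
    }
}
```

The scan count doubles each time the table doubles, while the lookup count grows only with roughly log2 of the row count.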
>>>
>>> Is this expected behavior from Spark, or am I missing something here?
>>>
>>> Can somebody help me understand this scenario? Please assist.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Siddharth Ubale,
>>>
>>>
>>>
>>
>>
