In the upcoming 1.4.0 release, SPARK-3468 should give you a better clue.

Cheers

On Fri, May 1, 2015 at 12:30 PM, Siddharth Ubale <siddharth.ub...@syncoms.com> wrote:

> Hi,
>
> Thanks for the reply.
>
> The HBase CLI takes less than 500 ms for the same query.
>
> I am running a simple query, i.e. "Select * from Customers where
> c_id='123123'".
>
> Why would the same query, which takes 500 ms in the HBase CLI, end up
> taking around 8 secs via Spark SQL?
>
> I am unable to understand this.
>
> Thanks,
> Siddharth
>
> ------------------------------
> *From:* ayan guha <guha.a...@gmail.com>
> *Sent:* 01 May 2015 04:38
> *To:* Ted Yu
> *Cc:* user@spark.apache.org; Siddharth Ubale; matei.zaha...@gmail.com;
> Prakash Hosalli; Amit Kumar
> *Subject:* Re: real time Query engine Spark-SQL on Hbase
>
> And if I may ask, how long does it take in the HBase CLI? I would not
> expect Spark to improve the performance of HBase. At best, Spark will push
> down the filter to HBase. So I would try to optimise any additional
> overhead, like bringing data into Spark.
>
> On 1 May 2015 00:56, "Ted Yu" <yuzhih...@gmail.com> wrote:
>
>> bq. a single query on one filter criteria
>>
>> Can you tell us more about your filter? How selective is it?
>>
>> Which HBase release are you using?
>>
>> Cheers
>>
>> On Thu, Apr 30, 2015 at 7:23 AM, Siddharth Ubale <
>> siddharth.ub...@syncoms.com> wrote:
>>
>>> Hi,
>>>
>>> I want to use Spark as a query engine on HBase with sub-second latency.
>>>
>>> I am using Spark 1.3, and followed the steps below on an HBase table
>>> with around 3.5 lakh rows:
>>>
>>> 1. Mapped the DataFrame to the HBase table. RDDCustomers maps to the
>>>    HBase table and is used to create the DataFrame:
>>>
>>>        DataFrame schemaCustomers = sqlInstance
>>>            .createDataFrame(SparkContextImpl.getRddCustomers(),
>>>                             Customers.class);
>>>
>>> 2. Registered a temp table, i.e.
>>>    schemaCustomers.registerTempTable("customers");
>>>
>>> 3. Ran the query on the DataFrame using the SQLContext instance.
>>>
>>> What I am observing is that a single query with one filter criterion
>>> takes 7-8 seconds, and the time increases as I increase the number of
>>> rows in the HBase table. Also, there was one time when I got a query
>>> response within 1-2 seconds. This seems like strange behavior.
>>>
>>> Is this expected behavior from Spark, or am I missing something here?
>>>
>>> Can somebody help me understand this scenario? Please assist.
>>>
>>> Thanks,
>>>
>>> Siddharth Ubale
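
A note on why the two timings in this thread can differ so much, as a toy model in plain Java (this is not Spark or HBase code). It assumes, and the thread does not confirm this, that c_id is the HBase row key and that the RDD behind the DataFrame reads every row, so the WHERE clause is only applied after the rows reach Spark. The names `LookupVsScan`, `buildTable`, `getByKey`, and `scanAndFilter` are hypothetical and exist only for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: a sorted in-memory map stands in for an HBase table.
class LookupVsScan {

    // Hypothetical stand-in for the Customers table, sorted by row key.
    public static TreeMap<String, String> buildTable(int rows) {
        TreeMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < rows; i++) {
            table.put(String.valueOf(i), "customer-" + i);
        }
        return table;
    }

    // Keyed lookup: goes straight to the row, roughly what a store can do
    // when the filter is on the row key (an assumption about c_id).
    public static String getByKey(TreeMap<String, String> table, String cId) {
        return table.get(cId);
    }

    // Scan and filter: visits every row and tests it, roughly what happens
    // when the filter is applied only after all rows have been read.
    public static List<String> scanAndFilter(Map<String, String> table, String cId) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> row : table.entrySet()) {
            if (row.getKey().equals(cId)) {
                matches.add(row.getValue());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // 350,000 rows, i.e. the 3.5 lakh rows mentioned in the thread.
        TreeMap<String, String> table = buildTable(350_000);
        String cId = "123123";

        long t0 = System.nanoTime();
        String direct = getByKey(table, cId);
        long t1 = System.nanoTime();
        List<String> scanned = scanAndFilter(table, cId);
        long t2 = System.nanoTime();

        System.out.println("keyed lookup: " + direct + " in " + (t1 - t0) / 1_000 + " us");
        System.out.println("scan+filter:  " + scanned + " in " + (t2 - t1) / 1_000 + " us");
    }
}
```

Both paths return the same row, but only the scan's cost grows with the table, which would match the observation that the query time increases with the number of rows. Whether this applies here depends on how getRddCustomers() builds the RDD; job scheduling and task launch overhead in Spark would come on top of it.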