A tad off topic, but possibly still relevant: Accumulo's design differs somewhat in that it can shard and perform set intersections/unions server-side (through seeks). I've got an adapter for Spark SQL on top of a document store implementation in Accumulo that accepts the push-down predicates and performs the query on the tablet servers. This strategy may be useful to you [1].
[1] https://github.com/calrissian/accumulo-recipes/blob/master/thirdparty/spark/src/test/scala/org/calrissian/accumulorecipes/spark/sql/EventStoreFilteredTest.scala

On Thu, Apr 30, 2015 at 10:54 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> bq. a single query on one filter criteria
>
> Can you tell us more about your filter? How selective is it?
>
> Which HBase release are you using?
>
> Cheers
>
> On Thu, Apr 30, 2015 at 7:23 AM, Siddharth Ubale <siddharth.ub...@syncoms.com> wrote:
>
>> Hi,
>>
>> I want to use Spark as a query engine on HBase with sub-second latency.
>>
>> I am using Spark 1.3 and followed the steps below on an HBase table with around 3.5 lakh (~350,000) rows:
>>
>> 1. Mapped the DataFrame to the HBase table. RDDCustomers maps to the HBase table that is used to create the DataFrame:
>>
>>     DataFrame schemaCustomers = sqlInstance
>>             .createDataFrame(SparkContextImpl.getRddCustomers(),
>>                     Customers.class);
>>
>> 2. Registered a temp table, i.e. schemaCustomers.registerTempTable("customers");
>>
>> 3. Ran the query on the DataFrame using the SQLContext instance.
>>
>> What I am observing is that a single query on one filter criterion takes 7-8 seconds, and the time increases as I increase the number of rows in the HBase table. Also, there was one time when I got a query response in under 1-2 seconds, which seems like strange behavior.
>>
>> Is this expected behavior from Spark, or am I missing something here?
>>
>> Can somebody help me understand this scenario? Please assist.
>>
>> Thanks,
>> Siddharth Ubale
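For reference, the three steps quoted above can be sketched end-to-end against the Spark 1.3 Java API. This is a minimal sketch, not the poster's actual code: `SparkContextImpl.getRddCustomers()`, the `Customers` bean, and the `city` column are stand-ins for whatever the original HBase-backed RDD and row class look like, and it needs a live Spark context (and the HBase table behind the RDD) to actually run:

```java
// Sketch of the quoted steps, assuming Spark 1.3's SQLContext API.
// `SparkContextImpl.getRddCustomers()` and the `Customers` JavaBean are
// hypothetical stand-ins for the poster's HBase-backed RDD and row class.
SQLContext sqlInstance = new SQLContext(sparkContext);

// 1. Build a DataFrame from the JavaRDD<Customers> that reads the HBase table.
DataFrame schemaCustomers = sqlInstance
        .createDataFrame(SparkContextImpl.getRddCustomers(), Customers.class);

// 2. Register it as a temporary table so it can be queried by name.
schemaCustomers.registerTempTable("customers");

// 3. Run a filtered query. Without a connector that pushes the predicate
//    down into HBase, the WHERE clause is evaluated in Spark after the
//    whole RDD has been read, so latency grows with the table size.
DataFrame result = sqlInstance.sql(
        "SELECT * FROM customers WHERE city = 'London'");
result.show();
```

Note that in this setup the filter is applied in Spark, not in HBase, which is consistent with the reported latency growing as rows are added; server-side filtering (HBase filters, or Accumulo-style seeks as in [1]) is what avoids the full scan.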