You should consider Presto for this use case. If you want fast "first query" times, it is a better fit.
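If you do stay with Spark SQL, a minimal sketch of the long-lived-context-plus-cache pattern (Spark 1.x Scala API; the JDBC URL, driver, and table name below are illustrative placeholders, not from this thread):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// One context for the lifetime of the service: context startup is a large
// share of the observed latency, so never create it per request.
val sc = new SparkContext(new SparkConf().setAppName("interactive-agg"))
val sqlContext = new SQLContext(sc)

// Load the table once over JDBC and pin it in memory.
// URL, driver, and table name are placeholders, not from the thread.
val products = sqlContext.read.format("jdbc").options(Map(
  "url"     -> "jdbc:mysql://dbhost:3306/sales",
  "driver"  -> "com.mysql.jdbc.Driver",
  "dbtable" -> "product_costs"
)).load()

products.cache()  // lazy; materialized by the first query that touches it
products.registerTempTable("product_costs")

// Later requests aggregate the cached data and skip the JDBC round trip.
val agg = sqlContext.sql(
  """SELECT productFamily, product, SUM(cost) AS totalCost
    |FROM product_costs
    |GROUP BY productFamily, product""".stripMargin)
agg.show()

The first query still pays the JDBC load and cache materialization cost; repeated queries after that run against memory.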
I think Spark SQL will catch up at some point, but if you are not doing multiple queries against data cached in RDDs and need low latency, it may not be a good fit.

M

> On Dec 1, 2015, at 7:23 PM, Andrés Ivaldi <iaiva...@gmail.com> wrote:
>
> Ok, so the latency problem is being generated because I'm using SQL as the source?
> How about CSV, Hive, or another source?
>
> On Tue, Dec 1, 2015 at 9:18 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>> It is not designed for interactive queries.
>>
>> You might want to ask the designers of Spark, Spark SQL, and particularly
>> some things built on top of Spark (such as BlinkDB) about their intent with
>> regard to interactive queries. Interactive queries are not the only
>> designed use of Spark, but it is going too far to claim that Spark is not
>> designed at all to handle interactive queries.
>>
>> That being said, I think that you are correct to question the wisdom of
>> expecting lowest-latency query response from Spark using SQL (sic,
>> presumably an RDBMS is intended) as the datastore.
>>
>>> On Tue, Dec 1, 2015 at 4:05 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>> Hmm, it will never be faster than SQL if you use SQL as the underlying
>>> storage. Spark is (currently) an in-memory batch engine for iterative
>>> machine-learning workloads. It is not designed for interactive queries.
>>> Currently, Hive is moving in the direction of interactive queries.
>>> Alternatives are HBase with Phoenix, or Impala.
>>>
>>>> On 01 Dec 2015, at 21:58, Andrés Ivaldi <iaiva...@gmail.com> wrote:
>>>>
>>>> Yes. The use case would be:
>>>> have Spark in a service (I didn't investigate this yet); through API calls
>>>> to this service we perform some aggregations over data in SQL. We are
>>>> already doing this with an internal development.
>>>>
>>>> Nothing complicated. For instance, a table with Product, Product Family,
>>>> cost, price, etc., with columns acting as dimensions and measures.
>>>>
>>>> I want to use Spark to query that table and perform a kind of rollup, with
>>>> cost as the measure and Product, Product Family as the dimensions.
>>>>
>>>> Only 3 columns, yet it takes about 20s to perform that query and the
>>>> aggregation, while querying the database directly with a GROUP BY on those
>>>> columns takes about 1s.
>>>>
>>>> Regards
>>>>
>>>>> On Tue, Dec 1, 2015 at 5:38 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>>>> Can you elaborate more on the use case?
>>>>>
>>>>> > On 01 Dec 2015, at 20:51, Andrés Ivaldi <iaiva...@gmail.com> wrote:
>>>>> >
>>>>> > Hi,
>>>>> >
>>>>> > I'd like to use Spark to perform some transformations over data stored
>>>>> > in SQL, but I need low latency. In my tests, Spark context creation and
>>>>> > querying the data over SQL take too long.
>>>>> >
>>>>> > Any ideas for speeding up the process?
>>>>> >
>>>>> > Regards.
>>>>> >
>>>>> > --
>>>>> > Ing. Ivaldi Andres
>>>>
>>>>
>>>> --
>>>> Ing. Ivaldi Andres
>
>
> --
> Ing. Ivaldi Andres
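For the rollup described in the thread (cost as the measure, Product and Product Family as the dimensions), the DataFrame API has a direct rollup operator. A sketch reusing the cached products DataFrame from the earlier example; the column names are assumptions based on the description, not taken from the actual schema:

import org.apache.spark.sql.functions.sum

// rollup() produces subtotals per (family, product), per family,
// and a grand-total row, matching the OLAP-style rollup described.
val rolledUp = products
  .rollup("productFamily", "product")
  .agg(sum("cost").as("totalCost"))

rolledUp.show()

Unlike a plain GROUP BY, rollup also emits the per-family subtotal and grand-total rows, which is what distinguishes the described use case from the 1s query sent directly to the database.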