You should consider Presto for this use case; if you want fast "first query" 
times, it is a better fit.

I think Spark SQL will catch up at some point, but if you are not running 
multiple queries against data cached in RDDs and you need low latency, it may 
not be a good fit.
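
A minimal sketch of the cached-queries pattern referred to above (Scala, Spark 
1.x API; the JDBC URL, table, and column names are placeholders, not details 
from this thread): the context is built once at service startup, the table is 
registered and cached, and each later query hits memory instead of the RDBMS.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object AggregationService {
      // Build the context once, at service startup (not per request).
      val sc = new SparkContext(new SparkConf().setAppName("agg-service"))
      val sqlContext = new SQLContext(sc)

      // Define the JDBC source; url/dbtable/user/password are placeholders.
      val products = sqlContext.read.format("jdbc").options(Map(
        "url"      -> "jdbc:postgresql://dbhost:5432/sales",
        "dbtable"  -> "products",
        "user"     -> "spark",
        "password" -> "secret"
      )).load()

      // Register and cache it so repeated queries reuse the in-memory copy.
      products.registerTempTable("products")
      sqlContext.cacheTable("products")

      // Each API call then queries the cached table, not the database.
      def run(query: String) = sqlContext.sql(query).collect()
    }

The first load still pays the full JDBC transfer, but aggregations after that 
run against Spark's in-memory columnar cache, which is where its latency story 
is much better.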

M

> On Dec 1, 2015, at 7:23 PM, Andrés Ivaldi <iaiva...@gmail.com> wrote:
> 
> OK, so the latency problem comes from using SQL as the source? How about 
> CSV, Hive, or another source?
> 
> On Tue, Dec 1, 2015 at 9:18 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>> It is not designed for interactive queries.
>> 
>> You might want to ask the designers of Spark, Spark SQL, and particularly of 
>> some things built on top of Spark (such as BlinkDB) about their intent with 
>> regard to interactive queries.  Interactive queries are not the only use 
>> Spark was designed for, but it goes too far to claim that Spark is not 
>> designed at all to handle interactive queries.
>> 
>> That being said, I think you are correct to question the wisdom of expecting 
>> the lowest-latency query response from Spark when using SQL (sic; presumably 
>> an RDBMS is intended) as the datastore.
>> 
>>> On Tue, Dec 1, 2015 at 4:05 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>> Hmm, it will never be faster than SQL if you use SQL as the underlying 
>>> storage. Spark is (currently) an in-memory batch engine for iterative 
>>> machine-learning workloads; it is not designed for interactive queries. 
>>> Currently Hive is moving in the direction of interactive queries. 
>>> Alternatives are Phoenix on HBase, or Impala.
>>> 
>>>> On 01 Dec 2015, at 21:58, Andrés Ivaldi <iaiva...@gmail.com> wrote:
>>>> 
>>>> Yes. 
>>>> The use case would be: run Spark as a service (I didn't investigate this 
>>>> yet); through API calls to this service we perform some aggregations over 
>>>> data in SQL. We are already doing this with an internal development.
>>>> 
>>>> Nothing complicated; for instance, a table with Product, Product Family, 
>>>> cost, price, etc., i.e. columns acting as dimensions and measures.
>>>> 
>>>> I want to use Spark to query that table and perform a kind of rollup, with 
>>>> cost as the measure and Product and Product Family as the dimensions.
>>>> 
>>>> Only 3 columns, yet it takes about 20s to run that query and aggregation 
>>>> in Spark, while the same query issued directly to the database with a 
>>>> GROUP BY on those columns takes about 1s.
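>>>> 
>>>> Roughly, the Spark side of that rollup looks like this (simplified; the 
>>>> DataFrame and column names are only illustrative):
>>>> 
>>>>     import org.apache.spark.sql.functions.sum
>>>> 
>>>>     // cost as the measure, Product Family / Product as the dimensions
>>>>     val rolledUp = productsDf
>>>>       .rollup("ProductFamily", "Product")
>>>>       .agg(sum("cost").as("totalCost"))
>>>>     rolledUp.show()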
>>>> 
>>>> regards
>>>> 
>>>> 
>>>> 
>>>>> On Tue, Dec 1, 2015 at 5:38 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>>>> Can you elaborate more on the use case?
>>>>> 
>>>>> > On 01 Dec 2015, at 20:51, Andrés Ivaldi <iaiva...@gmail.com> wrote:
>>>>> >
>>>>> > Hi,
>>>>> >
>>>>> > I'd like to use Spark to perform some transformations over data stored 
>>>>> > in SQL, but I need low latency. In my tests, Spark context creation and 
>>>>> > querying the data over SQL both take too long.
>>>>> >
>>>>> > Any ideas for speeding up the process?
>>>>> >
>>>>> > regards.
>>>>> >
>>>>> > --
>>>>> > Ing. Ivaldi Andres
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Ing. Ivaldi Andres
> 
> 
> 
> -- 
> Ing. Ivaldi Andres
