http://cacm.acm.org/magazines/2011/6/108651-10-rules-for-scalable-performance-in-simple-operation-datastores/fulltext
Try to read this article. It might help you understand your problem.

Thanks,

Xiao Li

2015-12-01 16:36 GMT-08:00 Mark Hamstra <[email protected]>:

> I'd ask another question first: If your SQL query can be executed in a
> performant fashion against a conventional RDBMS, why are you trying to
> use Spark? How you answer that question will be the key to deciding among
> the engineering design tradeoffs to effectively use Spark or some other
> solution.
>
> On Tue, Dec 1, 2015 at 4:23 PM, Andrés Ivaldi <[email protected]> wrote:
>
>> Ok, so is the latency problem being caused by my using SQL as the
>> source? What about CSV, Hive, or another source?
>>
>> On Tue, Dec 1, 2015 at 9:18 PM, Mark Hamstra <[email protected]> wrote:
>>
>>> It is not designed for interactive queries.
>>>
>>> You might want to ask the designers of Spark, Spark SQL, and
>>> particularly some things built on top of Spark (such as BlinkDB) about
>>> their intent with regard to interactive queries. Interactive queries are
>>> not the only designed use of Spark, but it is going too far to claim
>>> that Spark is not designed at all to handle interactive queries.
>>>
>>> That being said, I think that you are correct to question the wisdom of
>>> expecting lowest-latency query response from Spark using SQL (sic,
>>> presumably an RDBMS is intended) as the datastore.
>>>
>>> On Tue, Dec 1, 2015 at 4:05 PM, Jörn Franke <[email protected]> wrote:
>>>
>>>> Hmm, it will never be faster than SQL if you use SQL as the underlying
>>>> storage. Spark is (currently) an in-memory batch engine for iterative
>>>> machine-learning workloads. It is not designed for interactive queries.
>>>> Currently Hive is moving in the direction of interactive queries.
>>>> Alternatives are Phoenix on HBase, or Impala.
>>>>
>>>> On 01 Dec 2015, at 21:58, Andrés Ivaldi <[email protected]> wrote:
>>>>
>>>> Yes. The use case would be: have Spark running as a service (I didn't
>>>> investigate this yet); through API calls to this service we perform
>>>> some aggregations over data in SQL. We are already doing this with an
>>>> internal development.
>>>>
>>>> Nothing complicated. For instance, a table with Product, Product
>>>> Family, cost, price, etc.; columns acting as dimensions and measures.
>>>>
>>>> I want to use Spark to query that table and perform a kind of rollup,
>>>> with cost as the measure and Product, Product Family as the dimensions.
>>>>
>>>> With only 3 columns, it takes about 20s to perform that query and the
>>>> aggregation, while the same query directly against the database, with
>>>> a GROUP BY on those columns, takes about 1s.
>>>>
>>>> Regards
>>>>
>>>> On Tue, Dec 1, 2015 at 5:38 PM, Jörn Franke <[email protected]> wrote:
>>>>
>>>>> Can you elaborate more on the use case?
>>>>>
>>>>> > On 01 Dec 2015, at 20:51, Andrés Ivaldi <[email protected]> wrote:
>>>>> >
>>>>> > Hi,
>>>>> >
>>>>> > I'd like to use Spark to perform some transformations over data
>>>>> > stored in SQL, but I need low latency. I'm doing some tests and I
>>>>> > have run into the problem that Spark context creation plus the data
>>>>> > query over SQL take too long.
>>>>> >
>>>>> > Any ideas for speeding up the process?
>>>>> >
>>>>> > Regards,
>>>>> >
>>>>> > --
>>>>> > Ing. Ivaldi Andres
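The rollup Andrés describes (cost as the measure, Product and Product Family as the dimensions) is the aggregation the thread suggests pushing down to the database rather than computing in Spark. A minimal sketch of that grouping, using Python's built-in sqlite3 as a stand-in for the RDBMS; the table and column names here are hypothetical, and the three rollup levels are unioned by hand since SQLite has no GROUP BY ROLLUP:

```python
import sqlite3

# In-memory stand-in for the RDBMS from the thread; schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (family TEXT, product TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("Food", "Bread", 2.0), ("Food", "Milk", 1.5), ("Tools", "Hammer", 9.0)],
)

# A rollup over (family, product): detail rows, per-family subtotals,
# and a grand total, all computed inside the database engine.
rows = conn.execute("""
    SELECT family, product, SUM(cost) FROM products GROUP BY family, product
    UNION ALL
    SELECT family, NULL,    SUM(cost) FROM products GROUP BY family
    UNION ALL
    SELECT NULL,   NULL,    SUM(cost) FROM products
""").fetchall()

for family, product, total in rows:
    print(family, product, total)
```

The same shape of result is what Spark's `DataFrame.rollup` produces, with NULL marking the totaled-out dimension; when the source is a single table in an RDBMS, letting the database run this GROUP BY directly avoids pulling every row across JDBC first, which is consistent with the 20s-versus-1s gap reported above.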
