Re: Spark is much slower than direct access MySQL

Louis Hust Sun, 26 Jul 2015 02:29:32 -0700

I got it, thanks for that

2015-07-26 17:21 GMT+08:00 Paolo Platter <[email protected]>:


>  If you want a performance boost, you need to load the full table into
> memory using caching and then execute your query directly on the cached
> DataFrame. Otherwise you use Spark only as a bridge and you don't leverage
> Spark's distributed in-memory engine.
>
> Paolo
>
> Sent from my Windows Phone
>  ------------------------------
> From: Louis Hust <[email protected]>
> Sent: 26/07/2015 10:28
> To: Shixiong Zhu <[email protected]>
> Cc: Jerrick Hoang <[email protected]>; [email protected]
> Subject: Re: Spark is much slower than direct access MySQL
>
>  Thanks for your explanation.
>
> 2015-07-26 16:22 GMT+08:00 Shixiong Zhu <[email protected]>:
>
>> Oh, I see. That's the total time of executing a query in Spark. Then the
>> difference is reasonable, considering Spark has much more work to do, e.g.,
>> launching tasks in executors.
>>
>>      Best Regards,
>> Shixiong Zhu
>>
>> 2015-07-26 16:16 GMT+08:00 Louis Hust <[email protected]>:
>>
>>> Look at the given url:
>>>
>>>  Code can be found at:
>>>
>>>
>>> https://github.com/louishust/sparkDemo/blob/master/src/main/java/DirectQueryTest.java
>>>
>>> 2015-07-26 16:14 GMT+08:00 Shixiong Zhu <[email protected]>:
>>>
>>>> Could you clarify how you measure the Spark time cost? Is it the total
>>>> time of running the query? If so, the result is plausible, because
>>>> Spark's overhead dominates for small queries.
>>>>
>>>>      Best Regards,
>>>> Shixiong Zhu
>>>>
>>>> 2015-07-26 15:56 GMT+08:00 Jerrick Hoang <[email protected]>:
>>>>
>>>>> how big is the dataset? how complicated is the query?
>>>>>
>>>>>  On Sun, Jul 26, 2015 at 12:47 AM Louis Hust <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi, all,
>>>>>>
>>>>>>  I am using a Spark DataFrame to fetch a small table from MySQL,
>>>>>> and I found it costs much more than accessing MySQL directly using JDBC.
>>>>>>
>>>>>>  The time cost for Spark is about 2033ms, and for direct access
>>>>>> about 16ms.
>>>>>>
>>>>>>  Code can be found at:
>>>>>>
>>>>>>
>>>>>> https://github.com/louishust/sparkDemo/blob/master/src/main/java/DirectQueryTest.java
>>>>>>
>>>>>>  So is my Spark configuration wrong? How can I optimise Spark to
>>>>>> achieve performance similar to direct access?
>>>>>>
>>>>>>  Any ideas would be appreciated!
>>>>>>
>>>>>>
>>>>
>>>
>>
>
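
The measurement question raised in the thread (total time of a single query
versus the steady-state cost) can be explored with a small timing harness.
What follows is a minimal sketch in plain Java, not code from the linked
repository: QueryTimer, timeRuns, and warmMedian are hypothetical names, and
the workload in main is only a stand-in for a real JDBC or Spark call. It
separates the first (cold) run, which carries any one-off startup cost, from
the later (warm) runs.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Spark or of the linked sparkDemo code.
final class QueryTimer {

    /** Runs the action `runs` times; returns each run's elapsed time in nanoseconds. */
    static long[] timeRuns(Runnable action, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be >= 1");
        }
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            action.run();
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }

    /**
     * Upper median of every run after the first one.
     * With a single run there are no warm runs, so that run is returned.
     */
    static long warmMedian(long[] elapsed) {
        if (elapsed.length == 0) {
            throw new IllegalArgumentException("no timings given");
        }
        if (elapsed.length == 1) {
            return elapsed[0];
        }
        long[] warm = Arrays.copyOfRange(elapsed, 1, elapsed.length);
        Arrays.sort(warm);
        return warm[warm.length / 2];
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption): replace with the real query call.
        int[] rows = new int[1_000];
        Arrays.fill(rows, 1);
        Runnable query = () -> {
            long sum = 0;
            for (int r : rows) {
                sum += r;
            }
            if (sum != rows.length) {
                throw new IllegalStateException("unexpected sum");
            }
        };

        long[] elapsed = timeRuns(query, 10);
        System.out.printf("first run:   %d ns%n", elapsed[0]);
        System.out.printf("warm median: %d ns%n", warmMedian(elapsed));
    }
}
```

To apply this to the case discussed above, the stand-in workload would be
replaced by the query itself. If the table has been cached with
DataFrame.cache() as Paolo suggests, the first run would be expected to
include the load from MySQL, while the warm runs would reflect queries
against the cached data plus Spark's per-job scheduling overhead.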
