Re: Low Performance of Shark over Spark.

vinay . kashyap Thu, 07 Aug 2014 22:16:33 -0700

Hi Meng,
I cannot use a cached table in this case, as the data size is quite large.
Also, since I am running ad hoc queries, I cannot keep the table cached.
I can cache the table only when the types of queries are fixed and they
target a specific set of data.
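
For reference, the caching being discussed can in principle be scoped to one month of data rather than the whole 3TB table. The following is only a minimal sketch, assuming the `_cached` table-name suffix convention described in the Shark User Guide; the table name `raw_2000_01_cached` is hypothetical, and whether one month of data fits in cluster memory has not been checked here.

```sql
-- Sketch only: materialise a single month as an in-memory Shark table.
-- Assumption: Shark caches tables whose names end in "_cached".
-- raw_2000_01_cached is a hypothetical name, not from the original thread.
CREATE TABLE raw_2000_01_cached AS
SELECT field1, field2, field3, field10
FROM raw_table
WHERE year = 2000 AND month = 01;

-- Ad hoc filters would then run against the cached subset.
-- <some_value> is left as the placeholder used in the original query.
SELECT field1, field2, field3
FROM raw_2000_01_cached
WHERE field10 > <some_value>;
```

This only pays off if several ad hoc queries hit the same month; for a single one-off query, the cost of loading the cache is not recovered.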
 
Thanks and regards,
Vinay Kashyap
________________________________________________

From: "Xiangrui Meng" <men...@gmail.com>
Sent: vinay.kash...@socialinfra.net
Cc: "user@spark.apache.org"
Date: Thu, August 7, 2014 11:06 pm
Subject: Re: Low Performance of Shark over Spark.

> Did you cache the table? There are a couple of ways of caching a table in
> Shark: https://github.com/amplab/shark/wiki/Shark-User-Guide
>
> On Thu, Aug 7, 2014 at 6:51 AM, <vinay.kash...@socialinfra.net> wrote:

>> Dear all,
>>
>> I am using Spark 0.9.2 in Standalone mode, with Hive and HDFS from
>> CDH 5.1.0.
>>
>> There are 6 worker nodes, each with 96GB of memory and 32 cores.
>>
>> I am using the Shark shell to execute queries on Spark.
>>
>> I have a raw_table (of size 3TB with replication 3) which is partitioned
>> by year, month and day. I am running an ad hoc query on one month of
>> data with some condition.
>>
>> For example:
>>
>> CREATE TABLE temp_table AS SELECT field1,field2,field3 FROM raw_table
>> WHERE year=2000 AND month=01 AND field10 > <some_value>;
>>
>> It is claimed that the same Hive queries can run 100x faster with Shark,
>> but I don't see such a significant improvement when running the above
>> query. I am getting almost the same performance as when it is run in
>> Hive, which is around 45 seconds.
>>
>> The same query with Impala takes much less time: around 6 seconds, which
>> is almost 7 times faster than Shark. I have tried altering the
>> parameters below for the Spark jobs but did not see any difference.
>>
>> spark.local.dir
>> spark.serializer
>> spark.kryoserializer.buffer.mb
>> spark.storage.memoryFraction
>> spark.io.compression.codec
>> spark.default.parallelism
>>
>> Any suggestions on how I can improve the performance of the query with
>> Shark over Spark and make it comparable to Impala?
>>
>> Thanks and regards
>> Vinay Kashyap