Hi Meng,
I cannot use a cached table in this case because the data size is very
large.
Also, since I am running ad hoc queries, I cannot keep the table cached.
Caching only works for me when the types of queries are fixed and they
target a specific set of data.
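To illustrate the one case where caching could apply, here is a rough
sketch of caching a single month of data. It assumes the two caching
mechanisms I understand the Shark User Guide to describe (a table name
ending in _cached, or the "shark.cache" table property). The table names
raw_2000_01_cached and raw_2000_01 are hypothetical, and I have not run
this on my cluster.

```sql
-- Option 1 (assumed convention): a table whose name ends in _cached
-- is kept in memory by Shark.
CREATE TABLE raw_2000_01_cached AS
SELECT field1, field2, field3, field10
FROM raw_table
WHERE year=2000 AND month=01;

-- Option 2 (assumed property): request caching explicitly.
CREATE TABLE raw_2000_01 TBLPROPERTIES ("shark.cache" = "true") AS
SELECT field1, field2, field3, field10
FROM raw_table
WHERE year=2000 AND month=01;

-- Later queries would then read the in-memory copy.
SELECT field1, field2, field3
FROM raw_2000_01_cached
WHERE field10 > <some_value>;
```

This only helps if the same month is queried repeatedly, which is not
the case for my ad hoc workload.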
Thanks and regards,
Vinay Kashyap
________________________________________________
From:"Xiangrui Meng" <men...@gmail.com>
Sent:vinay.kash...@socialinfra.net
Cc:"user@spark.apache.org"
Date:Thu, August 7, 2014 11:06 pm
Subject:Re: Low Performance of Shark over Spark.
> Did you cache the table? There are a couple of ways of caching a table
> in Shark: https://github.com/amplab/shark/wiki/Shark-User-Guide
>
> On Thu, Aug 7, 2014 at 6:51 AM, <vinay.kash...@socialinfra.net> wrote:
>> Dear all,
>>
>> I am using Spark 0.9.2 in standalone mode, with Hive and HDFS from
>> CDH 5.1.0.
>>
>> There are 6 worker nodes, each with 96 GB of memory and 32 cores.
>>
>> I am using Shark Shell to execute queries on Spark.
>>
>> I have a raw_table (of size 3 TB with replication 3) which is
>> partitioned by year, month and day. I am running an ad hoc query on
>> one month of data with some condition.
>>
>> For example:
>>
>> CREATE TABLE temp_table AS SELECT field1, field2, field3 FROM raw_table
>> WHERE year=2000 AND month=01 AND field10 > <some_value>;
>>
>> It is claimed that the same Hive queries can run 100x faster with
>> Shark, but I don't see such a significant improvement when running the
>> above query. I am getting almost the same performance as when it is run
>> in Hive, which is around 45 seconds.
>>
>> The same query with Impala takes much less time: around 6 seconds,
>> which is almost 7 times faster than Shark. I have tried altering the
>> parameters below for the Spark jobs but did not see any difference.
>>
>> spark.local.dir
>> spark.serializer
>> spark.kryoserializer.buffer.mb
>> spark.storage.memoryFraction
>> spark.io.compression.codec
>> spark.default.parallelism
>>
>> Any suggestions on how I can improve the performance of the query with
>> Shark over Spark and make it comparable to Impala?
>>
>>
>>
>> Thanks and regards
>>
>> Vinay Kashyap