Re: Low Performance of Shark over Spark.

2014-08-11 Thread vinay.kashyap
Hi Yana, I notice that GC is happening in every executor, taking around 400 ms on average. Do you think it has a major impact on the overall query time? And regarding the memory for a single worker, I have tried distributing the memory by increasing the number of workers per node and divid

Re: Low Performance of Shark over Spark.

2014-08-08 Thread vinay.kashyap
Hi Mayur, I cannot use Spark SQL in this case because many of the aggregations are not supported yet. Hence I migrated back to Shark, as all those aggregation functions are supported there. apache-spark-user-list.1001560.n3.nabble.com/Support-for-Percentile-and-Variance-Aggregation-functions-in-Spar

Re: Low Performance of Shark over Spark.

2014-08-08 Thread Mayur Rustagi
Hi Vinay, First of all, you should probably migrate to Spark SQL, as Shark is not actively supported anymore. The 100x benefit comes from in-memory caching and DAG execution; since you are not able to cache, the performance can be quite low. Alternatives you can explore: 1. Use Parquet as storage, which will push down

Re: Low Performance of Shark over Spark.

2014-08-07 Thread vinay.kashyap
Hi Meng, I cannot use a cached table in this case, as the data size is quite large. Also, as I am trying to run ad hoc queries, I cannot keep the table cached. I can cache the table only when the types of queries are fixed and they run against a specific set of data.   Thanks and regards Vin

Re: Low Performance of Shark over Spark.

2014-08-07 Thread Xiangrui Meng
Did you cache the table? There are a couple of ways of caching a table in Shark: https://github.com/amplab/shark/wiki/Shark-User-Guide On Thu, Aug 7, 2014 at 6:51 AM, wrote: > Dear all, > > I am using Spark 0.9.2 in Standalone mode. Hive and HDFS in CDH 5.1.0. > > 6 worker nodes each with memory 96GB