Additional Comment: I checked the disk usage on the 3 nodes (using iostat), and it seems that reading the HDFS partitions happens on a node-by-node basis. Only one of the nodes shows active IO (reads) at any given time, while the other two nodes are idle IO-wise. I am not sure why the tasks are scheduled that way, as it is a map-only job and reading could happen in parallel.
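A minimal sketch (Scala, Spark 1.4) of one way to check this from the application side; the HDFS path and the locality-wait value below are placeholders, not the actual settings from my cluster:

    import org.apache.spark.{SparkConf, SparkContext}

    // Placeholder values for illustration only.
    val conf = new SparkConf()
      .setAppName("locality-check")
      // How long the scheduler waits for a node-local slot before falling back
      // to rack-local/any; shortening it lets idle nodes pick up non-local
      // blocks sooner instead of queueing behind one busy node.
      .set("spark.locality.wait", "1s")

    val sc = new SparkContext(conf)

    // Each HDFS block of the table becomes one partition; with 150GB of data
    // there should be far more partitions than the 4 cores of a single node,
    // so tasks ought to spread across all three workers.
    val lineitem = sc.textFile("hdfs:///user/hive/warehouse/lineitem")
    println(s"partitions: ${lineitem.partitions.length}")

The Stages tab of the web UI (port 4040) also shows per-task locality levels (PROCESS_LOCAL / NODE_LOCAL / RACK_LOCAL / ANY), which helps tell whether the one-node-at-a-time IO comes from scheduling or from HDFS block placement.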
On Thu, Aug 13, 2015 at 9:10 PM, James Pirz <james.p...@gmail.com> wrote:

> Hi,
>
> I am using Spark 1.4 on a cluster (stand-alone mode), across 3 machines,
> for a workload similar to TPCH (analytical queries with multiple/multi-way
> large joins and aggregations). Each machine has 12GB of memory and 4 cores.
> My total data size is 150GB, stored in HDFS (as Hive tables), and I am
> running my queries through Spark SQL using the Hive context.
> After checking the performance tuning documents on the Spark page and some
> clips from the latest Spark Summit, I decided to set the following configs
> in my spark-env:
>
> SPARK_WORKER_INSTANCES=4
> SPARK_WORKER_CORES=1
> SPARK_WORKER_MEMORY=2500M
>
> (As my tasks tend to be long, the overhead of starting multiple JVMs, one
> per worker, is much less than the total query time.) As I monitored the job
> progress, I realized that while the worker memory is 2.5GB, the executors
> (one per worker) have a max memory of 512MB (the default). I enlarged this
> value in my application as:
>
> conf.set("spark.executor.memory", "2.5g");
>
> trying to give the maximum available memory on each worker to its only
> executor. However, I observed that my queries run slower than in the
> previous case (default 512MB). Changing 2.5g to 1g improved the response
> time; it is close to, but still worse than, the 512MB case. I guess what I
> am missing here is the relationship between SPARK_WORKER_MEMORY and
> spark.executor.memory.
>
> - Isn't it the case that the worker tries to split this memory among its
> executors (in my case, its only executor)? Or is there other work done by
> the worker that needs memory?
>
> - What other important parameters do I need to look into and tune at this
> point to get the best response time out of my hardware? (I have read about
> the Kryo serializer and am about to try it; I am mainly concerned about
> memory-related settings and knobs related to the parallelism of my jobs.)
> As an example, for a simple scan-only query, Spark is worse than Hive
> (almost 3 times slower) while both are scanning the exact same table and
> file format. That is why I believe I am missing some params by leaving them
> as defaults.
>
> Any hint/suggestion would be highly appreciated.
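For reference, a minimal sketch of how these settings are usually combined on the application side (Scala, Spark 1.4, standalone mode); the concrete values are illustrative, not the exact configuration from this thread:

    import org.apache.spark.SparkConf

    // Illustrative values only, not the exact configuration from this thread.
    // spark.executor.memory must fit within SPARK_WORKER_MEMORY (2500M per
    // worker here), and the JVM needs extra non-heap memory on top of it, so
    // four 2.5g executors on a 12GB machine (which also runs an HDFS datanode)
    // can push the box into swapping; a somewhat smaller heap per executor is
    // a common compromise.
    val conf = new SparkConf()
      .setAppName("tpch-like-queries")
      .set("spark.executor.memory", "2g")
      // Kryo serialization, as mentioned above, usually shrinks shuffled data.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Partitions Spark SQL uses for joins/aggregations (default 200); with
      // 3 nodes x 4 cores, a smaller value reduces per-task scheduling overhead.
      .set("spark.sql.shuffle.partitions", "24")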