The executor memory is overridden by "--executor-memory 24G" when launching spark-shell; the value in spark-env.sh is only the default. I can confirm from the Spark UI that the executor heap is set to 24G.

Thanks

Yong
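For reference, the effective setting can also be double-checked from the running shell (just a sketch; it assumes the spark-shell session is already up and uses only the standard SparkContext API):

// Value picked up from --executor-memory; should print "24G" here
sc.getConf.get("spark.executor.memory")

// Per-executor view from the driver: executor address -> (max memory for caching, remaining)
sc.getExecutorMemoryStatus.foreach(println)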
From: igor.ber...@gmail.com
Date: Tue, 11 Aug 2015 23:31:59 +0300
Subject: Re: Spark Job Hangs on our production cluster
To: java8...@hotmail.com
CC: user@spark.apache.org

How do you expect to process 1T of data when you set your executor memory to 2g? Look at the Spark UI and the task metrics, if any, and look at the Spark logs on the executor machines under the work dir (unless you configured log4j otherwise). I think your executors are thrashing or spilling to disk. Check the memory metrics and swapping.

On 11 August 2015 at 23:19, java8964 <java8...@hotmail.com> wrote:

Currently we have an IBM BigInsights cluster with 1 namenode + 1 JobTracker + 42 data/task nodes, running BigInsights V3.0.0.2, which corresponds to Hadoop 2.2.0 with MR1.

Since IBM BigInsights doesn't come with Spark, we built Spark 1.2.2 with Hadoop 2.2.0 + Hive 0.12 ourselves and deployed it on the same cluster. BigInsights ships with IBM JDK 1.7, but from our experience on the staging environment Spark works better with the Oracle JVM, so we run Spark under Oracle JDK 1.7.0_79.

Now on production we are facing an issue we have never seen before and cannot reproduce on our staging cluster.

We are using a Spark Standalone cluster, and here are the basic configurations:

more spark-env.sh
export JAVA_HOME=/opt/java
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_CONF_DIR=/opt/ibm/biginsights/hadoop-conf/
export SPARK_CLASSPATH=/opt/ibm/biginsights/IHC/lib/ibm-compression.jar:/opt/ibm/biginsights/hive/lib/db2jcc4-10.6.jar
export SPARK_LOCAL_DIRS=/data1/spark/local,/data2/spark/local,/data3/spark/local
export SPARK_MASTER_WEBUI_PORT=8081
export SPARK_MASTER_IP=host1
export SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=42"
export SPARK_WORKER_MEMORY=24g
export SPARK_WORKER_CORES=6
export SPARK_WORKER_DIR=/tmp/spark/work
export SPARK_DRIVER_MEMORY=2g
export SPARK_EXECUTOR_MEMORY=2g

more spark-defaults.conf
spark.master                     spark://host1:7077
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://host1:9000/spark/eventLog
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.executor.extraJavaOptions  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

We use the AVRO file format a lot, and we have these 2 datasets: one is about 96G, and the other is a little over 1T. Since we are using AVRO, we also built spark-avro at commit "a788c9fce51b0ec1bb4ce88dc65c1d55aaa675b8", which is the latest version supporting Spark 1.2.x.

Here is the problem we are facing on our production cluster: even the following simple spark-shell commands will hang:

import org.apache.spark.sql.SQLContext
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
import com.databricks.spark.avro._
val bigData = sqlContext.avroFile("hdfs://namenode:9000/bigData/")
bigData.registerTempTable("bigData")
bigData.count()

From the console, we see the following, with no update for more than 30 minutes and longer:

[Stage 0:>                         (44 + 42) / 7800]

The big 1T dataset should generate 7800 HDFS blocks, but Spark's HDFS client seems to have a problem reading them. Since we are running Spark on the data nodes, all the Spark tasks run at the "NODE_LOCAL" locality level.
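As a sanity check on the scan itself, here is a small sketch (not from the original run; the single-file path below is a hypothetical placeholder) that compares the partition count against the expected block count and tries a much smaller read of the same dataset:

// In Spark 1.2 a SchemaRDD is an RDD[Row], so the partition count is visible directly;
// it should roughly match the ~7800 HDFS blocks of the 1T dataset.
bigData.partitions.length

// Hypothetical single-file read from the same directory ("part-00000.avro" is a placeholder name)
sqlContext.avroFile("hdfs://namenode:9000/bigData/part-00000.avro").count()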
If I go to a data/task node where the Spark tasks hang and use jstack to dump the threads, I get the following at the top:

2015-08-11 15:38:38
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode):

"Attach Listener" daemon prio=10 tid=0x00007f0660589000 nid=0x1584d waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"org.apache.hadoop.hdfs.PeerCache@4a88ec00" daemon prio=10 tid=0x00007f06508b7800 nid=0x13302 waiting on condition [0x00007f060be94000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.PeerCache.run(PeerCache.java:252)
        at org.apache.hadoop.hdfs.PeerCache.access$000(PeerCache.java:39)
        at org.apache.hadoop.hdfs.PeerCache$1.run(PeerCache.java:135)
        at java.lang.Thread.run(Thread.java:745)

"shuffle-client-1" daemon prio=10 tid=0x00007f0650687000 nid=0x132fc runnable [0x00007f060d198000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        - locked <0x000000067bf47710> (a io.netty.channel.nio.SelectedSelectionKeySet)
        - locked <0x000000067bf374e8> (a java.util.Collections$UnmodifiableSet)
        - locked <0x000000067bf373d0> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:622)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:310)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
        at java.lang.Thread.run(Thread.java:745)

Meanwhile, I can confirm our Hadoop/HDFS cluster works fine: MapReduce jobs run without any problem, and "hadoop fs" commands work fine in BigInsights. I attached the jstack output to this email, but I don't know what the root cause could be.

The same spark-shell commands work fine if I point to the small dataset instead of the big one. The small dataset has around 800 HDFS blocks, and Spark finishes without any problem.

Here are some facts I know:
1) Since BigInsights runs on the IBM JDK, I also ran Spark under that same JDK; the big dataset shows the same problem.
2) I even changed "--total-executor-cores" to 42, so that each executor runs with one core (we have 42 Spark workers), to rule out any multithreading issues, but still no luck.
3) The hang when scanning the 1T data is NOT 100% reproducible. Sometimes I don't see it, but I see it more than 50% of the time when I try.
4) We never hit this issue on our staging cluster, but it has only 1 namenode + 1 jobtracker + 3 data/task nodes, and the same dataset is only 160G there.
5) While the Spark java process is hanging, I don't see any exception or issue in the HDFS data node logs.

Does anyone have any clue about this?

Thanks

Yong
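P.S. If it helps to narrow this down, a minimal isolation test (sketch only, not verified on this cluster) would be to read the same files from spark-shell through the plain Hadoop AvroInputFormat, bypassing Spark SQL and spark-avro, to see whether the raw HDFS scan also hangs:

import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.{AvroInputFormat, AvroWrapper}
import org.apache.hadoop.io.NullWritable

// Same directory as above, read via the old-style Hadoop InputFormat path instead of spark-avro
val raw = sc.hadoopFile[AvroWrapper[GenericRecord], NullWritable, AvroInputFormat[GenericRecord]]("hdfs://namenode:9000/bigData/")
raw.count()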