I'm trying to run Spark (1.4.1) on top of Mesos (0.23). I've followed
the instructions (uploaded the Spark tarball to HDFS, set the executor URI
in both places, etc.), and yet on the slaves it fails to launch even the
SparkPi example, with a JNI error. It does run with a local master. A
day of debugging later, it's time to ask for help!
bin/spark-submit --master mesos://10.1.201.191:5050 --class
org.apache.spark.examples.SparkPi /tmp/examples.jar
(I'm keeping the jar outside HDFS, on both the client box and the slave,
with the other slaves turned off for debugging, because of
http://apache-spark-user-list.1001560.n3.nabble.com/Remote-jar-file-td20649.html.
I should note that I got the same JNI errors when using the Mesos
cluster dispatcher.)
I'm using Oracle Java 8 (no other Java, not even OpenJDK, is installed).
As the log below shows, the slave downloads the framework fine (you can
even see it extracted on the slave). Can anyone shed some light on
what's going on, e.g. how is it attempting to run the executor?
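One thing that can be ruled out here is the layout of the uploaded tarball. The following is a minimal sketch in Python, resting on the assumption (not confirmed against the Spark source) that the executor is started from a directory unpacked out of the tarball. `top_level_entries` is a hypothetical helper, not part of Spark or Mesos.

```python
import tarfile


def top_level_entries(tar_path):
    """Return the sorted top-level names inside a tar archive."""
    # Mode "r" auto-detects gzip/bzip2 compression.
    with tarfile.open(tar_path) as archive:
        names = archive.getnames()
    # Drop a leading "./" so "./spark/bin" and "spark/bin" compare equal.
    cleaned = [n[2:] if n.startswith("./") else n for n in names]
    # Keep only the first path component of each member.
    return sorted({n.split("/", 1)[0] for n in cleaned if n and n != "."})
```

Calling `top_level_entries("spark.tgz")` on a local copy of the tarball (the file name is a placeholder) lists what it unpacks to. A stock distribution unpacks to a single top-level directory; several loose entries such as `bin` and `lib` at the top level would suggest the tarball was repacked differently.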
I'm going to try a different JVM (and a custom Spark distribution),
but I suspect that the problem is much more basic. Maybe it can't find
the Hadoop native libs?
Any light would be much appreciated :) I've included the slave's
stderr below:
I0909 14:14:01.405185 32132 logging.cpp:177] Logging to STDERR
I0909 14:14:01.405256 32132 fetcher.cpp:409] Fetcher Info:
{"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/20150826-133446-3217621258-5050-4064-S0\/ubuntu","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"hdfs:\/\/\/apps\/spark\/spark.tgz"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/20150826-133446-3217621258-5050-4064-S0\/frameworks\/20150826-133446-3217621258-5050-4064-211198\/executors\/20150826-133446-3217621258-5050-4064-S0\/runs\/38077da2-553e-4888-bfa3-ece2ab2119f3","user":"ubuntu"}
I0909 14:14:01.406332 32132 fetcher.cpp:364] Fetching URI
'hdfs:///apps/spark/spark.tgz'
I0909 14:14:01.406344 32132 fetcher.cpp:238] Fetching directly into the
sandbox directory
I0909 14:14:01.406358 32132 fetcher.cpp:176] Fetching URI
'hdfs:///apps/spark/spark.tgz'
I0909 14:14:01.679055 32132 fetcher.cpp:104] Downloading resource with
Hadoop client from 'hdfs:///apps/spark/spark.tgz' to
'/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
I0909 14:14:05.492626 32132 fetcher.cpp:76] Extracting with command: tar
-C
'/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3'
-xf
'/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
I0909 14:14:07.489753 32132 fetcher.cpp:84] Extracted
'/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
into
'/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3'
W0909 14:14:07.489784 32132 fetcher.cpp:260] Copying instead of
extracting resource from URI with 'extract' flag, because it does not
seem to be an archive: hdfs:///apps/spark/spark.tgz
I0909 14:14:07.489791 32132 fetcher.cpp:441] Fetched
'hdfs:///apps/spark/spark.tgz' to
'/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.slf4j.Logger
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
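
The trace above reports a missing class, org.slf4j.Logger, rather than a missing native library, so one check that follows from it is whether any jar in the extracted distribution on the slave contains that class at all. A minimal sketch in Python, relying only on the fact that a jar is a zip archive; `jars_containing` is a hypothetical helper and the directory passed to it is a placeholder for the extracted Spark directory in the sandbox.

```python
import os
import zipfile


def jars_containing(root_dir, class_name):
    """Return paths of .jar files under root_dir that contain class_name."""
    # org.slf4j.Logger is stored in a jar as org/slf4j/Logger.class
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            if not filename.endswith(".jar"):
                continue
            jar_path = os.path.join(dirpath, filename)
            try:
                with zipfile.ZipFile(jar_path) as jar:
                    if entry in jar.namelist():
                        matches.append(jar_path)
            except zipfile.BadZipFile:
                # Skip files named .jar that are not valid archives.
                continue
    return sorted(matches)
```

An empty result from `jars_containing("/path/to/extracted/spark", "org.slf4j.Logger")` would mean the distribution itself does not ship the class. A non-empty result would point instead at how the executor's classpath is assembled on the slave.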