I have just upgraded one of my Spark jobs from Spark 1.2.1 to Spark 1.4.0,
and after deploying it to Mesos it's not working anymore.

The upgrade process was quite easy:

- Create a new Docker container for Spark 1.4.0.
- Upgrade the Spark job to use Spark 1.4.0 as a dependency and build a new
fat jar (the dependency bump is sketched below).
- Create a Docker container for the jobs, based on the previous Spark 1.4.0
container.
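
The dependency change itself is tiny. Roughly this, with placeholder
project and module names (I build the fat jar with the sbt-assembly
plugin):

name := "my-spark-job"  // placeholder project name

scalaVersion := "2.10.5"  // Spark 1.4.0 is built against Scala 2.10 by default

libraryDependencies ++= Seq(
  // bumped from 1.2.1 to 1.4.0; "provided" because the Mesos executors run
  // the Spark distribution fetched via spark.executor.uri, not the jar's copy
  "org.apache.spark" %% "spark-core" % "1.4.0" % "provided"
)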

After deploying it to Marathon, the job only shows the driver under
executors and no task makes any progress. I haven't made any change to my
config files (apart from updating spark.executor.uri to point to the right
file on S3, sketched below).
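
For completeness, a minimal sketch of how the executor URI is wired up
(here on the SparkConf, though it could equally live in spark-defaults.conf;
the app name is a placeholder):

import org.apache.spark.{SparkConf, SparkContext}

// Point the Mesos executors at the Spark 1.4.0 tarball on S3
// (the same URL that shows up in the fetcher log below).
val conf = new SparkConf()
  .setAppName("my-spark-job")  // placeholder app name
  .set("spark.executor.uri",
    "http://s3-eu-west-1.amazonaws.com/int-mesos-data/frameworks/spark/spark-1.4.0-bin-hadoop2.4.tgz")
val sc = new SparkContext(conf)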

If I go to Mesos and check my job under frameworks, I can see a few
failed stages; the content of stderr always looks like this:

I0713 15:59:45.774368  1327 fetcher.cpp:214] Fetching URI 'http://s3-eu-west-1.amazonaws.com/int-mesos-data/frameworks/spark/spark-1.4.0-bin-hadoop2.4.tgz'
I0713 15:59:45.774483  1327 fetcher.cpp:125] Fetching URI 'http://s3-eu-west-1.amazonaws.com/int-mesos-data/frameworks/spark/spark-1.4.0-bin-hadoop2.4.tgz' with os::net
I0713 15:59:45.774494  1327 fetcher.cpp:135] Downloading 'http://s3-eu-west-1.amazonaws.com/int-mesos-data/frameworks/spark/spark-1.4.0-bin-hadoop2.4.tgz' to '/var/log/mcsvc/mesostmpdir/slaves/20150713-133618-421011372-5050-8867-S5/frameworks/20150713-152326-421011372-5050-12921-0002/executors/9/runs/9e44b2ea-c738-4e76-8103-3a85ce752b58/spark-1.4.0-bin-hadoop2.4.tgz'
I0713 15:59:50.700959  1327 fetcher.cpp:78] Extracted resource '/var/log/mcsvc/mesostmpdir/slaves/20150713-133618-421011372-5050-8867-S5/frameworks/20150713-152326-421011372-5050-12921-0002/executors/9/runs/9e44b2ea-c738-4e76-8103-3a85ce752b58/spark-1.4.0-bin-hadoop2.4.tgz' into '/var/log/mcsvc/mesostmpdir/slaves/20150713-133618-421011372-5050-8867-S5/frameworks/20150713-152326-421011372-5050-12921-0002/executors/9/runs/9e44b2ea-c738-4e76-8103-3a85ce752b58'
I0713 15:59:50.973274  1333 exec.cpp:132] Version: 0.22.1
I0713 15:59:50.998219  1339 exec.cpp:206] Executor registered on slave 20150713-133618-421011372-5050-8867-S5
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/07/13 15:59:51 INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
15/07/13 15:59:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/07/13 15:59:52 INFO SecurityManager: Changing view acls to: root
15/07/13 15:59:52 INFO SecurityManager: Changing modify acls to: root
15/07/13 15:59:52 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/07/13 15:59:52 INFO Slf4jLogger: Slf4jLogger started
15/07/13 15:59:52 INFO Remoting: Starting remoting
15/07/13 15:59:53 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverpropsfetc...@int-mesos-slave-ib4583253.mclabs.io:41854]
15/07/13 15:59:53 INFO Utils: Successfully started service 'driverPropsFetcher' on port 41854.
15/07/13 15:59:53 INFO SecurityManager: Changing view acls to: root
15/07/13 15:59:53 INFO SecurityManager: Changing modify acls to: root
15/07/13 15:59:53 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/07/13 15:59:53 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/07/13 15:59:53 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/07/13 15:59:53 INFO Slf4jLogger: Slf4jLogger started
15/07/13 15:59:53 INFO Remoting: Starting remoting
15/07/13 15:59:53 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
15/07/13 15:59:53 INFO Utils: Successfully started service 'sparkExecutor' on port 60219.
15/07/13 15:59:53 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkexecu...@int-mesos-slave-ib4583253.mclabs.io:60219]
15/07/13 15:59:53 INFO DiskBlockManager: Created local directory at /var/log/mcsvc/sparktmpdir/spark-2ca9b3eb-ce70-44e5-9546-1a83f63dc439/blockmgr-4047306e-9dc8-48e4-bc25-300f4cf0be87
15/07/13 15:59:53 INFO MemoryStore: MemoryStore started with capacity 267.5 MB
Exception in thread "main" java.io.FileNotFoundException: /etc/mindcandy/metrics.properties (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at java.io.FileInputStream.<init>(FileInputStream.java:93)
        at org.apache.spark.metrics.MetricsConfig$$anonfun$1.apply(MetricsConfig.scala:50)
        at org.apache.spark.metrics.MetricsConfig$$anonfun$1.apply(MetricsConfig.scala:50)
        at scala.Option.map(Option.scala:145)
        at org.apache.spark.metrics.MetricsConfig.initialize(MetricsConfig.scala:50)
        at org.apache.spark.metrics.MetricsSystem.<init>(MetricsSystem.scala:93)
        at org.apache.spark.metrics.MetricsSystem$.createMetricsSystem(MetricsSystem.scala:222)
        at org.apache.spark.SparkEnv$.create(SparkEnv.scala:367)
        at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:211)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:180)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:65)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:65)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:146)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:245)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
15/07/13 15:59:53 INFO DiskBlockManager: Shutdown hook called
15/07/13 15:59:53 INFO Utils: path = /var/log/mcsvc/sparktmpdir/spark-2ca9b3eb-ce70-44e5-9546-1a83f63dc439/blockmgr-4047306e-9dc8-48e4-bc25-300f4cf0be87, already present as root for deletion.
15/07/13 15:59:53 INFO Utils: Shutdown hook called
15/07/13 15:59:53 INFO Utils: Deleting directory /var/log/mcsvc/sparktmpdir/spark-2ca9b3eb-ce70-44e5-9546-1a83f63dc439
