I forgot to mention that this is a long-running job, actually a Spark Streaming job, and it's running on Mesos in coarse-grained mode. I'm still using the unreliable Kafka receiver.
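For reference, the stream is created with the classic receiver-based API, roughly like this (the topic, group id and ZooKeeper quorum below are placeholders, not my real config):

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object StreamingJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-job") // placeholder name
    val ssc = new StreamingContext(conf, Seconds(10))

    // Receiver-based ("unreliable") Kafka stream: offsets are tracked in
    // ZooKeeper, so data buffered in a receiver can be lost if it dies.
    val messages = KafkaUtils.createStream(
      ssc,
      "zookeeper:2181",    // ZooKeeper quorum (placeholder)
      "my-consumer-group", // consumer group id (placeholder)
      Map("events" -> 1),  // topic -> receiver thread count (placeholder)
      StorageLevel.MEMORY_AND_DISK_SER_2
    )

    messages.map(_._2).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}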
2015-07-13 17:15 GMT+01:00 Luis Ángel Vicente Sánchez <langel.gro...@gmail.com>:

> I have just upgraded one of my spark jobs from spark 1.2.1 to spark 1.4.0, and after deploying it to mesos it's not working anymore.
>
> The upgrade process was quite easy:
>
> - Create a new docker container for spark 1.4.0.
> - Upgrade the spark job to use spark 1.4.0 as a dependency and create a new fatjar.
> - Create a docker container for the jobs, based on the previous spark 1.4.0 container.
>
> After deploying it to marathon, the job only displays the driver under executors and no task progresses. I haven't made any change to my config files (apart from updating spark.executor.uri to point to the right file on s3).
>
> If I go to mesos and check my job under frameworks, I can see a few failed stages; the content of stderr always looks like this:
>
> I0713 15:59:45.774368 1327 fetcher.cpp:214] Fetching URI 'http://s3-eu-west-1.amazonaws.com/int-mesos-data/frameworks/spark/spark-1.4.0-bin-hadoop2.4.tgz'
> I0713 15:59:45.774483 1327 fetcher.cpp:125] Fetching URI 'http://s3-eu-west-1.amazonaws.com/int-mesos-data/frameworks/spark/spark-1.4.0-bin-hadoop2.4.tgz' with os::net
> I0713 15:59:45.774494 1327 fetcher.cpp:135] Downloading 'http://s3-eu-west-1.amazonaws.com/int-mesos-data/frameworks/spark/spark-1.4.0-bin-hadoop2.4.tgz' to '/var/log/mcsvc/mesostmpdir/slaves/20150713-133618-421011372-5050-8867-S5/frameworks/20150713-152326-421011372-5050-12921-0002/executors/9/runs/9e44b2ea-c738-4e76-8103-3a85ce752b58/spark-1.4.0-bin-hadoop2.4.tgz'
> I0713 15:59:50.700959 1327 fetcher.cpp:78] Extracted resource '/var/log/mcsvc/mesostmpdir/slaves/20150713-133618-421011372-5050-8867-S5/frameworks/20150713-152326-421011372-5050-12921-0002/executors/9/runs/9e44b2ea-c738-4e76-8103-3a85ce752b58/spark-1.4.0-bin-hadoop2.4.tgz' into '/var/log/mcsvc/mesostmpdir/slaves/20150713-133618-421011372-5050-8867-S5/frameworks/20150713-152326-421011372-5050-12921-0002/executors/9/runs/9e44b2ea-c738-4e76-8103-3a85ce752b58'
> I0713 15:59:50.973274 1333 exec.cpp:132] Version: 0.22.1
> I0713 15:59:50.998219 1339 exec.cpp:206] Executor registered on slave 20150713-133618-421011372-5050-8867-S5
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 15/07/13 15:59:51 INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
> 15/07/13 15:59:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 15/07/13 15:59:52 INFO SecurityManager: Changing view acls to: root
> 15/07/13 15:59:52 INFO SecurityManager: Changing modify acls to: root
> 15/07/13 15:59:52 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
> 15/07/13 15:59:52 INFO Slf4jLogger: Slf4jLogger started
> 15/07/13 15:59:52 INFO Remoting: Starting remoting
> 15/07/13 15:59:53 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverpropsfetc...@int-mesos-slave-ib4583253.mclabs.io:41854]
> 15/07/13 15:59:53 INFO Utils: Successfully started service 'driverPropsFetcher' on port 41854.
> 15/07/13 15:59:53 INFO SecurityManager: Changing view acls to: root
> 15/07/13 15:59:53 INFO SecurityManager: Changing modify acls to: root
> 15/07/13 15:59:53 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
> 15/07/13 15:59:53 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
> 15/07/13 15:59:53 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
> 15/07/13 15:59:53 INFO Slf4jLogger: Slf4jLogger started
> 15/07/13 15:59:53 INFO Remoting: Starting remoting
> 15/07/13 15:59:53 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
> 15/07/13 15:59:53 INFO Utils: Successfully started service 'sparkExecutor' on port 60219.
> 15/07/13 15:59:53 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkexecu...@int-mesos-slave-ib4583253.mclabs.io:60219]
> 15/07/13 15:59:53 INFO DiskBlockManager: Created local directory at /var/log/mcsvc/sparktmpdir/spark-2ca9b3eb-ce70-44e5-9546-1a83f63dc439/blockmgr-4047306e-9dc8-48e4-bc25-300f4cf0be87
> 15/07/13 15:59:53 INFO MemoryStore: MemoryStore started with capacity 267.5 MB
> Exception in thread "main" java.io.FileNotFoundException: /etc/mindcandy/metrics.properties (No such file or directory)
>         at java.io.FileInputStream.open0(Native Method)
>         at java.io.FileInputStream.open(FileInputStream.java:195)
>         at java.io.FileInputStream.<init>(FileInputStream.java:138)
>         at java.io.FileInputStream.<init>(FileInputStream.java:93)
>         at org.apache.spark.metrics.MetricsConfig$$anonfun$1.apply(MetricsConfig.scala:50)
>         at org.apache.spark.metrics.MetricsConfig$$anonfun$1.apply(MetricsConfig.scala:50)
>         at scala.Option.map(Option.scala:145)
>         at org.apache.spark.metrics.MetricsConfig.initialize(MetricsConfig.scala:50)
>         at org.apache.spark.metrics.MetricsSystem.<init>(MetricsSystem.scala:93)
>         at org.apache.spark.metrics.MetricsSystem$.createMetricsSystem(MetricsSystem.scala:222)
>         at org.apache.spark.SparkEnv$.create(SparkEnv.scala:367)
>         at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:211)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:180)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:65)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:65)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:146)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:245)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
> 15/07/13 15:59:53 INFO DiskBlockManager: Shutdown hook called
> 15/07/13 15:59:53 INFO Utils: path = /var/log/mcsvc/sparktmpdir/spark-2ca9b3eb-ce70-44e5-9546-1a83f63dc439/blockmgr-4047306e-9dc8-48e4-bc25-300f4cf0be87, already present as root for deletion.
> 15/07/13 15:59:53 INFO Utils: Shutdown hook called
> 15/07/13 15:59:53 INFO Utils: Deleting directory /var/log/mcsvc/sparktmpdir/spark-2ca9b3eb-ce70-44e5-9546-1a83f63dc439
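For what it's worth, the stack trace suggests the executors themselves are now trying to open the path from spark.metrics.conf: MetricsConfig.initialize reads it with a plain FileInputStream while SparkEnv.createExecutorEnv is setting up the executor, and /etc/mindcandy/metrics.properties only exists in my driver container. A workaround I'm considering (untested; the property values are my guesses) is to ship the file with the job and point spark.metrics.conf at the local copy:

import org.apache.spark.SparkConf

object MetricsConfWorkaround {
  // Sketch of a possible workaround, not verified on this cluster: distribute
  // the metrics config to each executor's sandbox and reference it by the
  // relative name it will have there instead of the driver-only absolute path.
  val conf = new SparkConf()
    .setAppName("streaming-job") // placeholder name
    // Copied into each executor's working directory by Spark's file server.
    .set("spark.files", "/etc/mindcandy/metrics.properties")
    // Opened as a plain local file, so this resolves against the executor's cwd.
    .set("spark.metrics.conf", "metrics.properties")
}

If spark.files isn't fetched early enough (the metrics system comes up during SparkEnv creation, per the trace above), the fallback would be baking /etc/mindcandy/metrics.properties into the executor docker image so the absolute path resolves on every slave.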