I forgot to mention that this is a long-running job, actually a Spark
Streaming job, and it's running in Mesos coarse-grained mode. I'm still
using the unreliable Kafka receiver.
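
For context, by "unreliable receiver" I mean the receiver-based consumer
from KafkaUtils.createStream; roughly the sketch below (topic, group id and
ZooKeeper quorum are placeholders, not my real settings):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaStreamingJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("my-streaming-job")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Receiver-based Kafka input: without the write-ahead log
    // (spark.streaming.receiver.writeAheadLog.enable) data buffered in the
    // receiver can be lost if an executor dies, hence "unreliable".
    val stream = KafkaUtils.createStream(
      ssc,
      "zk1:2181,zk2:2181", // ZooKeeper quorum (placeholder)
      "my-consumer-group", // consumer group id (placeholder)
      Map("my-topic" -> 1) // topic -> number of receiver threads (placeholder)
    )

    stream.count().print() // stand-in for the real processing
    ssc.start()
    ssc.awaitTermination()
  }
}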

2015-07-13 17:15 GMT+01:00 Luis Ángel Vicente Sánchez <
langel.gro...@gmail.com>:

> I have just upgraded one of my Spark jobs from Spark 1.2.1 to Spark 1.4.0
> and, after deploying it to Mesos, it's not working anymore.
>
> The upgrade process was quite easy:
>
> - Create a new Docker container for Spark 1.4.0.
> - Upgrade the job to depend on Spark 1.4.0 and build a new fat jar (the
> dependency change is sketched below).
> - Create a Docker container for the jobs, based on the Spark 1.4.0
> container from the previous step.
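>
> The dependency side of that was the usual sbt version bump; roughly this
> (artifact list trimmed to what this job uses, and the exact layout of my
> real build file may differ):
>
> // build.sbt -- bump all Spark artifacts together so the fat jar stays
> // consistent with the executor tarball
> val sparkVersion = "1.4.0" // was "1.2.1"
>
> libraryDependencies ++= Seq(
>   "org.apache.spark" %% "spark-core"            % sparkVersion % "provided",
>   "org.apache.spark" %% "spark-streaming"       % sparkVersion % "provided",
>   "org.apache.spark" %% "spark-streaming-kafka" % sparkVersion
> )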
>
> After deploying it to Marathon, the job only displays the driver under
> Executors and no tasks progress. I haven't made any change to my config
> files (apart from updating spark.executor.uri to point to the right file
> on S3; see below).
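>
> Concretely, that one change is equivalent to the following (the URL is the
> same one the fetcher downloads in the log further down):
>
> // the spark.executor.uri entry from my config, in SparkConf terms
> val conf = new SparkConf()
>   .set("spark.executor.uri",
>        "http://s3-eu-west-1.amazonaws.com/int-mesos-data/frameworks/spark/spark-1.4.0-bin-hadoop2.4.tgz")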
>
> If I go to Mesos and check my job under Frameworks, I can see a few
> failed stages; the content of stderr always looks like this:
>
> I0713 15:59:45.774368  1327 fetcher.cpp:214] Fetching URI 
> 'http://s3-eu-west-1.amazonaws.com/int-mesos-data/frameworks/spark/spark-1.4.0-bin-hadoop2.4.tgz'
> I0713 15:59:45.774483  1327 fetcher.cpp:125] Fetching URI 
> 'http://s3-eu-west-1.amazonaws.com/int-mesos-data/frameworks/spark/spark-1.4.0-bin-hadoop2.4.tgz'
>  with os::net
> I0713 15:59:45.774494  1327 fetcher.cpp:135] Downloading 
> 'http://s3-eu-west-1.amazonaws.com/int-mesos-data/frameworks/spark/spark-1.4.0-bin-hadoop2.4.tgz'
>  to 
> '/var/log/mcsvc/mesostmpdir/slaves/20150713-133618-421011372-5050-8867-S5/frameworks/20150713-152326-421011372-5050-12921-0002/executors/9/runs/9e44b2ea-c738-4e76-8103-3a85ce752b58/spark-1.4.0-bin-hadoop2.4.tgz'
> I0713 15:59:50.700959  1327 fetcher.cpp:78] Extracted resource 
> '/var/log/mcsvc/mesostmpdir/slaves/20150713-133618-421011372-5050-8867-S5/frameworks/20150713-152326-421011372-5050-12921-0002/executors/9/runs/9e44b2ea-c738-4e76-8103-3a85ce752b58/spark-1.4.0-bin-hadoop2.4.tgz'
>  into 
> '/var/log/mcsvc/mesostmpdir/slaves/20150713-133618-421011372-5050-8867-S5/frameworks/20150713-152326-421011372-5050-12921-0002/executors/9/runs/9e44b2ea-c738-4e76-8103-3a85ce752b58'
> I0713 15:59:50.973274  1333 exec.cpp:132] Version: 0.22.1
> I0713 15:59:50.998219  1339 exec.cpp:206] Executor registered on slave 
> 20150713-133618-421011372-5050-8867-S5
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> 15/07/13 15:59:51 INFO CoarseGrainedExecutorBackend: Registered signal 
> handlers for [TERM, HUP, INT]
> 15/07/13 15:59:52 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 15/07/13 15:59:52 INFO SecurityManager: Changing view acls to: root
> 15/07/13 15:59:52 INFO SecurityManager: Changing modify acls to: root
> 15/07/13 15:59:52 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(root); users 
> with modify permissions: Set(root)
> 15/07/13 15:59:52 INFO Slf4jLogger: Slf4jLogger started
> 15/07/13 15:59:52 INFO Remoting: Starting remoting
> 15/07/13 15:59:53 INFO Remoting: Remoting started; listening on addresses 
> :[akka.tcp://driverpropsfetc...@int-mesos-slave-ib4583253.mclabs.io:41854]
> 15/07/13 15:59:53 INFO Utils: Successfully started service 
> 'driverPropsFetcher' on port 41854.
> 15/07/13 15:59:53 INFO SecurityManager: Changing view acls to: root
> 15/07/13 15:59:53 INFO SecurityManager: Changing modify acls to: root
> 15/07/13 15:59:53 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(root); users 
> with modify permissions: Set(root)
> 15/07/13 15:59:53 INFO RemoteActorRefProvider$RemotingTerminator: Shutting 
> down remote daemon.
> 15/07/13 15:59:53 INFO RemoteActorRefProvider$RemotingTerminator: Remote 
> daemon shut down; proceeding with flushing remote transports.
> 15/07/13 15:59:53 INFO Slf4jLogger: Slf4jLogger started
> 15/07/13 15:59:53 INFO Remoting: Starting remoting
> 15/07/13 15:59:53 INFO RemoteActorRefProvider$RemotingTerminator: Remoting 
> shut down.
> 15/07/13 15:59:53 INFO Utils: Successfully started service 'sparkExecutor' on 
> port 60219.
> 15/07/13 15:59:53 INFO Remoting: Remoting started; listening on addresses 
> :[akka.tcp://sparkexecu...@int-mesos-slave-ib4583253.mclabs.io:60219]
> 15/07/13 15:59:53 INFO DiskBlockManager: Created local directory at 
> /var/log/mcsvc/sparktmpdir/spark-2ca9b3eb-ce70-44e5-9546-1a83f63dc439/blockmgr-4047306e-9dc8-48e4-bc25-300f4cf0be87
> 15/07/13 15:59:53 INFO MemoryStore: MemoryStore started with capacity 267.5 MB
> Exception in thread "main" java.io.FileNotFoundException: 
> /etc/mindcandy/metrics.properties (No such file or directory)
>       at java.io.FileInputStream.open0(Native Method)
>       at java.io.FileInputStream.open(FileInputStream.java:195)
>       at java.io.FileInputStream.<init>(FileInputStream.java:138)
>       at java.io.FileInputStream.<init>(FileInputStream.java:93)
>       at 
> org.apache.spark.metrics.MetricsConfig$$anonfun$1.apply(MetricsConfig.scala:50)
>       at 
> org.apache.spark.metrics.MetricsConfig$$anonfun$1.apply(MetricsConfig.scala:50)
>       at scala.Option.map(Option.scala:145)
>       at 
> org.apache.spark.metrics.MetricsConfig.initialize(MetricsConfig.scala:50)
>       at org.apache.spark.metrics.MetricsSystem.<init>(MetricsSystem.scala:93)
>       at 
> org.apache.spark.metrics.MetricsSystem$.createMetricsSystem(MetricsSystem.scala:222)
>       at org.apache.spark.SparkEnv$.create(SparkEnv.scala:367)
>       at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:211)
>       at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:180)
>       at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
>       at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:65)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>       at 
> org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:65)
>       at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:146)
>       at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:245)
>       at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
> 15/07/13 15:59:53 INFO DiskBlockManager: Shutdown hook called
> 15/07/13 15:59:53 INFO Utils: path = 
> /var/log/mcsvc/sparktmpdir/spark-2ca9b3eb-ce70-44e5-9546-1a83f63dc439/blockmgr-4047306e-9dc8-48e4-bc25-300f4cf0be87,
>  already present as root for deletion.
> 15/07/13 15:59:53 INFO Utils: Shutdown hook called
> 15/07/13 15:59:53 INFO Utils: Deleting directory 
> /var/log/mcsvc/sparktmpdir/spark-2ca9b3eb-ce70-44e5-9546-1a83f63dc439
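>
> If I read the trace right, MetricsConfig.initialize (MetricsConfig.scala:50
> in the trace) is opening the file named by spark.metrics.conf, so I suspect
> the executors inherit a driver-side setting along these lines (I'm guessing
> at my own config here) while the file doesn't exist inside the executor's
> container:
>
> // a driver-side setting that would make every executor try to open this path
> val conf = new SparkConf()
>   .set("spark.metrics.conf", "/etc/mindcandy/metrics.properties")
> // each executor would then need /etc/mindcandy/metrics.properties to exist
> // locally as well, e.g. baked into the Docker image it runs in
>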
