Here is a link for builds of 1.4 RC2: http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-bin/
For a mvn repo, I believe the RC2 artifacts are here:
https://repository.apache.org/content/repositories/orgapachespark-1104/

A few experiments you might try:

1. Does spark-shell work? It might start fine, but make sure you can create
an RDD and use it, e.g., something like:

    val rdd = sc.parallelize(Seq(1,2,3,4,5,6))
    rdd foreach println

2. Try coarse-grained mode, which has different logic for executor
management. You can set it in the $SPARK_HOME/conf/spark-defaults.conf file:

    spark.mesos.coarse  true

Or, from this page
<http://spark.apache.org/docs/latest/running-on-mesos.html>, set the
property in a SparkConf object used to construct the SparkContext:

    conf.set("spark.mesos.coarse", "true")

(A fuller standalone sketch of this is inlined further down, after Iulian's
question.)

dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com

On Mon, May 25, 2015 at 12:06 PM, Reinis Vicups <sp...@orbit-x.de> wrote:

> Hello,
>
> I assume I am running Spark in fine-grained mode, since I haven't changed
> the default here.
>
> One question regarding 1.4.0-RC1 - is there a mvn snapshot repository I
> could use for my project config? (I know that I have to download the
> source and run make-distribution for the executor as well.)
>
> thanks
> reinis
>
>
> On 25.05.2015 17:07, Iulian Dragoș wrote:
>
> On Mon, May 25, 2015 at 2:43 PM, Reinis Vicups <sp...@orbit-x.de> wrote:
>
>> Hello,
>>
>> I am using Spark 1.3.1-hadoop2.4 with Mesos 0.22.1 with ZooKeeper,
>> running on a cluster with 3 nodes on 64-bit Ubuntu.
>>
>> My application is compiled with Spark 1.3.1 (apparently with a Mesos
>> 0.21.0 dependency), Hadoop 2.5.1-mapr-1503 and Akka 2.3.10. Only with
>> this combination have I succeeded in running Spark jobs on Mesos at all.
>> Other versions cause class loader issues.
>>
>> I am submitting Spark jobs with spark-submit with mesos://zk://.../mesos.
>
> Are you using coarse-grained or fine-grained mode?
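
Related to Iulian's question above and to suggestion 2: here is a minimal
sketch of a standalone app that enables coarse-grained mode via SparkConf.
The app name and the zk:// master URL are placeholders for your own values,
and spark.executor.uri points at the tarball visible in the fetcher log
below; adapt as needed.

import org.apache.spark.{SparkConf, SparkContext}

object CoarseGrainedCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("coarse-grained-check")              // placeholder app name
      .setMaster("mesos://zk://dev-zk01:2181/mesos")   // placeholder ZooKeeper master URL
      .set("spark.mesos.coarse", "true")               // switch from fine- to coarse-grained mode
      .set("spark.executor.uri",                       // tarball the slaves should fetch
        "hdfs://dev-hadoop01/apps/spark-1.3.1-bin-hadoop2.4.tgz")

    val sc = new SparkContext(conf)
    try {
      // Same sanity check as suggestion 1: build an RDD and act on it.
      sc.parallelize(Seq(1, 2, 3, 4, 5, 6)).foreach(println)
    } finally {
      sc.stop()
    }
  }
}

The same two settings can also go into spark-defaults.conf or be passed via
--conf on the spark-submit command line instead.
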
>> The sandbox log of slave-node app01 (the one that stalls) shows the following:
>>
>> 10:01:25.815506 35409 fetcher.cpp:214] Fetching URI
>> 'hdfs://dev-hadoop01/apps/spark-1.3.1-bin-hadoop2.4.tgz'
>> 10:01:26.497764 35409 fetcher.cpp:99] Fetching URI
>> 'hdfs://dev-hadoop01/apps/spark-1.3.1-bin-hadoop2.4.tgz' using Hadoop Client
>> 10:01:26.497869 35409 fetcher.cpp:109] Downloading resource from
>> 'hdfs://dev-hadoop01/apps/spark-1.3.1-bin-hadoop2.4.tgz' to
>> '/tmp/mesos/slaves/20150511-150924-3410235146-5050-1903-S3/frameworks/20150511-150924-3410235146-5050-1903-0249/executors/20150511-150924-3410235146-5050-1903-S3/runs/ec3a0f13-2f44-4952-bb23-4d48abaacc05/spark-1.3.1-bin-hadoop2.4.tgz'
>> 10:01:32.877717 35409 fetcher.cpp:78] Extracted resource
>> '/tmp/mesos/slaves/20150511-150924-3410235146-5050-1903-S3/frameworks/20150511-150924-3410235146-5050-1903-0249/executors/20150511-150924-3410235146-5050-1903-S3/runs/ec3a0f13-2f44-4952-bb23-4d48abaacc05/spark-1.3.1-bin-hadoop2.4.tgz'
>> into
>> '/tmp/mesos/slaves/20150511-150924-3410235146-5050-1903-S3/frameworks/20150511-150924-3410235146-5050-1903-0249/executors/20150511-150924-3410235146-5050-1903-S3/runs/ec3a0f13-2f44-4952-bb23-4d48abaacc05'
>> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
>> 10:01:34 INFO MesosExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
>> 10:01:34.459292 35730 exec.cpp:132] Version: 0.22.0
>> *10:01:34 ERROR MesosExecutorBackend: Received launchTask but executor was null*
>> 10:01:34.540870 35765 exec.cpp:206] Executor registered on slave
>> 20150511-150924-3410235146-5050-1903-S3
>> 10:01:34 INFO MesosExecutorBackend: Registered with Mesos as executor ID
>> 20150511-150924-3410235146-5050-1903-S3 with 1 cpus
>
> It looks like an inconsistent state on the Mesos scheduler. It tries to
> launch a task on a given slave before the executor has registered. This
> code was improved/refactored in 1.4, could you try 1.4.0-RC1?
>
> Yes, and note the second message after the error you highlighted; that's
> when the executor would be registered with Mesos and the local object
> created.
>
> iulian
>
>> 10:01:34 INFO SecurityManager: Changing view acls to...
>> 10:01:35 INFO Slf4jLogger: Slf4jLogger started
>> 10:01:35 INFO Remoting: Starting remoting
>> 10:01:35 INFO Remoting: Remoting started; listening on addresses
>> :[akka.tcp://sparkExecutor@app01:xxx]
>> 10:01:35 INFO Utils: Successfully started service 'sparkExecutor' on port xxx.
>> 10:01:35 INFO AkkaUtils: Connecting to MapOutputTracker:
>> akka.tcp://sparkDriver@dev-web01/user/MapOutputTracker
>> 10:01:35 INFO AkkaUtils: Connecting to BlockManagerMaster:
>> akka.tcp://sparkDriver@dev-web01/user/BlockManagerMaster
>> 10:01:36 INFO DiskBlockManager: Created local directory at
>> /tmp/spark-52a6585a-f9f2-4ab6-bebc-76be99b0c51c/blockmgr-e6d79818-fe30-4b5c-bcd6-8fbc5a201252
>> 10:01:36 INFO MemoryStore: MemoryStore started with capacity 88.3 MB
>> 10:01:36 WARN NativeCodeLoader: Unable to load native-hadoop library for
>> your platform...
>> using builtin-java classes where applicable
>> 10:01:36 INFO AkkaUtils: Connecting to OutputCommitCoordinator:
>> akka.tcp://sparkDriver@dev-web01/user/OutputCommitCoordinator
>> 10:01:36 INFO Executor: Starting executor ID
>> 20150511-150924-3410235146-5050-1903-S3 on host app01
>> 10:01:36 INFO NettyBlockTransferService: Server created on XXX
>> 10:01:36 INFO BlockManagerMaster: Trying to register BlockManager
>> 10:01:36 INFO BlockManagerMaster: Registered BlockManager
>> 10:01:36 INFO AkkaUtils: Connecting to HeartbeatReceiver:
>> akka.tcp://sparkDriver@dev-web01/user/HeartbeatReceiver
>>
>> As soon as the spark-driver is aborted, the following log entries are added
>> to the sandbox log of slave-node app01:
>>
>> 10:17:29.559433 35772 exec.cpp:379] Executor asked to shutdown
>> 10:17:29 WARN ReliableDeliverySupervisor: Association with remote system
>> [akka.tcp://sparkDriver@dev-web01] has failed, address is now gated for
>> [5000] ms. Reason is: [Disassociated]
>>
>> A successful job shows instead the following in the spark-driver log:
>>
>> 08:03:19,862 INFO o.a.s.s.TaskSetManager - Finished task 3.0 in stage
>> 1.0 (TID 7) in 1688 ms on app01 (1/4)
>> 08:03:19,869 INFO o.a.s.s.TaskSetManager - Finished task 0.0 in stage
>> 1.0 (TID 4) in 1700 ms on app03 (2/4)
>> 08:03:19,874 INFO o.a.s.s.TaskSetManager - Finished task 1.0 in stage
>> 1.0 (TID 5) in 1703 ms on app02 (3/4)
>> 08:03:19,878 INFO o.a.s.s.TaskSetManager - Finished task 2.0 in stage
>> 1.0 (TID 6) in 1706 ms on app02 (4/4)
>> 08:03:19,878 INFO o.a.s.s.DAGScheduler - Stage 1
>> (saveAsNewAPIHadoopDataset at ImportSparkJob.scala:90) finished in 1.718 s
>> 08:03:19,878 INFO o.a.s.s.TaskSchedulerImpl - Removed TaskSet 1.0, whose
>> tasks have all completed, from pool
>> 08:03:19,886 INFO o.a.s.s.DAGScheduler - Job 0 finished:
>> saveAsNewAPIHadoopDataset at ImportSparkJob.scala:90, took 16.946405 s
>>
>> This corresponds nicely to the sandbox logs of the slave-nodes:
>>
>> 08:03:19 INFO Executor: Finished task 3.0 in stage 1.0 (TID 7). 872 bytes
>> result sent to driver
>> 08:03:19 INFO Executor: Finished task 0.0 in stage 1.0 (TID 4). 872 bytes
>> result sent to driver
>> 08:03:19 INFO Executor: Finished task 1.0 in stage 1.0 (TID 5). 872 bytes
>> result sent to driver
>> 08:03:19 INFO Executor: Finished task 2.0 in stage 1.0 (TID 6). 872 bytes
>> result sent to driver
>> 08:03:20 WARN ReliableDeliverySupervisor: Association with remote system
>> [akka.tcp://sparkDriver@dev-web01] has failed, address is now gated for
>> [5000] ms. Reason is: [Disassociated].
>
>
> --
> Iulian Dragos
>
> ------
> Reactive Apps on the JVM
> www.typesafe.com
>
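
For what it's worth, here is a small, self-contained sketch of the race Iulian
describes (illustrative Scala only, not the actual Spark or Mesos source; the
class and method names are made up): the executor object is only created when
the registration callback fires, so a launch request that arrives first has
nothing to hand the task to and can only log an error.

object ExecutorRegistrationRace extends App {
  // Stand-in for Spark's per-slave executor backend.
  class BackendSketch {
    @volatile private var executor: Option[String] = None

    // In the real backend, this is where the Executor object gets built,
    // once Mesos confirms registration on the slave.
    def registered(executorId: String): Unit =
      executor = Some(executorId)

    def launchTask(taskId: Long): Unit = executor match {
      case None =>
        // The situation behind "ERROR MesosExecutorBackend: Received
        // launchTask but executor was null" in the app01 sandbox log.
        Console.err.println(s"Received launchTask $taskId but executor was null")
      case Some(id) =>
        println(s"executor $id running task $taskId")
    }
  }

  val backend = new BackendSketch
  backend.launchTask(1L)    // launch arrives before registration: task is lost
  backend.registered("S3")  // registration (the "second message" in the log)
  backend.launchTask(2L)    // now the task can actually run
}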