Multi-resource scheduling is not on by default, so you would have had to enable it explicitly by setting yarn.scheduler.capacity.resource-calculator to a class name ending in DominantResourceCalculator. If you haven't configured that, it is very likely that the recently committed https://issues.apache.org/jira/browse/SPARK-6050 will fix your problem.
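If it helps, here is a rough way to check that setting from a script. This is only a sketch: it assumes the property lives in a Hadoop-style XML file (typically capacity-scheduler.xml in your Hadoop conf directory, though the location on your cluster may differ), and the helper names and the SAMPLE document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Property named in the message above.
PROP = "yarn.scheduler.capacity.resource-calculator"


def resource_calculator(xml_text):
    """Return the configured calculator class name, or None if the property is unset."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == PROP:
            return prop.findtext("value", "").strip()
    return None


def uses_multi_resource(xml_text):
    """True only if the property is set to something ending in DominantResourceCalculator."""
    value = resource_calculator(xml_text)
    return value is not None and value.endswith("DominantResourceCalculator")


# Illustrative sample only; read your real file with open(path).read() instead.
SAMPLE = """<configuration>
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(uses_multi_resource(SAMPLE))
```

Note that this only inspects the file you hand it. If the property is missing, the function reports False, which matches the "haven't configured it" case above.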
On Wed, Feb 25, 2015 at 1:36 AM, Anders Arpteg <[email protected]> wrote:

> We're using the capacity scheduler, to the best of my knowledge. Unsure if
> multi resource scheduling is used, but if you know of an easy way to figure
> that out, then let me know.
>
> Thanks,
> Anders
>
> On Sat, Feb 21, 2015 at 12:05 AM, Sandy Ryza <[email protected]> wrote:
>
>> Are you using the capacity scheduler or fifo scheduler without multi
>> resource scheduling by any chance?
>>
>> On Thu, Feb 12, 2015 at 1:51 PM, Anders Arpteg <[email protected]> wrote:
>>
>>> The NM logs only seem to contain entries similar to the following.
>>> Nothing else in the same time range. Any help?
>>>
>>> 2015-02-12 20:47:31,245 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_1422406067005_0053_01_000002
>>> 2015-02-12 20:47:31,246 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_1422406067005_0053_01_000012
>>> 2015-02-12 20:47:31,246 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_1422406067005_0053_01_000022
>>> 2015-02-12 20:47:31,246 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_1422406067005_0053_01_000032
>>> 2015-02-12 20:47:31,246 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_1422406067005_0053_01_000042
>>> 2015-02-12 21:24:30,515 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: FINISH_APPLICATION sent to absent application application_1422406067005_0053
>>>
>>> On Thu, Feb 12, 2015 at 10:38 PM, Sandy Ryza <[email protected]> wrote:
>>>
>>>> It seems unlikely to me that it would be a 2.2 issue, though not
>>>> entirely impossible. Are you able to find any of the container logs? Is
>>>> the NodeManager launching containers and reporting some exit code?
>>>>
>>>> -Sandy
>>>>
>>>> On Thu, Feb 12, 2015 at 1:21 PM, Anders Arpteg <[email protected]> wrote:
>>>>
>>>>> No, not submitting from windows, from a debian distribution. Had a
>>>>> quick look at the rm logs, and it seems some containers are allocated but
>>>>> then released again for some reason. Not easy to make sense of the logs,
>>>>> but here is a snippet from the logs (from a test in our small test
>>>>> cluster) if you'd like to have a closer look: http://pastebin.com/8WU9ivqC
>>>>>
>>>>> Sandy, sounds like it could possibly be a 2.2 issue then, or what do
>>>>> you think?
>>>>>
>>>>> Thanks,
>>>>> Anders
>>>>>
>>>>> On Thu, Feb 12, 2015 at 3:11 PM, Aniket Bhatnagar <[email protected]> wrote:
>>>>>
>>>>>> This is tricky to debug. Check logs of node and resource manager of
>>>>>> YARN to see if you can trace the error. In the past I have had to look
>>>>>> closely at arguments getting passed to YARN container (they get logged
>>>>>> before attempting to launch containers). If I still didn't get a clue, I
>>>>>> had to check the script generated by YARN to execute the container and
>>>>>> even run it manually to trace at what line the error occurred.
>>>>>>
>>>>>> BTW are you submitting the job from windows?
>>>>>>
>>>>>> On Thu, Feb 12, 2015, 3:34 PM Anders Arpteg <[email protected]> wrote:
>>>>>>
>>>>>>> Interesting to hear that it works for you. Are you using Yarn 2.2 as
>>>>>>> well? No strange log message during startup, and can't see any other log
>>>>>>> messages since no executor gets launched. Does not seem to work in
>>>>>>> yarn-client mode either, failing with the exception below.
>>>>>>>
>>>>>>> Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
>>>>>>>     at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:119)
>>>>>>>     at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:59)
>>>>>>>     at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
>>>>>>>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:370)
>>>>>>>     at com.spotify.analytics.AnalyticsSparkContext.<init>(AnalyticsSparkContext.scala:8)
>>>>>>>     at com.spotify.analytics.DataSampler$.main(DataSampler.scala:42)
>>>>>>>     at com.spotify.analytics.DataSampler.main(DataSampler.scala)
>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:551)
>>>>>>>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:155)
>>>>>>>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:178)
>>>>>>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:99)
>>>>>>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>>>>
>>>>>>> /Anders
>>>>>>>
>>>>>>> On Thu, Feb 12, 2015 at 1:33 AM, Sandy Ryza <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Anders,
>>>>>>>>
>>>>>>>> I just tried this out and was able to successfully acquire
>>>>>>>> executors. Any strange log messages or additional color you can
>>>>>>>> provide on your setup? Does yarn-client mode work?
>>>>>>>>
>>>>>>>> -Sandy
>>>>>>>>
>>>>>>>> On Wed, Feb 11, 2015 at 1:28 PM, Anders Arpteg <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Compiled the latest master of Spark yesterday (2015-02-10) for
>>>>>>>>> Hadoop 2.2 and failed executing jobs in yarn-cluster mode for
>>>>>>>>> that build. Works successfully with spark 1.2 (and also master from
>>>>>>>>> 2015-01-16), so something has changed since then that prevents the
>>>>>>>>> job from receiving any executors on the cluster.
>>>>>>>>>
>>>>>>>>> Basic symptoms are that the job fires up the AM, but after
>>>>>>>>> examining the "executors" page in the web ui, only the driver is
>>>>>>>>> listed, no executors are ever received, and the driver keeps waiting
>>>>>>>>> forever. Has anyone seen similar problems?
>>>>>>>>>
>>>>>>>>> Thanks for any insights,
>>>>>>>>> Anders
