If *zeppelin.interpreter.connect.timeout* is reached but the YARN app is still in ACCEPTED state, then this is probably a bug. The YARN app should be killed if it cannot be created within the timeout threshold.
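To make the expected behavior concrete, here is a minimal sketch (not Zeppelin's actual `RemoteInterpreterManagedProcess` code) of a launcher that waits for the YARN application to reach RUNNING and, once the connect timeout expires, kills it instead of leaving it queued. The names `awaitRunningOrKill`, `stateProbe`, and `killer` are hypothetical. In a real deployment the probe might read the YARN application report and the killer might call Hadoop's `YarnClient.killApplication` or run `yarn application -kill <appId>`; both are assumptions about how one would wire this up.

```java
import java.time.Duration;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch, not Zeppelin code: wait for a YARN application to reach
// RUNNING within a connect timeout, and kill it if it is still queued afterwards.
final class AcceptedTimeoutSketch {

  /** Signals that the application never became usable. */
  static final class LaunchFailedException extends Exception {
    LaunchFailedException(String message) {
      super(message);
    }
  }

  /**
   * Polls stateProbe (returns a YARN state name such as "ACCEPTED") until it
   * reports RUNNING. If the timeout elapses first, calls killer with appId and throws.
   */
  static void awaitRunningOrKill(String appId, Supplier<String> stateProbe,
      Consumer<String> killer, Duration timeout, Duration pollInterval)
      throws LaunchFailedException {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      String state = stateProbe.get();
      if ("RUNNING".equals(state)) {
        return; // driver is up, so the interpreter can connect
      }
      if ("FAILED".equals(state) || "KILLED".equals(state) || "FINISHED".equals(state)) {
        throw new LaunchFailedException(appId + " ended in state " + state);
      }
      if (System.nanoTime() - deadline >= 0) {
        killer.accept(appId); // do not leave an orphaned app waiting in the queue
        throw new LaunchFailedException(appId + " still " + state + " after "
            + timeout.toMillis() + " ms; kill requested");
      }
      try {
        Thread.sleep(pollInterval.toMillis());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for callers
        throw new LaunchFailedException("interrupted while waiting for " + appId);
      }
    }
  }

  public static void main(String[] args) {
    try {
      awaitRunningOrKill("application_1542587655772_35129",
          () -> "ACCEPTED", // simulates an app that never leaves the queue
          id -> System.out.println("would kill " + id),
          Duration.ofMillis(300), Duration.ofMillis(100));
    } catch (LaunchFailedException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Checking for terminal states as well as RUNNING keeps the loop from waiting out the full timeout on an application that has already failed, and the kill only happens on the timeout path.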
On Tue, Nov 20, 2018 at 4:47 PM, Sarthak Sharma <sarthak...@media.net> wrote:
> Hey,
>
> Like you mentioned, I'm already using the *spark.yarn.queue* parameter,
> so I know which YARN queue the application is scheduled in, and this queue
> has resources available, since other apps are also being scheduled there.
> However, even assuming the queue does NOT have resources to schedule it
> within the given time frame, so that an exception is thrown once
> *zeppelin.interpreter.connect.timeout* is reached, the application should
> still get scheduled eventually, which is not the case here: the interpreter
> driver process remains stuck in ACCEPTED state. Has the implementation
> changed in this version? We never experienced this on the previous one
> (zeppelin-0.7.3), where drivers would eventually get scheduled in their
> respective queues.
>
> On Tue, Nov 20, 2018, 7:29 AM Xun Liu <neliu...@163.com> wrote:
>
>> Hi Sarthak Sharma,
>>
>> The log shows that the application submitted by spark-submit has been
>> waiting for execution in the YARN queue. Are there no resources available
>> in that YARN queue?
>> You can specify a queue that has resources in the Spark interpreter via
>> the spark.yarn.queue parameter.
>>
>>
>> On Nov 19, 2018, at 7:41 PM, Sarthak Sharma <sarthak...@media.net> wrote:
>>
>> Hi,
>>
>> We already have a zeppelin-0.7.3 setup that runs fine and is currently in
>> use, but we are looking into the YARN cluster mode support for the Spark
>> interpreter in zeppelin-0.8. I've built it from source from *branch-0.8
>> (as of Nov 15)* and am intermittently facing the following issue in some
>> of the Spark interpreters while trying to use spark-sql on it.
>>
>> 18/11/19 10:04:07 INFO yarn.Client: Submitting application application_1542587655772_35129 to ResourceManager
>> 18/11/19 10:04:07 INFO impl.YarnClientImpl: Submitted application application_1542587655772_35129
>> 18/11/19 10:04:08 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:08 INFO yarn.Client:
>>      client token: N/A
>>      diagnostics: N/A
>>      ApplicationMaster host: N/A
>>      ApplicationMaster RPC port: -1
>>      queue: root.zep
>>      start time: 1542621847537
>>      final status: UNDEFINED
>>      tracking URL: http://resource-manager-addr/proxy/application_1542587655772_35129/
>>      user: sarthak.sh
>> 18/11/19 10:04:09 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:10 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:11 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:12 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:13 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:14 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:15 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:16 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:17 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:18 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:19 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:20 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:21 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:22 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:23 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:24 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:25 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:26 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:27 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:28 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:29 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:30 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:31 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:32 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:33 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:34 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:35 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:36 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:37 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:38 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:39 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:40 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:41 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:42 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:43 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:44 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:45 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:46 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:47 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:48 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:49 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:50 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:51 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:52 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:53 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:54 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:55 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:56 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:57 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:58 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:59 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:00 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:01 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:02 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:03 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:04 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:05 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:06 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:07 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:08 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:09 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:10 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:11 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:12 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:13 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:14 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:15 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:16 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:17 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:18 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:19 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:20 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:21 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:22 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:23 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:24 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:25 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:26 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:27 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:28 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:29 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:30 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:31 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:32 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:33 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:34 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:35 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:36 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:37 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:38 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:39 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:40 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:41 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:42 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:43 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:44 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:45 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:46 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:47 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:48 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:49 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:50 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:51 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:52 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:53 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:54 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:55 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:56 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:57 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:58 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:59 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>>
>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreterManagedProcess.start(RemoteInterpreterManagedProcess.java:205)
>>     at org.apache.zeppelin.interpreter.ManagedInterpreterGroup.getOrCreateInterpreterProcess(ManagedInterpreterGroup.java:64)
>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getOrCreateInterpreterProcess(RemoteInterpreter.java:111)
>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.internal_create(RemoteInterpreter.java:164)
>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:132)
>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:299)
>>     at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:407)
>>     at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
>>     at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:315)
>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>     at java.lang.Thread.run(Thread.java:748)
>>
>> Any further submission to this interpreter fails with NullPointerExceptions,
>> because no interpreter process exists.
>> It looks like the interpreter driver process gets stuck in ACCEPTED state
>> when it is submitted to YARN, which is why we cannot connect to the remote
>> interpreter process. This happens even when the YARN cluster has free
>> resources.
>>
>> I've also tried increasing *zeppelin.interpreter.connect.timeout*, but
>> that didn't help, since the application stays in ACCEPTED state
>> indefinitely and no logs are available either.
>> It would be great if you could point me to something that might help.
>> Also, please let me know if you need any configuration files to debug
>> this.
>>
>> Thanks and Regards,
>>
>> *Sarthak Sharma*
>> DevOps Engineer, Media.Net
>> +918002228376 | sarthak...@media.net

-- 
Best Regards

Jeff Zhang
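On the point that no logs are available: while an application is in ACCEPTED state its ApplicationMaster has not registered yet (the report above shows `ApplicationMaster host: N/A`), so there are often no container logs, and the ResourceManager's diagnostics for the application are a better place to look. A minimal sketch, assuming Java 11+ and the YARN ResourceManager REST Cluster Application API (`/ws/v1/cluster/apps/{appid}`) on the default web port 8088. `resource-manager-addr` is the placeholder host from the log, and the regex-based field extraction is a deliberate simplification standing in for a proper JSON parser.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: ask the YARN ResourceManager why an application is still ACCEPTED.
// Assumes Java 11+ (java.net.http) and the RM REST endpoint /ws/v1/cluster/apps/{appid}.
final class RmAppStatusSketch {

  /** Builds the Cluster Application API URL for one application. */
  static URI appUri(String rmHostPort, String appId) {
    return URI.create("http://" + rmHostPort + "/ws/v1/cluster/apps/" + appId);
  }

  /** Naive extraction of the first string field with this key; use a JSON parser in real code. */
  static Optional<String> stringField(String json, String field) {
    Matcher m = Pattern.compile(
        "\"" + Pattern.quote(field) + "\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"").matcher(json);
    return m.find() ? Optional.of(m.group(1)) : Optional.empty();
  }

  public static void main(String[] args) throws IOException, InterruptedException {
    String rm = args.length > 0 ? args[0] : "resource-manager-addr:8088"; // placeholder host
    String appId = args.length > 1 ? args[1] : "application_1542587655772_35129";
    HttpRequest request = HttpRequest.newBuilder(appUri(rm, appId))
        .header("Accept", "application/json")
        .GET()
        .build();
    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    String body = response.body();
    // The diagnostics field is where the scheduler usually explains a stuck app.
    System.out.println("state:       " + stringField(body, "state").orElse("?"));
    System.out.println("queue:       " + stringField(body, "queue").orElse("?"));
    System.out.println("diagnostics: " + stringField(body, "diagnostics").orElse("?"));
  }
}
```

Run it as `java RmAppStatusSketch.java <rm-host:port> <appId>`. From a shell, `yarn application -status <appId>` prints the same application report, including its Diagnostics line.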