Another thing you can do is look at the YARN web UI or the ResourceManager log. It is possible that YARN killed your driver because its memory usage went over the container limit. The RECEIVED SIGNAL TERM and ApplicationAttemptNotFoundException entries in your log are consistent with such an external kill rather than a crash inside the interpreter itself; you can also pull the aggregated container logs with "yarn logs -applicationId application_1570490897819_0016".

The following code (with a = "bigword" from your note) seems to consume a huge amount of memory:

aList = []
for i in range(1000):
    aList.append(i**i*a)
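Why is that fatal? In Python, multiplying an integer by a string is string repetition, so i**i*a asks for a string of len(a) * i**i characters, and 999**999 alone has about 3000 digits. Here is a minimal sketch (mine, not from the thread) that estimates the requested size without materializing anything; the bounded variant at the end is only a guess at what the loop was meant to do:

# Runs under Python 2 or 3; "a" is taken from the note above.
a = "bigword"

# int * str is string repetition: i**i * a would be a string of
# len(a) * i**i characters. Sum the sizes instead of building them.
total_chars = sum(len(a) * i**i for i in range(1000))
print("the loop asks for ~10**%d characters" % (len(str(total_chars)) - 1))

# If the intent was "repeat the word i times", bound the repetition;
# this keeps the whole list at roughly 3.5 MB of string data.
aList = [i * a for i in range(1000)]

No spark.driver.memory setting can satisfy an allocation on the order of 10**2997 bytes, so the python process dies or YARN terminates the container, which matches the SIGTERM in your log.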
Manuel Sopena Ballesteros <manuel...@garvan.org.au> wrote on Wed, 9 Oct 2019 at 11:58:

> Got it,
>
> But I still can't see why the interpreter fails, logs below:
>
> DEBUG [2019-10-09 14:48:02,193] ({pool-6-thread-2} Interpreter.java[getProperty]:222) - key: zeppelin.PySparkInterpreter.precode, value: null
> DEBUG [2019-10-09 14:48:02,195] ({pool-6-thread-2} RemoteInterpreterServer.java[jobRun]:632) - Script after hooks: a = "bigword"
> aList = []
> for i in range(1000):
>     aList.append(i**i*a)
> #print aList
>
> for word in aList:
>     print word
> __zeppelin__._displayhook()
> DEBUG [2019-10-09 14:48:02,195] ({pool-6-thread-2} RemoteInterpreterEventClient.java[sendEvent]:413) - Send Event: RemoteInterpreterEvent(type:META_INFOS, data:{"message":"Spark UI enabled","url":"http://r640-1-10-mlx.mlx:36423"})
> DEBUG [2019-10-09 14:48:02,195] ({pool-5-thread-2} RemoteInterpreterEventClient.java[pollEvent]:366) - Send event META_INFOS
> DEBUG [2019-10-09 14:48:04,720] ({Thread-33} RemoteInterpreterServer.java[onAppend]:789) - Output Append: /d0/hadoop/yarn/local/usercache/mansop/appcache/application_1570490897819_0016/container_e05_1570490897819_0016_01_000001/tmp/zeppelin_pyspark-1580515882697345087.py:179: UserWarning: Unable to load inline matplotlib backend, falling back to Agg
>
> DEBUG [2019-10-09 14:48:04,722] ({Thread-33} RemoteInterpreterEventClient.java[sendEvent]:413) - Send Event: RemoteInterpreterEvent(type:OUTPUT_APPEND, data:{"data":"/d0/hadoop/yarn/local/usercache/mansop/appcache/application_1570490897819_0016/container_e05_1570490897819_0016_01_000001/tmp/zeppelin_pyspark-1580515882697345087.py:179: UserWarning: Unable to load inline matplotlib backend, falling back to Agg\n","index":"0","noteId":"2ENM9X82N","paragraphId":"20190926-163159_1153559848"})
> DEBUG [2019-10-09 14:48:04,722] ({Thread-33} RemoteInterpreterServer.java[onAppend]:789) - Output Append: warnings.warn("Unable to load inline matplotlib backend, "
>
> DEBUG [2019-10-09 14:48:04,722] ({pool-5-thread-2} RemoteInterpreterEventClient.java[pollEvent]:366) - Send event OUTPUT_APPEND
> DEBUG [2019-10-09 14:48:04,722] ({Thread-33} RemoteInterpreterEventClient.java[sendEvent]:413) - Send Event: RemoteInterpreterEvent(type:OUTPUT_APPEND, data:{"data":" warnings.warn(\"Unable to load inline matplotlib backend, \"\n","index":"0","noteId":"2ENM9X82N","paragraphId":"20190926-163159_1153559848"})
> DEBUG [2019-10-09 14:48:04,723] ({pool-5-thread-2} RemoteInterpreterEventClient.java[pollEvent]:366) - Send event OUTPUT_APPEND
> ERROR [2019-10-09 14:48:10,937] ({SIGTERM handler} SignalUtils.scala[apply$mcZ$sp]:43) - RECEIVED SIGNAL TERM
> INFO [2019-10-09 14:48:10,981] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Invoking stop() from shutdown hook
> INFO [2019-10-09 14:48:11,002] ({shutdown-hook-0} AbstractConnector.java[doStop]:318) - Stopped Spark@3e8aac20{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
> INFO [2019-10-09 14:48:11,006] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Stopped Spark web UI at http://r640-1-10-mlx.mlx:36423
> INFO [2019-10-09 14:48:11,057] ({dispatcher-event-loop-22} Logging.scala[logInfo]:54) - Driver requested a total number of 0 executor(s).
> > INFO [2019-10-09 14:48:11,059] ({shutdown-hook-0} > Logging.scala[logInfo]:54) - Shutting down all executors > > INFO [2019-10-09 14:48:11,061] ({dispatcher-event-loop-23} > Logging.scala[logInfo]:54) - Asking each executor to shut down > > INFO [2019-10-09 14:48:11,070] ({shutdown-hook-0} > Logging.scala[logInfo]:54) - Stopping SchedulerExtensionServices > > (serviceOption=None, > > services=List(), > > started=false) > > ERROR [2019-10-09 14:48:11,075] ({Reporter} Logging.scala[logError]:91) - > Exception from Reporter thread. > > org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: > Application attempt appattempt_1570490897819_0016_000001 doesn't exist in > ApplicationMasterService cache. > > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404) > > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > > at java.security.AccessController.doPrivileged(Native Method) > > at javax.security.auth.Subject.doAs(Subject.java:422) > > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > > > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > > at > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > > at > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateYarnException(RPCUtil.java:75) > > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:116) > > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79) > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > > at java.lang.reflect.Method.invoke(Method.java:498) > > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) > > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) > > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) > > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) > > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) > > at com.sun.proxy.$Proxy21.allocate(Unknown Source) > > at > org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:320) > > at > 
org.apache.spark.deploy.yarn.YarnAllocator.allocateResources(YarnAllocator.scala:268) > > at > org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:556) > > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException): > Application attempt appattempt_1570490897819_0016_000001 doesn't exist in > ApplicationMasterService cache. > > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404) > > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > > at java.security.AccessController.doPrivileged(Native Method) > > at javax.security.auth.Subject.doAs(Subject.java:422) > > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > > > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497) > > at org.apache.hadoop.ipc.Client.call(Client.java:1443) > > at org.apache.hadoop.ipc.Client.call(Client.java:1353) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > > at com.sun.proxy.$Proxy20.allocate(Unknown Source) > > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77) > > ... 13 more > > INFO [2019-10-09 14:48:11,084] ({Reporter} Logging.scala[logInfo]:54) - > Final app status: FAILED, exitCode: 12, (reason: Application attempt > appattempt_1570490897819_0016_000001 doesn't exist in > ApplicationMasterService cache. > > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404) > > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > > at java.security.AccessController.doPrivileged(Native Method) > > at javax.security.auth.Subject.doAs(Subject.java:422) > > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > ) > > INFO [2019-10-09 14:48:11,086] ({dispatcher-event-loop-33} > Logging.scala[logInfo]:54) - MapOutputTrackerMasterEndpoint stopped! 
> INFO [2019-10-09 14:48:11,119] ({shutdown-hook-0} Logging.scala[logInfo]:54) - MemoryStore cleared
> INFO [2019-10-09 14:48:11,120] ({shutdown-hook-0} Logging.scala[logInfo]:54) - BlockManager stopped
> INFO [2019-10-09 14:48:11,133] ({shutdown-hook-0} Logging.scala[logInfo]:54) - BlockManagerMaster stopped
> INFO [2019-10-09 14:48:11,138] ({dispatcher-event-loop-40} Logging.scala[logInfo]:54) - OutputCommitCoordinator stopped!
> INFO [2019-10-09 14:48:11,159] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Successfully stopped SparkContext
> INFO [2019-10-09 14:48:11,163] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Shutdown hook called
>
> Manuel
>
> *From:* Jeff Zhang [mailto:zjf...@gmail.com]
> *Sent:* Wednesday, October 9, 2019 1:10 PM
> *To:* users
> *Subject:* Re: thrift.transport.TTransportException
>
> >>> I added `log4j.logger.org.apache.zeppelin.interpreter=DEBUG` to the `log4j_yarn_cluster.properties` file but nothing has changed, in fact the `zeppelin-interpreter-spark2-mansop-root-zama-mlx.mlx.log` file is not updated after running my notes
>
> In yarn-cluster mode, you should check the yarn app log file instead of the local log file.
>
> Manuel Sopena Ballesteros <manuel...@garvan.org.au> wrote on Wed, 9 Oct 2019 at 10:06:
>
> Hi Jeff,
>
> Sorry for the late response.
>
> I ran yarn-cluster mode with this setup:
>
> %spark2.conf
>
> master yarn
> spark.submit.deployMode cluster
> zeppelin.pyspark.python /home/mansop/anaconda2/bin/python
> spark.driver.memory 10g
>
> I added `log4j.logger.org.apache.zeppelin.interpreter=DEBUG` to the `log4j_yarn_cluster.properties` file but nothing has changed, in fact the `zeppelin-interpreter-spark2-mansop-root-zama-mlx.mlx.log` file is not updated after running my notes.
>
> This code works:
>
> %pyspark
>
> print("Hello world!")
>
> However this one does not work:
>
> %pyspark
>
> a = "bigword"
> aList = []
> for i in range(1000):
>     aList.append(i**i*a)
> #print aList
>
> for word in aList:
>     print word
>
> which means I am still getting org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>
> and the spark logs say:
>
> ERROR [2019-10-09 12:15:16,454] ({SIGTERM handler} SignalUtils.scala[apply$mcZ$sp]:43) - RECEIVED SIGNAL TERM
> …
> ERROR [2019-10-09 12:15:16,609] ({Reporter} Logging.scala[logError]:91) - Exception from Reporter thread.
> org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: Application attempt appattempt_1570490897819_0013_000001 doesn't exist in ApplicationMasterService cache.
>
> Any idea?
>
> Manuel
>
> *From:* Jeff Zhang [mailto:zjf...@gmail.com]
> *Sent:* Friday, October 4, 2019 5:12 PM
> *To:* users
> *Subject:* Re: thrift.transport.TTransportException
>
> Then it looks like something is wrong with the python process. Do you run it in yarn-cluster mode or yarn-client mode?
> Try to add the following line to log4j.properties for yarn-client mode or log4j_yarn_cluster.properties for yarn-cluster mode:
>
> log4j.logger.org.apache.zeppelin.interpreter=DEBUG
>
> Then try it again; this time you will get more log info. I suspect the python process fails to start.
>
> Manuel Sopena Ballesteros <manuel...@garvan.org.au> wrote on Fri, 4 Oct 2019 at 09:09:
>
> Sorry for the late response,
>
> Yes, I have successfully run a few simple scala snippets using the %spark interpreter in zeppelin.
>
> What should I do next?
>
> Manuel
>
> *From:* Jeff Zhang [mailto:zjf...@gmail.com]
> *Sent:* Tuesday, October 1, 2019 5:44 PM
> *To:* users
> *Subject:* Re: thrift.transport.TTransportException
>
> It looks like you are using pyspark, could you try just starting the scala spark interpreter via `%spark`? First let's figure out whether it is related to pyspark.
>
> Manuel Sopena Ballesteros <manuel...@garvan.org.au> wrote on Tue, 1 Oct 2019 at 15:29:
>
> Dear Zeppelin community,
>
> I would like to ask for advice regarding an error I am having with thrift.
>
> I am getting quite a lot of these errors while running my notebooks:
>
> org.apache.thrift.transport.TTransportException
>         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>         at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
>         at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
>         at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
>         at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
>         at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:274)
>         at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:258)
>         at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$4.call(RemoteInterpreter.java:233)
>         at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$4.call(RemoteInterpreter.java:229)
>         at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:135)
>         at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:228)
>         at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:437)
>         at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
>         at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:307)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> And this is the Spark driver application log:
>
> …
>
> ===============================================================================
> YARN executor launch context:
>   env:
>     CLASSPATH ->
{{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>/usr/hdp/3.1.0.0-78/hadoop/*<CPS>/usr/hdp/3.1.0.0-78/hadoop/lib/*<CPS>/usr/hdp/current/hadoop-hdfs-client/*<CPS>/usr/hdp/current/hadoop-hdfs-client/lib/*<CPS>/usr/hdp/current/hadoop-yarn-client/*<CPS>/usr/hdp/current/hadoop-yarn-client/lib/*<CPS>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/3.1.0.0-78/hadoop/lib/hadoop-lzo-0.6.0.3.1.0.0-78.jar:/etc/hadoop/conf/secure<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__ > > SPARK_YARN_STAGING_DIR -> > hdfs://gl-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1568954689585_0052 > > SPARK_USER -> mansop > > PYTHONPATH -> > /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/spark2-client/python/:<CPS>{{PWD}}/pyspark.zip<CPS>{{PWD}}/py4j-0.10.7-src.zip > > > > command: > > > LD_LIBRARY_PATH="/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:$LD_LIBRARY_PATH" > \ > > {{JAVA_HOME}}/bin/java \ > > -server \ > > -Xmx1024m \ > > '-XX:+UseNUMA' \ > > -Djava.io.tmpdir={{PWD}}/tmp \ > > '-Dspark.history.ui.port=18081' \ > > -Dspark.yarn.app.container.log.dir=<LOG_DIR> \ > > -XX:OnOutOfMemoryError='kill %p' \ > > org.apache.spark.executor.CoarseGrainedExecutorBackend \ > > --driver-url \ > > spark://coarsegrainedschedu...@r640-1-12-mlx.mlx:35602 \ > > --executor-id \ > > <executorId> \ > > --hostname \ > > <hostname> \ > > --cores \ > > 1 \ > > --app-id \ > > application_1568954689585_0052 \ > > --user-class-path \ > > file:$PWD/__app__.jar \ > > 1><LOG_DIR>/stdout \ > > 2><LOG_DIR>/stderr > > > > resources: > > __app__.jar -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" > port: 8020 file: > "/user/mansop/.sparkStaging/application_1568954689585_0052/spark-interpreter-0.8.0.3.1.0.0-78.jar" > } size: 20433040 timestamp: 1569804142906 type: FILE visibility: PRIVATE > > __spark_conf__ -> resource { scheme: "hdfs" host: > "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: > "/user/mansop/.sparkStaging/application_1568954689585_0052/__spark_conf__.zip" > } size: 277725 timestamp: 1569804143239 type: ARCHIVE visibility: PRIVATE > > sparkr -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" > port: 8020 file: > "/user/mansop/.sparkStaging/application_1568954689585_0052/sparkr.zip" } > size: 688255 timestamp: 1569804142991 type: ARCHIVE visibility: PRIVATE > > log4j_yarn_cluster.properties -> resource { scheme: "hdfs" host: > "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: > "/user/mansop/.sparkStaging/application_1568954689585_0052/log4j_yarn_cluster.properties" > } size: 1018 timestamp: 1569804142955 type: FILE visibility: PRIVATE > > pyspark.zip -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" > port: 8020 file: > "/user/mansop/.sparkStaging/application_1568954689585_0052/pyspark.zip" } > size: 550570 timestamp: 1569804143018 type: FILE visibility: PRIVATE > > __spark_libs__ -> resource { scheme: "hdfs" host: > "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: > "/hdp/apps/3.1.0.0-78/spark2/spark2-hdp-yarn-archive.tar.gz" } size: > 280293050 timestamp: 1568938921259 type: ARCHIVE visibility: 
PUBLIC > > py4j-0.10.7-src.zip -> resource { scheme: "hdfs" host: > "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: > "/user/mansop/.sparkStaging/application_1568954689585_0052/py4j-0.10.7-src.zip" > } size: 42437 timestamp: 1569804143043 type: FILE visibility: PRIVATE > > __hive_libs__ -> resource { scheme: "hdfs" host: > "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: > "/hdp/apps/3.1.0.0-78/spark2/spark2-hdp-hive-archive.tar.gz" } size: > 43807162 timestamp: 1568938925069 type: ARCHIVE visibility: PUBLIC > > > > > =============================================================================== > > INFO [2019-09-30 10:42:37,303] ({main} RMProxy.java[newProxyInstance]:133) > - Connecting to ResourceManager at gl-hdp-ctrl03-mlx.mlx/10.0.1.248:8030 > > INFO [2019-09-30 10:42:37,324] ({main} Logging.scala[logInfo]:54) - > Registering the ApplicationMaster > > INFO [2019-09-30 10:42:37,454] ({main} > Configuration.java[getConfResourceAsInputStream]:2756) - found resource > resource-types.xml at file:/etc/hadoop/3.1.0.0-78/0/resource-types.xml > > INFO [2019-09-30 10:42:37,470] ({main} Logging.scala[logInfo]:54) - Will > request 2 executor container(s), each with 1 core(s) and 1408 MB memory > (including 384 MB of overhead) > > INFO [2019-09-30 10:42:37,474] ({dispatcher-event-loop-14} > Logging.scala[logInfo]:54) - ApplicationMaster registered as > NettyRpcEndpointRef(spark://yar...@r640-1-12-mlx.mlx:35602) > > INFO [2019-09-30 10:42:37,485] ({main} Logging.scala[logInfo]:54) - > Submitted 2 unlocalized container requests. > > INFO [2019-09-30 10:42:37,518] ({main} Logging.scala[logInfo]:54) - > Started progress reporter thread with (heartbeat : 3000, initial allocation > : 200) intervals > > INFO [2019-09-30 10:42:37,619] ({Reporter} Logging.scala[logInfo]:54) - > Launching container container_e01_1568954689585_0052_01_000002 on host > r640-1-12-mlx.mlx for executor with ID 1 > > INFO [2019-09-30 10:42:37,621] ({Reporter} Logging.scala[logInfo]:54) - > Launching container container_e01_1568954689585_0052_01_000003 on host > r640-1-13-mlx.mlx for executor with ID 2 > > INFO [2019-09-30 10:42:37,623] ({Reporter} Logging.scala[logInfo]:54) - > Received 2 containers from YARN, launching executors on 2 of them. 
> > INFO [2019-09-30 10:42:39,481] ({dispatcher-event-loop-51} > Logging.scala[logInfo]:54) - Registered executor > NettyRpcEndpointRef(spark-client://Executor) (10.0.1.12:54340) with ID 1 > > INFO [2019-09-30 10:42:39,553] ({dispatcher-event-loop-62} > Logging.scala[logInfo]:54) - Registering block manager > r640-1-12-mlx.mlx:33043 with 408.9 MB RAM, BlockManagerId(1, > r640-1-12-mlx.mlx, 33043, None) > > INFO [2019-09-30 10:42:40,003] ({dispatcher-event-loop-9} > Logging.scala[logInfo]:54) - Registered executor > NettyRpcEndpointRef(spark-client://Executor) (10.0.1.13:33812) with ID 2 > > INFO [2019-09-30 10:42:40,023] ({pool-6-thread-2} > Logging.scala[logInfo]:54) - SchedulerBackend is ready for scheduling > beginning after reached minRegisteredResourcesRatio: 0.8 > > INFO [2019-09-30 10:42:40,025] ({pool-6-thread-2} > Logging.scala[logInfo]:54) - YarnClusterScheduler.postStartHook done > > INFO [2019-09-30 10:42:40,072] ({dispatcher-event-loop-11} > Logging.scala[logInfo]:54) - Registering block manager > r640-1-13-mlx.mlx:34105 with 408.9 MB RAM, BlockManagerId(2, > r640-1-13-mlx.mlx, 34105, None) > > INFO [2019-09-30 10:42:41,779] ({pool-6-thread-2} > SparkShims.java[loadShims]:54) - Initializing shims for Spark 2.x > > INFO [2019-09-30 10:42:41,840] ({pool-6-thread-2} > Py4JUtils.java[createGatewayServer]:44) - Launching GatewayServer at > 127.0.0.1:36897 > > INFO [2019-09-30 10:42:41,852] ({pool-6-thread-2} > PySparkInterpreter.java[createGatewayServerAndStartScript]:265) - > pythonExec: /home/mansop/anaconda2/bin/python > > INFO [2019-09-30 10:42:41,862] ({pool-6-thread-2} > PySparkInterpreter.java[setupPySparkEnv]:236) - PYTHONPATH: > /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/spark2-client/python/::/d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/container_e01_1568954689585_0052_01_000001/pyspark.zip:/d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/container_e01_1568954689585_0052_01_000001/py4j-0.10.7-src.zip > > ERROR [2019-09-30 10:43:09,061] ({SIGTERM handler} > SignalUtils.scala[apply$mcZ$sp]:43) - RECEIVED SIGNAL TERM > > INFO [2019-09-30 10:43:09,068] ({shutdown-hook-0} > Logging.scala[logInfo]:54) - Invoking stop() from shutdown hook > > INFO [2019-09-30 10:43:09,082] ({shutdown-hook-0} > AbstractConnector.java[doStop]:318) - Stopped Spark@505439b3 > {HTTP/1.1,[http/1.1]}{0.0.0.0:0} > > INFO [2019-09-30 10:43:09,085] ({shutdown-hook-0} > Logging.scala[logInfo]:54) - Stopped Spark web UI at > http://r640-1-12-mlx.mlx:42446 > > INFO [2019-09-30 10:43:09,140] ({dispatcher-event-loop-52} > Logging.scala[logInfo]:54) - Driver requested a total number of 0 > executor(s). > > INFO [2019-09-30 10:43:09,142] ({shutdown-hook-0} > Logging.scala[logInfo]:54) - Shutting down all executors > > INFO [2019-09-30 10:43:09,144] ({dispatcher-event-loop-51} > Logging.scala[logInfo]:54) - Asking each executor to shut down > > INFO [2019-09-30 10:43:09,151] ({shutdown-hook-0} > Logging.scala[logInfo]:54) - Stopping SchedulerExtensionServices > > (serviceOption=None, > > services=List(), > > started=false) > > ERROR [2019-09-30 10:43:09,155] ({Reporter} Logging.scala[logError]:91) - > Exception from Reporter thread. > > org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: > Application attempt appattempt_1568954689585_0052_000001 doesn't exist in > ApplicationMasterService cache. 
> > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404) > > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > > at java.security.AccessController.doPrivileged(Native > Method) > > at javax.security.auth.Subject.doAs(Subject.java:422) > > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > > at > org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > > > at > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > > at > java.lang.reflect.Constructor.newInstance(Constructor.java:423) > > at > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > > at > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateYarnException(RPCUtil.java:75) > > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:116) > > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79) > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > > at java.lang.reflect.Method.invoke(Method.java:498) > > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) > > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) > > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) > > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) > > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) > > at com.sun.proxy.$Proxy21.allocate(Unknown Source) > > at > org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:320) > > at > org.apache.spark.deploy.yarn.YarnAllocator.allocateResources(YarnAllocator.scala:268) > > at > org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:556) > > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException): > Application attempt appattempt_1568954689585_0052_000001 doesn't exist in > ApplicationMasterService cache. 
> > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404) > > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > > at java.security.AccessController.doPrivileged(Native > Method) > > at javax.security.auth.Subject.doAs(Subject.java:422) > > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > > at > org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > > > at > org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497) > > at org.apache.hadoop.ipc.Client.call(Client.java:1443) > > at org.apache.hadoop.ipc.Client.call(Client.java:1353) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > > at com.sun.proxy.$Proxy20.allocate(Unknown Source) > > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77) > > ... 13 more > > INFO [2019-09-30 10:43:09,164] ({Reporter} Logging.scala[logInfo]:54) - > Final app status: FAILED, exitCode: 12, (reason: Application attempt > appattempt_1568954689585_0052_000001 doesn't exist in > ApplicationMasterService cache. > > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404) > > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > > at java.security.AccessController.doPrivileged(Native > Method) > > at javax.security.auth.Subject.doAs(Subject.java:422) > > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > > at > org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > ) > > INFO [2019-09-30 10:43:09,166] ({dispatcher-event-loop-54} > Logging.scala[logInfo]:54) - MapOutputTrackerMasterEndpoint stopped! > > INFO [2019-09-30 10:43:09,236] ({shutdown-hook-0} > Logging.scala[logInfo]:54) - MemoryStore cleared > > INFO [2019-09-30 10:43:09,237] ({shutdown-hook-0} > Logging.scala[logInfo]:54) - BlockManager stopped > > INFO [2019-09-30 10:43:09,237] ({shutdown-hook-0} > Logging.scala[logInfo]:54) - BlockManagerMaster stopped > > INFO [2019-09-30 10:43:09,241] ({dispatcher-event-loop-73} > Logging.scala[logInfo]:54) - OutputCommitCoordinator stopped! 
> INFO [2019-09-30 10:43:09,252] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Successfully stopped SparkContext
> INFO [2019-09-30 10:43:09,253] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Shutdown hook called
> INFO [2019-09-30 10:43:09,254] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Deleting directory /d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-ba80cda3-812a-4cf0-b1f6-6e9eb52952b2
> INFO [2019-09-30 10:43:09,254] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Deleting directory /d0/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-43078781-8f1c-4cd6-a8da-e81b32892cf8
> INFO [2019-09-30 10:43:09,255] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Deleting directory /d0/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-43078781-8f1c-4cd6-a8da-e81b32892cf8/pyspark-9138f7ad-3f15-42c6-9bf3-e3e72d5d4086
>
> How can I continue troubleshooting in order to find out what this error means?
>
> Thank you very much
--
Best Regards

Jeff Zhang