From: James Srinivasan <james.sriniva...@gmail.com>
Sent: 01 October 2019 09:26
To: users@zeppelin.apache.org
Subject: Re: thrift.transport.TTransportException
I'm guessing you might have conflicting versions of libthrift on your classpath.

On Tue, 1 Oct 2019, 08:44 Jeff Zhang, <zjf...@gmail.com> wrote:

It looks like you are using pyspark. Could you try just starting the Scala Spark interpreter via `%spark`? First, let's figure out whether it is related to pyspark.

On Tue, 1 Oct 2019 at 15:29, Manuel Sopena Ballesteros <manuel...@garvan.org.au> wrote:

Dear Zeppelin community,

I would like to ask for advice regarding an error I am having with Thrift. I am getting quite a lot of these errors while running my notebooks:

org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:274)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:258)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$4.call(RemoteInterpreter.java:233)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$4.call(RemoteInterpreter.java:229)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:135)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:228)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:437)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:307)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

And these are the Spark driver application logs:

…
===============================================================================
YARN executor launch context:
  env:
    CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>/usr/hdp/3.1.0.0-78/hadoop/*<CPS>/usr/hdp/3.1.0.0-78/hadoop/lib/*<CPS>/usr/hdp/current/hadoop-hdfs-client/*<CPS>/usr/hdp/current/hadoop-hdfs-client/lib/*<CPS>/usr/hdp/current/hadoop-yarn-client/*<CPS>/usr/hdp/current/hadoop-yarn-client/lib/*<CPS>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/3.1.0.0-78/hadoop/lib/hadoop-lzo-0.6.0.3.1.0.0-78.jar:/etc/hadoop/conf/secure<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__
    SPARK_YARN_STAGING_DIR -> hdfs://gl-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1568954689585_0052
    SPARK_USER -> mansop
    PYTHONPATH -> /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/spark2-client/python/:<CPS>{{PWD}}/pyspark.zip<CPS>{{PWD}}/py4j-0.10.7-src.zip

  command:
    LD_LIBRARY_PATH="/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:$LD_LIBRARY_PATH" \
      {{JAVA_HOME}}/bin/java \
      -server \
      -Xmx1024m \
      '-XX:+UseNUMA' \
      -Djava.io.tmpdir={{PWD}}/tmp \
      '-Dspark.history.ui.port=18081' \
      -Dspark.yarn.app.container.log.dir=<LOG_DIR> \
      -XX:OnOutOfMemoryError='kill %p' \
      org.apache.spark.executor.CoarseGrainedExecutorBackend \
      --driver-url \
      spark://coarsegrainedschedu...@r640-1-12-mlx.mlx:35602 \
      --executor-id \
      <executorId> \
      --hostname \
      <hostname> \
      --cores \
      1 \
      --app-id \
      application_1568954689585_0052 \
      --user-class-path \
      file:$PWD/__app__.jar \
      1><LOG_DIR>/stdout \
      2><LOG_DIR>/stderr

  resources:
    __app__.jar -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/user/mansop/.sparkStaging/application_1568954689585_0052/spark-interpreter-0.8.0.3.1.0.0-78.jar" } size: 20433040 timestamp: 1569804142906 type: FILE visibility: PRIVATE
    __spark_conf__ -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/user/mansop/.sparkStaging/application_1568954689585_0052/__spark_conf__.zip" } size: 277725 timestamp: 1569804143239 type: ARCHIVE visibility: PRIVATE
    sparkr -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/user/mansop/.sparkStaging/application_1568954689585_0052/sparkr.zip" } size: 688255 timestamp: 1569804142991 type: ARCHIVE visibility: PRIVATE
    log4j_yarn_cluster.properties -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/user/mansop/.sparkStaging/application_1568954689585_0052/log4j_yarn_cluster.properties" } size: 1018 timestamp: 1569804142955 type: FILE visibility: PRIVATE
    pyspark.zip -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/user/mansop/.sparkStaging/application_1568954689585_0052/pyspark.zip" } size: 550570 timestamp: 1569804143018 type: FILE visibility: PRIVATE
    __spark_libs__ -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/hdp/apps/3.1.0.0-78/spark2/spark2-hdp-yarn-archive.tar.gz" } size: 280293050 timestamp: 1568938921259 type: ARCHIVE visibility: PUBLIC
    py4j-0.10.7-src.zip -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/user/mansop/.sparkStaging/application_1568954689585_0052/py4j-0.10.7-src.zip" } size: 42437 timestamp: 1569804143043 type: FILE visibility: PRIVATE
    __hive_libs__ -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port: 8020 file: "/hdp/apps/3.1.0.0-78/spark2/spark2-hdp-hive-archive.tar.gz" } size: 43807162 timestamp: 1568938925069 type: ARCHIVE visibility: PUBLIC

===============================================================================
INFO [2019-09-30 10:42:37,303] ({main} RMProxy.java[newProxyInstance]:133) - Connecting to ResourceManager at gl-hdp-ctrl03-mlx.mlx/10.0.1.248:8030
INFO [2019-09-30 10:42:37,324] ({main} Logging.scala[logInfo]:54) - Registering the ApplicationMaster
INFO [2019-09-30 10:42:37,454] ({main} Configuration.java[getConfResourceAsInputStream]:2756) - found resource resource-types.xml at file:/etc/hadoop/3.1.0.0-78/0/resource-types.xml
INFO [2019-09-30 10:42:37,470] ({main} Logging.scala[logInfo]:54) - Will request 2 executor container(s), each with 1 core(s) and 1408 MB memory (including 384 MB of overhead)
INFO [2019-09-30 10:42:37,474] ({dispatcher-event-loop-14} Logging.scala[logInfo]:54) - ApplicationMaster registered as NettyRpcEndpointRef(spark://yar...@r640-1-12-mlx.mlx:35602)
INFO [2019-09-30 10:42:37,485] ({main} Logging.scala[logInfo]:54) - Submitted 2 unlocalized container requests.
INFO [2019-09-30 10:42:37,518] ({main} Logging.scala[logInfo]:54) - Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
INFO [2019-09-30 10:42:37,619] ({Reporter} Logging.scala[logInfo]:54) - Launching container container_e01_1568954689585_0052_01_000002 on host r640-1-12-mlx.mlx for executor with ID 1
INFO [2019-09-30 10:42:37,621] ({Reporter} Logging.scala[logInfo]:54) - Launching container container_e01_1568954689585_0052_01_000003 on host r640-1-13-mlx.mlx for executor with ID 2
INFO [2019-09-30 10:42:37,623] ({Reporter} Logging.scala[logInfo]:54) - Received 2 containers from YARN, launching executors on 2 of them.
INFO [2019-09-30 10:42:39,481] ({dispatcher-event-loop-51} Logging.scala[logInfo]:54) - Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.0.1.12:54340) with ID 1
INFO [2019-09-30 10:42:39,553] ({dispatcher-event-loop-62} Logging.scala[logInfo]:54) - Registering block manager r640-1-12-mlx.mlx:33043 with 408.9 MB RAM, BlockManagerId(1, r640-1-12-mlx.mlx, 33043, None)
INFO [2019-09-30 10:42:40,003] ({dispatcher-event-loop-9} Logging.scala[logInfo]:54) - Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.0.1.13:33812) with ID 2
INFO [2019-09-30 10:42:40,023] ({pool-6-thread-2} Logging.scala[logInfo]:54) - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
INFO [2019-09-30 10:42:40,025] ({pool-6-thread-2} Logging.scala[logInfo]:54) - YarnClusterScheduler.postStartHook done
INFO [2019-09-30 10:42:40,072] ({dispatcher-event-loop-11} Logging.scala[logInfo]:54) - Registering block manager r640-1-13-mlx.mlx:34105 with 408.9 MB RAM, BlockManagerId(2, r640-1-13-mlx.mlx, 34105, None)
INFO [2019-09-30 10:42:41,779] ({pool-6-thread-2} SparkShims.java[loadShims]:54) - Initializing shims for Spark 2.x
INFO [2019-09-30 10:42:41,840] ({pool-6-thread-2} Py4JUtils.java[createGatewayServer]:44) - Launching GatewayServer at 127.0.0.1:36897
INFO [2019-09-30 10:42:41,852] ({pool-6-thread-2} PySparkInterpreter.java[createGatewayServerAndStartScript]:265) - pythonExec: /home/mansop/anaconda2/bin/python
INFO [2019-09-30 10:42:41,862] ({pool-6-thread-2} PySparkInterpreter.java[setupPySparkEnv]:236) - PYTHONPATH: /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/spark2-client/python/::/d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/container_e01_1568954689585_0052_01_000001/pyspark.zip:/d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/container_e01_1568954689585_0052_01_000001/py4j-0.10.7-src.zip
ERROR [2019-09-30 10:43:09,061] ({SIGTERM handler} SignalUtils.scala[apply$mcZ$sp]:43) - RECEIVED SIGNAL TERM
INFO [2019-09-30 10:43:09,068] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Invoking stop() from shutdown hook
INFO [2019-09-30 10:43:09,082] ({shutdown-hook-0} AbstractConnector.java[doStop]:318) - Stopped Spark@505439b3{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
INFO [2019-09-30 10:43:09,085] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Stopped Spark web UI at http://r640-1-12-mlx.mlx:42446
INFO [2019-09-30 10:43:09,140] ({dispatcher-event-loop-52} Logging.scala[logInfo]:54) - Driver requested a total number of 0 executor(s).
INFO [2019-09-30 10:43:09,142] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Shutting down all executors
INFO [2019-09-30 10:43:09,144] ({dispatcher-event-loop-51} Logging.scala[logInfo]:54) - Asking each executor to shut down
INFO [2019-09-30 10:43:09,151] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false)
ERROR [2019-09-30 10:43:09,155] ({Reporter} Logging.scala[logError]:91) - Exception from Reporter thread.
org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: Application attempt appattempt_1568954689585_0052_000001 doesn't exist in ApplicationMasterService cache.
    at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
    at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
    at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
    at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateYarnException(RPCUtil.java:75)
    at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:116)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
    at com.sun.proxy.$Proxy21.allocate(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:320)
    at org.apache.spark.deploy.yarn.YarnAllocator.allocateResources(YarnAllocator.scala:268)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:556)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException): Application attempt appattempt_1568954689585_0052_000001 doesn't exist in ApplicationMasterService cache.
    at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
    at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
    at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497)
    at org.apache.hadoop.ipc.Client.call(Client.java:1443)
    at org.apache.hadoop.ipc.Client.call(Client.java:1353)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
    at com.sun.proxy.$Proxy20.allocate(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
    ... 13 more
INFO [2019-09-30 10:43:09,164] ({Reporter} Logging.scala[logInfo]:54) - Final app status: FAILED, exitCode: 12, (reason: Application attempt appattempt_1568954689585_0052_000001 doesn't exist in ApplicationMasterService cache.
    at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
    at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
    at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
)
INFO [2019-09-30 10:43:09,166] ({dispatcher-event-loop-54} Logging.scala[logInfo]:54) - MapOutputTrackerMasterEndpoint stopped!
INFO [2019-09-30 10:43:09,236] ({shutdown-hook-0} Logging.scala[logInfo]:54) - MemoryStore cleared
INFO [2019-09-30 10:43:09,237] ({shutdown-hook-0} Logging.scala[logInfo]:54) - BlockManager stopped
INFO [2019-09-30 10:43:09,237] ({shutdown-hook-0} Logging.scala[logInfo]:54) - BlockManagerMaster stopped
INFO [2019-09-30 10:43:09,241] ({dispatcher-event-loop-73} Logging.scala[logInfo]:54) - OutputCommitCoordinator stopped!
INFO [2019-09-30 10:43:09,252] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Successfully stopped SparkContext
INFO [2019-09-30 10:43:09,253] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Shutdown hook called
INFO [2019-09-30 10:43:09,254] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Deleting directory /d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-ba80cda3-812a-4cf0-b1f6-6e9eb52952b2
INFO [2019-09-30 10:43:09,254] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Deleting directory /d0/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-43078781-8f1c-4cd6-a8da-e81b32892cf8
INFO [2019-09-30 10:43:09,255] ({shutdown-hook-0} Logging.scala[logInfo]:54) - Deleting directory /d0/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-43078781-8f1c-4cd6-a8da-e81b32892cf8/pyspark-9138f7ad-3f15-42c6-9bf3-e3e72d5d4086

How can I continue troubleshooting in order to find out what this error means?

Thank you very much

NOTICE
Please consider the environment before printing this email. This message and any attachments are intended for the addressee named and may contain legally privileged/confidential/copyright information. If you are not the intended recipient, you should not read, use, disclose, copy or distribute this communication. If you have received this message in error please notify us at once by return email and then delete both messages. We accept no liability for the distribution of viruses or similar in electronic communications. This notice should not be removed.

--
Best Regards

Jeff Zhang
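One way to test the "conflicting libthrift versions" hypothesis is to ask a JVM how many copies of a core Thrift class file it can see. The sketch below is a hypothetical helper (the class name ClasspathCopies and the default resource path are illustrative assumptions, not part of Zeppelin, Spark, or libthrift); it relies only on the standard ClassLoader.getResources method, which returns every matching resource visible through the loader and its parents. More than one URL for org/apache/thrift/TServiceClient.class would suggest that two libthrift jars are on the same classpath. Note that it only inspects the classpath of the JVM it runs in, so it would have to be launched with the same classpath as the Zeppelin server or interpreter process under investigation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical diagnostic: lists every classpath location that provides a
 * given class file. Several hits for a Thrift class hint at duplicate
 * (and possibly conflicting) libthrift jars.
 */
final class ClasspathCopies {

    /** Returns all URLs at which the current class loader can see the resource. */
    static List<URL> findCopies(String resourcePath) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            // getResources walks the parent chain and returns every match, not just the first.
            return Collections.list(loader.getResources(resourcePath));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed default: a core Thrift class; pass another resource path as the first argument.
        String resource = args.length > 0 ? args[0] : "org/apache/thrift/TServiceClient.class";
        List<URL> copies = findCopies(resource);
        if (copies.isEmpty()) {
            System.out.println(resource + " not found on this classpath");
            return;
        }
        for (URL url : copies) {
            System.out.println(url); // e.g. jar:file:/path/to/libthrift-x.y.z.jar!/...
        }
        if (copies.size() > 1) {
            System.out.println("WARNING: " + copies.size() + " copies of " + resource);
        }
    }
}
```

Because Scala can call the same Java API, the body of findCopies could also be pasted into a `%spark` paragraph to inspect the Spark interpreter process's own classpath, which would also exercise the Scala interpreter as suggested earlier in the thread.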