Well, that wasn't a long-running session. The HDFS namenode states hadn't changed when that Zeppelin notebook started, and the problem is always reproducible. It might be a Spark 2.0 problem; I am falling back to Spark 1.6.
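In case it is useful for reproducing this, below is the minimal check I run in a fresh PySpark paragraph. This is a sketch only: it assumes the notebook uses PySpark (the py4j frames in the stack below suggest it does), sc._jsc/sc._jvm are internal PySpark handles, and "/tmp" is just an example path.

# Print the filesystem URI the client actually resolved. With a working HA
# config this should be the nameservice URI (hdfs://<nameservice>), not a
# single namenode such as hdfs://pc1udatahad01.x.y:8020 -- seeing the latter
# would mean the HA hdfs-site.xml is not on the interpreter's classpath.
hadoop_conf = sc._jsc.hadoopConfiguration()
print(hadoop_conf.get("fs.defaultFS"))

# A simple metadata read; with client-side failover configured it should
# succeed regardless of which namenode is currently active.
fs = sc._jvm.org.apache.hadoop.fs.FileSystem.get(hadoop_conf)
print(fs.exists(sc._jvm.org.apache.hadoop.fs.Path("/tmp")))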
Thanks Felix.

--
Ruslan Dautkhanov

On Wed, Nov 23, 2016 at 6:58 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:

> Quite possibly, since Spark is talking to HDFS.
>
> Does it work in your environment when HA switches over during a long-running
> spark shell session?
>
> ------------------------------
> *From:* Ruslan Dautkhanov <dautkha...@gmail.com>
> *Sent:* Sunday, November 20, 2016 5:27:54 PM
> *To:* users@zeppelin.apache.org
> *Subject:* Re: Zepelin problem in HA HDFS
>
> When I failed over the HDFS HA nameservice to the other namenode, Zeppelin
> now has the same error stack *but* for the other namenode, which has now
> become standby.
>
> Not sure if it has something to do with Spark 2.0.
>
> --
> Ruslan Dautkhanov
>
> On Sun, Nov 20, 2016 at 4:59 PM, Ruslan Dautkhanov <dautkha...@gmail.com>
> wrote:
>
>> Running into issues with Zeppelin in a cluster that runs HA HDFS.
>> See the complete exception stack [1]:
>> "pc1udatahad01.x.y/10.20.32.54:8020...
>> category READ is not supported in state standby"
>> Yes, pc1udatahad01 is the current standby; why doesn't Spark/HMS switch
>> over to the active one?
>> The hdfs-site.xml in Zeppelin's home/conf directory is a symlink,
>> hdfs-site.xml -> /etc/hive/conf/hdfs-site.xml,
>> and the HDFS config properly points to an HA HDFS nameservice.
>>
>> Thoughts?
>>
>> An interesting side effect is that HMS switches to a local Derby database
>> (I sent an email on this last week in a separate thread). See the stack in
>> [1] - it seems Hive/HMS tries to talk to HDFS, fails, and falls back to a
>> local Derby database.
>>
>> Zeppelin 0.6.2
>> Spark 2.0.2
>> Hive 1.1
>> RHEL 6.6
>> Java 7
>>
>> [1]
>>
>> INFO [2016-11-20 16:47:21,044] ({Thread-40} RetryInvocationHandler.java[invoke]:148) - Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB over pc1udatahad01.x.y/10.20.32.54:8020. Trying to fail over immediately.
>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby.
>> Visit https://s.apache.org/sbnn-error
>>   at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
>>   at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1831)
>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1449)
>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4271)
>>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:897)
>>   at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getFileInfo(AuthorizationProviderProxyClientProtocol.java:528)
>>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:829)
>>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
>>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
>>
>>   at org.apache.hadoop.ipc.Client.call(Client.java:1472)
>>   at org.apache.hadoop.ipc.Client.call(Client.java:1409)
>>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>>   at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source)
>>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:762)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>>   at com.sun.proxy.$Proxy17.getFileInfo(Unknown Source)
>>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2121)
>>   at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1215)
>>   at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1211)
>>   at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>   at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1211)
>>   at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1412)
>>   at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:616)
>>   at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:574)
>>   at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:518)
>>   at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:189)
>>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>   at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
>>   at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:354)
>>   at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:258)
>>   at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
>>   at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
>>   at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46)
>>   at org.apache.spark.sql.hive.HiveSharedState.externalCatalog(HiveSharedState.scala:45)
>>   at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:50)
>>   at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48)
>>   at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63)
>>   at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63)
>>   at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62)
>>   at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
>>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
>>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
>>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>>   at py4j.Gateway.invoke(Gateway.java:280)
>>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>   at py4j.GatewayConnection.run(GatewayConnection.java:214)
>>   at java.lang.Thread.run(Thread.java:745)
>>
>> --
>> Ruslan Dautkhanov
>>
>
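PS: for reference, the client-side HA properties that the symlinked hdfs-site.xml would need to carry look roughly like the sketch below. "mycluster" and the nn2 hostname are placeholders (only pc1udatahad01 appears in this thread); the real values live in /etc/hive/conf/hdfs-site.xml.

<!-- Illustrative sketch only; values are placeholders. These are the
     properties the HDFS client uses to fail over between namenodes. -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>pc1udatahad01.x.y:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <!-- placeholder: the second namenode's host -->
  <value>nn2-host.x.y:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>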