Yes, that would be a solution if I could get impersonation working. We need to control permissions through Sentry, not through the service account's keytab file.
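
For reference, the keytab settings Jeff suggested (quoted below) would look roughly like this in the Spark interpreter properties; the keytab path and principal here are placeholders, not our real values:

  spark.yarn.keytab      /etc/security/keytabs/zeppelin.service.keytab   (placeholder path)
  spark.yarn.principal   zeppelin@EXAMPLE.COM                            (placeholder principal)

As far as I understand, with these set every note authenticates as the service account, so all jobs would carry the keytab's permissions rather than the individual user's.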
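If impersonation did work, my understanding is the setup would be roughly: enable "User Impersonate" in the Spark interpreter setting (with per-user instantiation) and allow the zeppelin service user to proxy other users in Hadoop's core-site.xml, along these lines (the wildcard values are placeholders and should be narrowed in production):

  <property>
    <name>hadoop.proxyuser.zeppelin.hosts</name>
    <value>*</value> <!-- placeholder: hosts allowed to impersonate -->
  </property>
  <property>
    <name>hadoop.proxyuser.zeppelin.groups</name>
    <value>*</value> <!-- placeholder: groups that may be impersonated -->
  </property>

Jobs would then be submitted as the logged-in Zeppelin user, and Sentry could authorise against that user's groups instead of the service account's.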
On Sat, 4 Aug 2018 at 04:08, Jeff Zhang <zjf...@gmail.com> wrote:

> Could you try to set the keytab and principal?
>
>   spark.yarn.keytab
>   spark.yarn.principal
>
> András Kolbert <kolbertand...@gmail.com> wrote on Fri, 3 Aug 2018 at 16:38:
>
>> I do not have a keytab added to my Spark interpreter. At the moment I use
>> kinit: echo 'pass' | kinit username
>>
>> master yarn-client
>> spark.app.name Zeppelin
>> spark.cores.max
>> spark.dynamicAllocation.initialExecutors 1
>> spark.dynamicAllocation.maxExecutors 15
>> spark.dynamicAllocation.minExecutors 1
>> spark.executor.memory 6g
>> zeppelin.R.cmd R
>> zeppelin.R.image.width 100%
>> zeppelin.R.knitr true
>> zeppelin.R.render.options out.format = 'html', comment = NA, echo = FALSE, results = 'asis', message = F, warning = F
>> zeppelin.dep.additionalRemoteRepository spark-packages, http://dl.bintray.com/spark-packages/maven,false;
>> zeppelin.dep.localrepo spark.home
>> zeppelin.interpreter.localRepo /export/zeppelin/zeppelin-current/local-repo/2DCG17BKA
>> zeppelin.interpreter.output.limit 102400
>> zeppelin.pyspark.python python
>> zeppelin.pyspark.useIPython true
>> zeppelin.spark.concurrentSQL false
>> zeppelin.spark.enableSupportedVersionCheck true
>> zeppelin.spark.importImplicit true
>> zeppelin.spark.maxResult 1000
>> zeppelin.spark.printREPLOutput true
>> zeppelin.spark.sql.interpolation false
>> zeppelin.spark.sql.stacktrace false
>> zeppelin.spark.uiWebUrl
>> zeppelin.spark.useHiveContext true
>> zeppelin.spark.useNew true
>>
>> On Fri, 3 Aug 2018 at 10:34, Jeff Zhang <zjf...@gmail.com> wrote:
>>
>>> What is your Spark interpreter configuration?
>>>
>>> András Kolbert <kolbertand...@gmail.com> wrote on Fri, 3 Aug 2018 at 16:24:
>>>
>>>> Hi,
>>>>
>>>> We are experiencing issues with the Spark interpreter and Zeppelin's
>>>> behaviour. Whenever we launch a note without a valid Kerberos ticket,
>>>> the application keeps trying to authenticate and never times out. In
>>>> this state the interpreter cannot be restarted and the note cannot be
>>>> cancelled; only a restart of the whole application helps.
>>>>
>>>> Is there any way to kill a particular execution, to get out
>>>> Thanks >>>> Andras >>>> >>>> >>>> >>>> log: >>>> >>>> >>>> >>>> INFO [2018-08-02 10:09:01,563] ({pool-2-thread-3} >>>> ConfiguredRMFailoverProxyProvider.java[performFailover]:100) - Failing over >>>> to rm119 >>>> WARN [2018-08-02 10:09:01,567] ({pool-2-thread-3} >>>> UserGroupInformation.java[doAs]:1920) - PriviledgedActionException >>>> as:zeppelin (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS >>>> initiate failed [Caused by GSSException: No valid credentials provided >>>> (Mechanism level: Failed to find any Kerberos tgt)] >>>> WARN [2018-08-02 10:09:01,567] ({pool-2-thread-3} >>>> Client.java[run]:713) - Exception encountered while connecting to the >>>> server : javax.security.sasl.SaslException: GSS initiate failed [Caused by >>>> GSSException: No valid credentials provided (Mechanism level: Failed to >>>> find any Kerberos tgt)] >>>> WARN [2018-08-02 10:09:01,568] ({pool-2-thread-3} >>>> UserGroupInformation.java[doAs]:1920) - PriviledgedActionException >>>> as:zeppelin (auth:KERBEROS) cause:java.io.IOException: >>>> javax.security.sasl.SaslException: GSS initiate failed [Caused by >>>> GSSException: No valid credentials provided (Mechanism level: Failed to >>>> find any Kerberos tgt)] >>>> INFO [2018-08-02 10:09:01,570] ({pool-2-thread-3} >>>> RetryInvocationHandler.java[invoke]:150) - Exception while invoking >>>> getClusterMetrics of class ApplicationClientProtocolPBClientImpl over rm119 >>>> after 3293 fail over attempts. Trying to fail over immediately. >>>> java.io.IOException: Failed on local exception: java.io.IOException: >>>> javax.security.sasl.SaslException: GSS initiate failed [Caused by >>>> GSSException: No valid credentials provided (Mechanism level: Failed to >>>> find any Kerberos tgt)]; Host Details : local host is: " >>>> host.com/1.1.1.1"; destination host is: "host.com":8032; >>>> at >>>> org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) >>>> at org.apache.hadoop.ipc.Client.call(Client.java:1508) >>>> at org.apache.hadoop.ipc.Client.call(Client.java:1441) >>>> at >>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) >>>> at com.sun.proxy.$Proxy17.getClusterMetrics(Unknown Source) >>>> at >>>> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:202) >>>> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) >>>> at >>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>>> at java.lang.reflect.Method.invoke(Method.java:497) >>>> at >>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258) >>>> at >>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) >>>> at com.sun.proxy.$Proxy18.getClusterMetrics(Unknown Source) >>>> at >>>> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:483) >>>> at >>>> org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:158) >>>> at >>>> org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:158) >>>> at >>>> org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) >>>> at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:61) >>>> at >>>> org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:157) >>>> at >>>> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56) >>>> at 
>>>>     at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:165)
>>>>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:512)
>>>>     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2511)
>>>>     at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
>>>>     at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
>>>>     at scala.Option.getOrElse(Option.scala:121)
>>>>     at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>     at java.lang.reflect.Method.invoke(Method.java:497)
>>>>     at org.apache.zeppelin.spark.BaseSparkScalaInterpreter.spark2CreateContext(BaseSparkScalaInterpreter.scala:189)
>>>>     at org.apache.zeppelin.spark.BaseSparkScalaInterpreter.createSparkContext(BaseSparkScalaInterpreter.scala:124)
>>>>     at org.apache.zeppelin.spark.SparkScala211Interpreter.open(SparkScala211Interpreter.scala:87)
>>>>     at org.apache.zeppelin.spark.NewSparkInterpreter.open(NewSparkInterpreter.java:102)
>>>>     at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:62)
>>>>     at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
>>>>     at org.apache.zeppelin.spark.PySparkInterpreter.getSparkInterpreter(PySparkInterpreter.java:664)
>>>>     at org.apache.zeppelin.spark.PySparkInterpreter.createGatewayServerAndStartScript(PySparkInterpreter.java:260)
>>>>     at org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:194)
>>>>     at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
>>>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:617)
>>>>     at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
>>>>     at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>> Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>>>>     at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:718)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
>>>>     at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:681)
>>>>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:769)
>>>>     at org.apache.hadoop.ipc.Client$Connection.access$3000(Client.java:396)
>>>>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1557)
>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1480)
>>>>     ... 48 more
>>>> Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>>>>     at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>>>>     at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:413)
>>>>     at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:594)
>>>>     at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:396)
>>>>     at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:761)
>>>>     at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:757)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
>>>>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:756)
>>>>     ... 51 more
>>>> Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
>>>>     at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
>>>>     at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
>>>>     at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
>>>>     at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
>>>>     at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
>>>>     at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
>>>>     at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
>>>>     ... 60 more
>>>> INFO [2018-08-02 10:09:01,572] ({pool-2-thread-3} ConfiguredRMFailoverProxyProvider.java[performFailover]:100) - Failing over to rm112
>>>> INFO [2018-08-02 10:09:01,574] ({pool-2-thread-3} RetryInvocationHandler.java[invoke]:150) - Exception while invoking getClusterMetrics of class ApplicationClientProtocolPBClientImpl over rm112 after 3294 fail over attempts. Trying to fail over after sleeping for 1903ms.
>>>> java.net.ConnectException: Call From host.com/1.1.1.1 to host.com:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
>>>>     at sun.reflect.GeneratedConstructorAccessor14.newInstance(Unknown Source)
>>>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>>>>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
>>>>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1508)
>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1441)
>>>>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>>>>     at com.sun.proxy.$Proxy17.getClusterMetrics(Unknown Source)
>>>>     at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:202)
>>>>     at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>     at java.lang.reflect.Method.invoke(Method.java:497)
>>>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
>>>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>>>>     at com.sun.proxy.$Proxy18.getClusterMetrics(Unknown Source)
>>>>     at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:483)
>>>>     at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:158)
>>>>     at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:158)
>>>>     at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
>>>>     at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:61)
>>>>     at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:157)
>>>>     at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
>>>>     at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:165)
>>>>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:512)
>>>>     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2511)
>>>>     at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
>>>>     at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
>>>>     at scala.Option.getOrElse(Option.scala:121)
>>>>     at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>     at java.lang.reflect.Method.invoke(Method.java:497)
>>>>     at org.apache.zeppelin.spark.BaseSparkScalaInterpreter.spark2CreateContext(BaseSparkScalaInterpreter.scala:189)
>>>>     at org.apache.zeppelin.spark.BaseSparkScalaInterpreter.createSparkContext(BaseSparkScalaInterpreter.scala:124)
>>>>     at org.apache.zeppelin.spark.SparkScala211Interpreter.open(SparkScala211Interpreter.scala:87)
>>>>     at org.apache.zeppelin.spark.NewSparkInterpreter.open(NewSparkInterpreter.java:102)
>>>>     at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:62)
>>>>     at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
>>>>     at org.apache.zeppelin.spark.PySparkInterpreter.getSparkInterpreter(PySparkInterpreter.java:664)
>>>>     at org.apache.zeppelin.spark.PySparkInterpreter.createGatewayServerAndStartScript(PySparkInterpreter.java:260)
>>>>     at org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:194)
>>>>     at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
>>>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:617)
>>>>     at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
>>>>     at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>> Caused by: java.net.ConnectException: Connection refused
>>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>>>>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>>>>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
>>>>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
>>>>     at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:648)
>>>>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:744)
>>>>     at org.apache.hadoop.ipc.Client$Connection.access$3000(Client.java:396)
>>>>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1557)
>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1480)
>>>>     ... 48 more