Supreeth Sharma created ZEPPELIN-3114:
-----------------------------------------

Summary: Notebooks and interpreters are not getting saved in zeppelin after >1d stress testing
Key: ZEPPELIN-3114
URL: https://issues.apache.org/jira/browse/ZEPPELIN-3114
Project: Zeppelin
Issue Type: Bug
Components: zeppelin-server
Affects Versions: 0.7.3
Reporter: Supreeth Sharma


Scenario:
36 hour long test
14 node secured encrypted cluster (centos7 based)
simulated load of around 13 users running a set of 19 notebooks periodically as per defined schedule

After 24 hours zeppelin stopped functioning.

Issue 1: Not able to create new notebook or update existing one.
Issue 2: Not able to modify interpreter settings. Save action never gets completed on UI.
Issue 3: Not able to run paragraphs.

Seeing below error in zeppelin logs:
{code}
WARN [2017-12-19 13:18:48,128] ({qtp1076835071-86681} Client.java[run]:715) - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
INFO [2017-12-19 13:18:48,128] ({qtp1076835071-86681} RetryInvocationHandler.java[log]:280) - java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "ctr-e136-1513029738776-12293-01-000004.hwx.site/172.27.22.148"; destination host is: "ctr-e136-1513029738776-12293-01-000004.hwx.site":8020; , while invoking ClientNamenodeProtocolTranslatorPB.create over ctr-e136-1513029738776-12293-01-000004.hwx.site/172.27.22.148:8020 after 12 failover attempts. Trying to failover after sleeping for 15905ms.
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)