Paul, Ben, Fabian, please share your workload at the time when notes are not loading.
How many interpreters were started at that moment? You can find all started interpreters on the Linux command line with:

    ps -ef | grep ZeppelinServer | grep -v grep | awk "NR==1" | awk -F' ' '{print $2}' | xargs ps -f --ppid | wc -l

And the started Spark interpreters:

    ps -ef | grep ZeppelinServer | grep -v grep | awk "NR==1" | awk -F' ' '{print $2}' | xargs ps -f --ppid | grep spark | wc -l

Maxim Belousov
Architect
Reporting and Data Marts Department
Data Warehouse and Reporting Directorate
Tel.: +7 495 648-10-00, ext. 2271

From: Paul Brenner [mailto:pbren...@placeiq.com]
Sent: Thursday, October 12, 2017 7:45 PM
To: Geoffrey Cheng <geoffrey.ch...@gmail.com>; users@zeppelin.apache.org
Subject: Re: Zeppelin Stops Loading Notes

Does this issue need a Jira ticket? The problem is that I have no idea how to reproduce it, and I’m not sure if there is anything relevant in the logs. Any ideas how we can produce an actionable Jira ticket out of this?
Paul Brenner
DATA SCIENTIST
(217) 390-3033
http://www.placeiq.com/

On Thu, Oct 12, 2017 at 8:31 AM Geoffrey Cheng <geoffrey.ch...@gmail.com> wrote:

We have the same issue. Usually, when multiple people are using it, only the header loads. We tried but couldn't find a solution, so we restart every single time. In fact, we have to restart at least daily.

On Oct 12, 2017 2:40 AM, "Fabian Böhnlein" <fabian.boehnl...@gmail.com> wrote:

Hi Paul, Ben,

we also see this happen regularly. It's more likely to happen when a handful of people are using it. We mostly run one Spark interpreter per person. We also don't observe anything in the logs. The 'header' that you mentioned is actually still in the cache. Sometimes it's specific notes that don't load. Sometimes there's a hanging Spark interpreter; once it's killed, notes load again.

We're pretty clueless about it. Are there any front-end related logs we could enable to find out more?

On Sat, 19 Aug 2017 at 20:19 Ben Vogan <b...@shopkick.com> wrote:

I have seen Zeppelin get into this state once. I restarted it without investigating the logs, however, so I don't have anything useful to go on as to why.

--Ben

On Sat, Aug 19, 2017 at 8:17 AM, Paul Brenner <pbren...@placeiq.com> wrote:

You were correct. We had "export ZEPPELIN_SSL_PORT=false" in our zeppelin-env.sh. I’m going to comment that out. I suspect it is actually unrelated to the behavior we are seeing where pages stop loading, though. Has anyone else seen this happen? I’ll report back if it happens again after the fix.

On Fri, Aug 18, 2017 at 6:37 PM moon soo Lee wrote:

Hi,

One of the configuration values in your conf/zeppelin-env.sh or conf/zeppelin-site.xml seems to be "false" where a number is expected.

Do you have any environment variable or property set to "false" for the configurations below?

ZEPPELIN_PORT, zeppelin.server.port
ZEPPELIN_SSL_PORT, zeppelin.server.ssl.port
ZEPPELIN_INTERPRETER_CONNECT_TIMEOUT, zeppelin.interpreter.connect.timeout
ZEPPELIN_INTERPRETER_MAX_POOL_SIZE, zeppelin.interpreter.max.poolsize
ZEPPELIN_INTERPRETER_OUTPUT_LIMIT, zeppelin.interpreter.output.limit

Thanks,
moon
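
The check moon describes can be scripted. What follows is only a rough sketch, under the assumption that the settings appear in the env file as plain "NAME=value" or "export NAME=value" lines. The function name check_numeric_env is hypothetical and not part of Zeppelin, and the sketch does not inspect conf/zeppelin-site.xml.

```shell
# Sketch: report the numeric Zeppelin env settings listed above whose value
# is not a plain integer. check_numeric_env is a hypothetical helper name.
check_numeric_env() {
  env_file="$1"
  for name in ZEPPELIN_PORT ZEPPELIN_SSL_PORT \
              ZEPPELIN_INTERPRETER_CONNECT_TIMEOUT \
              ZEPPELIN_INTERPRETER_MAX_POOL_SIZE \
              ZEPPELIN_INTERPRETER_OUTPUT_LIMIT; do
    # Keep only uncommented assignments of this variable, drop trailing
    # comments, quotes and spaces, and use the last assignment in the file.
    value=$(sed -n -E "s/^[[:space:]]*(export[[:space:]]+)?${name}=//p" "$env_file" \
            | sed 's/[[:space:]]*#.*//' \
            | tr -d "\"' " \
            | tail -n 1)
    if [ -z "$value" ]; then
      continue   # variable not set in this file
    fi
    case "$value" in
      *[!0-9]*) echo "$name=$value (expected an integer)" ;;
    esac
  done
}
```

Calling check_numeric_env conf/zeppelin-env.sh on a file that contains the line export ZEPPELIN_SSL_PORT=false would print "ZEPPELIN_SSL_PORT=false (expected an integer)"; a file with only integer values prints nothing.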

On Fri, Aug 18, 2017 at 2:30 PM Paul Brenner <pbren...@placeiq.com> wrote:

We have a team of 5 users who all use the same Zeppelin server. Lately we have run into a case a few times where Zeppelin notes stop responding, and when we then try refreshing the webpage for the note, all that loads is the Zeppelin header with no note. When I look at the logs I see:

INFO [2017-08-18 21:23:06,569] ({qtp1286783232-14114} NotebookServer.java[sendNote]:705) - New operation from 10.201.12.26 : 55178 : nshah : GET_NOTE : 2CR2ANDEX
INFO [2017-08-18 21:24:05,740] ({qtp1286783232-14115} NotebookServer.java[onClose]:363) - Closed connection to 10.201.12.22 : 57366. (1001) Idle Timeout
INFO [2017-08-18 21:24:08,084] ({qtp1286783232-14121} NotebookServer.java[onClose]:363) - Closed connection to 10.201.12.22 : 57461. (1001) Idle Timeout
INFO [2017-08-18 21:25:10,133] ({qtp1286783232-14122} AuthorizingRealm.java[getAuthorizationCacheLazy]:248) - No cache or cacheManager properties have been set. Authorization cache cannot be obtained.
INFO [2017-08-18 21:25:10,157] ({qtp1286783232-14122} AuthorizingRealm.java[getAuthorizationCacheLazy]:248) - No cache or cacheManager properties have been set. Authorization cache cannot be obtained.
INFO [2017-08-18 21:25:10,172] ({qtp1286783232-14122} AuthorizingRealm.java[getAuthorizationCacheLazy]:248) - No cache or cacheManager properties have been set. Authorization cache cannot be obtained.
WARN [2017-08-18 21:25:10,192] ({qtp1286783232-14122} SecurityRestApi.java[ticket]:87) - {"status":"OK","message":"","body":{"principal":"pbrenner","ticket":"5f717a1a-46df-4178-bc9a-690f66064d0e","roles":"[]"}}
INFO [2017-08-18 21:25:10,252] ({qtp1286783232-14123} NotebookServer.java[onOpen]:156) - New connection from 10.201.12.48 : 62413
ERROR [2017-08-18 21:25:10,353] ({qtp1286783232-14122} NotebookServer.java[onMessage]:357) - Can't handle message
java.lang.NumberFormatException: For input string: "false"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at org.apache.zeppelin.conf.ZeppelinConfiguration.getInt(ZeppelinConfiguration.java:213)
    at org.apache.zeppelin.conf.ZeppelinConfiguration.getInt(ZeppelinConfiguration.java:208)
    at org.apache.zeppelin.conf.ZeppelinConfiguration.dumpConfigurations(ZeppelinConfiguration.java:496)
    at org.apache.zeppelin.socket.NotebookServer.sendAllConfigurations(NotebookServer.java:1680)
    at org.apache.zeppelin.socket.NotebookServer.onMessage(NotebookServer.java:318)
    at org.apache.zeppelin.socket.NotebookSocket.onWebSocketText(NotebookSocket.java:59)
    at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextMessage(JettyListenerEventDriver.java:128)
    at org.eclipse.jetty.websocket.common.message.SimpleTextMessage.messageComplete(SimpleTextMessage.java:69)
    at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.appendMessage(AbstractEventDriver.java:65)
    at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextFrame(JettyListenerEventDriver.java:122)
    at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.incomingFrame(AbstractEventDriver.java:161)
    at org.eclipse.jetty.websocket.common.WebSocketSession.incomingFrame(WebSocketSession.java:309)
    at org.eclipse.jetty.websocket.common.extensions.ExtensionStack.incomingFrame(ExtensionStack.java:214)
    at org.eclipse.jetty.websocket.common.Parser.notifyFrame(Parser.java:220)
    at org.eclipse.jetty.websocket.common.Parser.parse(Parser.java:258)
    at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.readParse(AbstractWebSocketConnection.java:632)
    at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:480)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:745)

Does anyone have any idea what is going on or how we could troubleshoot?

--
BENJAMIN VOGAN | Data Platform Team Lead