Neil Dong created ZEPPELIN-1871:
-----------------------------------

             Summary: Spark Interpreter died unexpectedly
                 Key: ZEPPELIN-1871
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1871
             Project: Zeppelin
          Issue Type: Bug
          Components: Interpreters, zeppelin-server
    Affects Versions: 0.6.2
            Reporter: Neil Dong
            Priority: Minor




About the "spark interpreter group": when multiple notebooks run on a timed 
schedule and share one Spark interpreter instance, their paragraphs are 
executed one at a time, because the interpreter uses a REPL to process the 
notebook and the REPL allows only one task at a time.
So the code looks like this:
```
public InterpreterResult interpret(String[] lines, InterpreterContext context) {
  synchronized (this) {
    z.setGui(context.getGui());
    sc.setJobGroup(getJobGroup(context), "Zeppelin", false);
    InterpreterResult r = interpretInput(lines, context);
    sc.clearJobGroup();
    return r;
  }
}
```
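
To illustrate the serialization described above, here is a minimal standalone sketch. It is not Zeppelin code: `SerializedInterpreterDemo`, `runDemo` and the 50 ms sleep are hypothetical stand-ins for the shared interpreter and a running paragraph. It only shows that callers sharing one instance queue up behind the `synchronized (this)` block, so the peak number of paragraphs inside `interpret` at once is 1.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class SerializedInterpreterDemo {
    // Hypothetical stand-in for the shared interpreter instance (not Zeppelin code).
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger maxActive = new AtomicInteger();

    public String interpret(String code) throws InterruptedException {
        synchronized (this) {
            // Record how many callers are inside the critical section right now.
            int now = active.incrementAndGet();
            maxActive.accumulateAndGet(now, Math::max);
            try {
                Thread.sleep(50); // simulate a paragraph that takes a while to run
                return "done: " + code;
            } finally {
                active.decrementAndGet();
            }
        }
    }

    public int maxConcurrent() {
        return maxActive.get();
    }

    // Starts `callers` threads against one shared instance and returns the
    // peak number of callers observed inside interpret() at the same time.
    public static int runDemo(int callers) {
        SerializedInterpreterDemo shared = new SerializedInterpreterDemo();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[callers];
        for (int i = 0; i < callers; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                try {
                    start.await();
                    shared.interpret("paragraph-" + id);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        start.countDown(); // release all callers at once
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        }
        return shared.maxConcurrent();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrency: " + runDemo(4));
    }
}
```

With several scheduled notebooks, each caller waits for all earlier ones to finish, so the total wait grows with the number of queued paragraphs.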


The blocked paragraphs then affect each other, and eventually the interpreter 
JVM dies for an unknown reason, without any suspicious output in the 
interpreter log.
Meanwhile the Zeppelin server keeps checking the status of the interpreter by 
calling `RemoteInterpreterService#getStatus()`. Because the interpreter is 
already dead, the Zeppelin server always logs:
```
ERROR [2016-12-26 15:39:00,715] ({pool-1-thread-84} RemoteScheduler.java[getStatus]:255) - Can't get status information
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
        at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:53)
        at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
        at org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
        at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
        at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
        at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:189)
        at org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller.getStatus(RemoteScheduler.java:253)
        at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:341)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
        at org.apache.thrift.transport.TSocket.open(TSocket.java:187)
        at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
        ... 15 more
Caused by: java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
        ... 16 more
ERROR [2016-12-26 15:39:00,715] ({pool-1-thread-84} NotebookServer.java[afterStatusChange]:1145) - Error
org.apache.zeppelin.interpreter.InterpreterException: org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:250)
        at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:94)
        at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:279)
        at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
        at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:328)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
        at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:53)
        at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
        at org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
        at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
        at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
        at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:189)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:248)
        ... 11 more
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
        at org.apache.thrift.transport.TSocket.open(TSocket.java:187)
        at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
        ... 18 more
Caused by: java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
        ... 19 more
```
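
The `java.net.ConnectException: Connection refused` at the bottom of both traces means nothing is accepting connections on the interpreter's port. When reproducing this, a small probe like the sketch below may help confirm that the interpreter process is gone rather than merely busy. It is only an assumption-laden helper, not part of Zeppelin: `InterpreterPortProbe` and `isListening` are hypothetical names, and the host and port must be supplied by whoever runs it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class InterpreterPortProbe {
    // Returns true if something accepts a TCP connection on host:port within
    // timeoutMs. A refused connection or a timeout both count as "not listening".
    public static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // java.net.ConnectException and SocketTimeoutException are IOExceptions.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: InterpreterPortProbe <host> <port>");
            return;
        }
        String host = args[0];                 // supplied by the user, e.g. the interpreter host
        int port = Integer.parseInt(args[1]);  // supplied by the user, the interpreter's port
        System.out.println(host + ":" + port + " listening: " + isListening(host, port, 2000));
    }
}
```

A `false` result while the server keeps logging the errors above is consistent with the interpreter JVM having exited. It does not explain why the JVM exited.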

There is an existing Jira ticket about this, but it seems a different cause 
was found there:
https://issues.apache.org/jira/browse/ZEPPELIN-1700





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
