Neil Dong created ZEPPELIN-1871:
-----------------------------------
Summary: Spark Interpreter died unexpectedlly
Key: ZEPPELIN-1871
URL: https://issues.apache.org/jira/browse/ZEPPELIN-1871
Project: Zeppelin
Issue Type: Bug
Components: Interpreters, zeppelin-server
Affects Versions: 0.6.2
Reporter: Neil Dong
Priority: Minor
About the "spark interpreter group", When multiple notebook use
timed-scheduling sharing one spark interpreter instance , because of the
interpreter use REPL to process the notebook ,and REPL only allow one task in
the same time .
So the code looks like this :
```
public InterpreterResult interpret(String[] lines, InterpreterContext context) {
synchronized (this) {
z.setGui(context.getGui());
sc.setJobGroup(getJobGroup(context), "Zeppelin", false);
InterpreterResult r = interpretInput(lines, context);
sc.clearJobGroup();
return r;
}
}
```
Then the blocked processing will effect each other , eventually lead the
interpreter jvm died in somehow without any suspicious output in the
interpreter log.
Meanwhile the zeppelin server is constantly checking the status of the
interpreter by calling `RemoteInterpreterService#getStatus().` Because the
interpreter is already died , the zeppelin server always found :
```
ERROR [2016-12-26 15:39:00,715] ({pool-1-thread-84}
RemoteScheduler.java[getStatus]:255) - Can't get status information
org.apache.zeppelin.interpreter.InterpreterException:
org.apache.thrift.transport.TTransportException: java.net.ConnectException:
Connection refused
at
org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:53)
at
org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
at
org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
at
org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
at
org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
at
org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
at
org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:189)
at
org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller.getStatus(RemoteScheduler.java:253)
at
org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:341)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException:
java.net.ConnectException: Connection refused
at org.apache.thrift.transport.TSocket.open(TSocket.java:187)
at
org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
... 15 more
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
... 16 more
ERROR [2016-12-26 15:39:00,715] ({pool-1-thread-84}
NotebookServer.java[afterStatusChange]:1145) - Error
org.apache.zeppelin.interpreter.InterpreterException:
org.apache.zeppelin.interpreter.InterpreterException:
org.apache.thrift.transport.TTransportException: java.net.ConnectException:
Connection refused
at
org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:250)
at
org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:94)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:279)
at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
at
org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:328)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.zeppelin.interpreter.InterpreterException:
org.apache.thrift.transport.TTransportException: java.net.ConnectException:
Connection refused
at
org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:53)
at
org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
at
org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
at
org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
at
org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
at
org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
at
org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:189)
at
org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:248)
... 11 more
Caused by: org.apache.thrift.transport.TTransportException:
java.net.ConnectException: Connection refused
at org.apache.thrift.transport.TSocket.open(TSocket.java:187)
at
org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
... 18 more
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
... 19 more
```
There is a Jira ticket on it , but it seems they found different cause.
[https://issues.apache.org/jira/browse/ZEPPELIN-1700](url)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)