Thanks for the email, Moon. I have gone through some fairly logical
troubleshooting steps, but I can't get this bug to occur consistently.
Like I said, this is an interesting setup in that sometimes things work
normally and sometimes they don't.

When an interpreter fails to start and I check the interpreter logs, they
say it is starting fine, say on port xyz. When I then check xyz in netstat
(this is all after the error), I see it listening properly, and I even see
a connection from localhost to it. In the interface, though, I can't run
any more paragraphs with that interpreter, even if I refresh the whole page.

One thought I had, and maybe you could help me with this: what is the
process/timeout for connecting to a new interpreter?  I.e.:

Step 1: A paragraph whose interpreter is not running is executed;
Zeppelin sees that it is not running and kicks off a new JVM with the
interpreter
Step 2: Interpreter starts
Step 3: Zeppelin connects to the Interpreter

I guess my question is: what is the process to go from Step 2 to Step 3?
Is there a delay before connecting? For example, let's say Zeppelin waits
2 seconds after it starts the interpreter and then tries to connect. If
the interpreter isn't quite ready, does it throw an error, or does it
retry? Or does it wait until the interpreter is 100% started before
trying to connect?

Given the inconsistency, I was thinking timing may be the issue.  These are
servers that have quite a bit going on, so perhaps my interpreter is
taking longer to start than Zeppelin expects?
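
To make the retry question concrete, the sketch below shows one way a
client could wait for a freshly started process to accept connections
before giving up. It is an illustration only, not Zeppelin's actual
connection logic: the class name PortWait, the method waitForPort, and
the attempt count, delay and timeout values are all made up for this
example.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class PortWait {

    /**
     * Tries to open a TCP connection to host:port, sleeping delayMillis
     * between failed attempts. Returns true as soon as one connect
     * succeeds, or false once maxAttempts attempts have failed.
     */
    static boolean waitForPort(String host, int port, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try (Socket socket = new Socket()) {
                // 1000 ms connect timeout per attempt (arbitrary value for this sketch)
                socket.connect(new InetSocketAddress(host, port), 1000);
                return true;
            } catch (IOException e) {
                // "Connection refused" lands here: nothing is listening yet
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delayMillis);
                    } catch (InterruptedException ie) {
                        // Preserve the interrupt and stop waiting
                        Thread.currentThread().interrupt();
                        return false;
                    }
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the interpreter: a listener on a free local port
        try (ServerSocket listener = new ServerSocket(0)) {
            int port = listener.getLocalPort();
            System.out.println("reachable: " + waitForPort("127.0.0.1", port, 5, 200));
        }
    }
}
```

If the connect is attempted only once, right after the process is
launched, a slow-starting interpreter on a busy server would produce
exactly the "Connection refused" seen in the logs, even though the port
shows up as listening a few seconds later.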



On Fri, Jun 19, 2015 at 12:49 PM, moon soo Lee <m...@apache.org> wrote:

> Hi,
>
> Thanks for sharing the problem.
>
> Zeppelin runs each interpreter instance as a separate JVM process and
> communicates with it through Thrift. In a little more detail, the Zeppelin
> server daemon launches the interpreter JVM process with a specific port,
> and the server daemon then connects to that port. Your error is that the
> Zeppelin server cannot connect to the interpreter JVM process. Do you see
> any possibility that this process could cause a problem on your system?
>
> As for the same variable name in the markdown and hive interpreters, it
> won't be a problem.
>
> Thanks,
> moon
>
>
>
> On Fri, Jun 19, 2015 at 9:34 AM John Omernik <j...@omernik.com> wrote:
>
>> Another thing that may or may not be related: the server running
>> Zeppelin has multiple interfaces. It "appears" the interpreter binds on
>> all interfaces, but what about the connection? Does that come from a
>> specific interface? Could that be causing the connection refused? (I have
>> two eth interfaces and a docker0 interface on this node.)
>>
>> John
>>
>>
>> On Fri, Jun 19, 2015 at 8:02 AM, John Omernik <j...@omernik.com> wrote:
>>
>>> I am not an expert in Java, but could there be an issue using the
>>> markdown and the hive interpreters together because they share a variable
>>> name (md = markdown object in %markdown and md = metadata in %hive)?
>>>
>>>
>>>
>>> markdown:
>>>
>>> public void open() { md = new Markdown4jProcessor(); }
>>>
>>> hive:
>>>
>>> try {
>>>   ResultSetMetaData md = res.getMetaData();
>>>   for (int i = 1; i < md.getColumnCount() + 1; i++) {
>>>     if (i == 1) { msg.append(md.getColumnName(i)); }
>>>     else { msg.append("\t" + md.getColumnName(i)); }
>>>   }
>>>
>>> On Fri, Jun 19, 2015 at 6:56 AM, John Omernik <j...@omernik.com> wrote:
>>>
>>>> Hey all,
>>>>
>>>> I am working with three primary interpreters: %md, %pyspark, and
>>>> %hive.  What I am noticing is that with my current config, sometimes an
>>>> interpreter will start, and other times I'll get the errors below. I
>>>> wish I could say what the rhyme or reason was.
>>>>
>>>> If I get the errors, then I have to restart Zeppelin before it will
>>>> work (or even attempt to work). I've tried clicking "restart interpreter"
>>>> in the interpreters tab, it seems to work, but when I go back to a notebook
>>>> I get "Scheduler already terminated"
>>>>
>>>> What's interesting here is that, after a restart, I can run the cells
>>>> (I have three, one for each interpreter) in different orders and get
>>>> different results. Sometimes if I run %hive first, it works, then
>>>> %pyspark works too, then %md fails. (Note these are the SAME commands,
>>>> on the same server, same config, etc.)
>>>>
>>>> Other times, I can get them all to run no matter what. It's very
>>>> inconsistent, and it is made worse by the fact that once an interpreter
>>>> fails, there is no getting it back until the whole server is restarted.
>>>>
>>>> Also of note here: I am running a recently compiled version of this (I
>>>> downloaded it on Wed using git clone).
>>>>
>>>> Any help would be appreciated in determining how to troubleshoot this!
>>>>
>>>> John
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Example from %md
>>>>
>>>> *In Notebook error*
>>>>
>>>>
>>>>
>>>> %md
>>>> #For the Love of Jeezy Pete
>>>>
>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:135)
>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:249)
>>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:104)
>>>> org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:202)
>>>> org.apache.zeppelin.scheduler.Job.run(Job.java:170)
>>>> org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:296)
>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>> java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> java.lang.Thread.run(Thread.java:745)
>>>>
>>>> *In Running Shell Window (where I ran bin/zeppelin.sh)*
>>>>
>>>> org.apache.zeppelin.interpreter.InterpreterException:
>>>> org.apache.zeppelin.interpreter.InterpreterException:
>>>> org.apache.thrift.transport.TTransportException: java.net.ConnectException:
>>>> Connection refused
>>>>
>>>> at
>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:135)
>>>>
>>>> at
>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:249)
>>>>
>>>> at
>>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:104)
>>>>
>>>> at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:202)
>>>>
>>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
>>>>
>>>> at
>>>> org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:296)
>>>>
>>>> at
>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>
>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>
>>>> at
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>>>
>>>> at
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>>>
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>
>>>> at java.lang.Thread.run(Thread.java:745)
>>>>
>>>> Caused by: org.apache.zeppelin.interpreter.InterpreterException:
>>>> org.apache.thrift.transport.TTransportException: java.net.ConnectException:
>>>> Connection refused
>>>>
>>>> at
>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:53)
>>>>
>>>> at
>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
>>>>
>>>> at
>>>> org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
>>>>
>>>> at
>>>> org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
>>>>
>>>> at
>>>> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
>>>>
>>>> at
>>>> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
>>>>
>>>> at
>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:138)
>>>>
>>>> at
>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:133)
>>>>
>>>> ... 12 more
>>>>
>>>> Caused by: org.apache.thrift.transport.TTransportException:
>>>> java.net.ConnectException: Connection refused
>>>>
>>>> at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
>>>>
>>>> at
>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
>>>>
>>>> ... 19 more
>>>>
>>>> Caused by: java.net.ConnectException: Connection refused
>>>>
>>>> at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>
>>>> at
>>>> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>>>>
>>>> at
>>>> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>>>>
>>>> at
>>>> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>>>>
>>>> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>>
>>>> at java.net.Socket.connect(Socket.java:579)
>>>>
>>>> at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
>>>>
>>>> ... 20 more
>>>>
>>>> *from interpreter log file:*
>>>>
>>>>  INFO [2015-06-19 06:44:29,134] ({Thread-0}
>>>> RemoteInterpreterServer.java[run]:95) - Starting remote interpreter server
>>>> on port 54930
>>>>
>>>>
>>>> *From Zeppelin Log file:*
>>>>
>>>>  INFO [2015-06-19 06:44:19,329] ({pool-1-thread-2}
>>>> SchedulerFactory.java[jobStarted]:132) - Job
>>>> paragraph_1434713440246_1991176208 started by scheduler
>>>> remoteinterpreter_328619575
>>>>
>>>>  INFO [2015-06-19 06:44:19,331] ({pool-1-thread-2}
>>>> Paragraph.java[jobRun]:194) - run paragraph 20150619-063040_649381067 using
>>>> md org.apache.zeppelin.interpreter.LazyOpenInterpreter@38946f29
>>>>
>>>>  INFO [2015-06-19 06:44:19,341] ({pool-1-thread-2}
>>>> RemoteInterpreterProcess.java[reference]:107) - Run interpreter process
>>>> /mapr/brewpot/mesos/zeppelin/0.5.0-incubating-SNAPSHOT/bin/interpreter.sh
>>>> -d /mapr/brewpot/mesos/zeppelin/0.5.0-incubating-SNAPSHOT/interpreter/md -p
>>>> 54930
>>>>
>>>> ERROR [2015-06-19 06:44:24,399] ({Thread-35}
>>>> RemoteScheduler.java[getStatus]:226) - Can't get status information
>>>>
>>>> org.apache.zeppelin.interpreter.InterpreterException:
>>>> org.apache.thrift.transport.TTransportException: java.net.ConnectException:
>>>> Connection refused
>>>>
>>>> at
>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:53)
>>>>
>>>> at
>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
>>>>
>>>> at
>>>> org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
>>>>
>>>> at
>>>> org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
>>>>
>>>> at
>>>> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
>>>>
>>>> at
>>>> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
>>>>
>>>> at
>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:138)
>>>>
>>>> at
>>>> org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller.getStatus(RemoteScheduler.java:224)
>>>>
>>>> at
>>>> org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller.run(RemoteScheduler.java:183)
>>>>
>>>> Caused by: org.apache.thrift.transport.TTransportException:
>>>> java.net.ConnectException: Connection refused
>>>>
>>>> at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
>>>>
>>>> at
>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
>>>>
>>>> ... 8 more
>>>>
>>>> Caused by: java.net.ConnectException: Connection refused
>>>>
>>>> at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>
>>>> at
>>>> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>>>>
>>>> at
>>>> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>>>>
>>>> at
>>>> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>>>>
>>>> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>>
>>>> at java.net.Socket.connect(Socket.java:579)
>>>>
>>>> at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
>>>>
>>>> ... 9 more
>>>>
>>>> ERROR [2015-06-19 06:44:24,399] ({pool-1-thread-2} Job.java[run]:183) -
>>>> Job failed
>>>>
>>>> org.apache.zeppelin.interpreter.InterpreterException:
>>>> org.apache.zeppelin.interpreter.InterpreterException:
>>>> org.apache.thrift.transport.TTransportException: java.net.ConnectException:
>>>> Connection refused
>>>>
>>>> at
>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:135)
>>>>
>>>> at
>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:249)
>>>>
>>>> at
>>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:104)
>>>>
>>>> at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:202)
>>>>
>>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
>>>>
>>>> at
>>>> org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:296)
>>>>
>>>> at
>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>
>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>
>>>> at
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>>>
>>>> at
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>>>
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>
>>>> at java.lang.Thread.run(Thread.java:745)
>>>>
>>>> Caused by: org.apache.zeppelin.interpreter.InterpreterException:
>>>> org.apache.thrift.transport.TTransportException: java.net.ConnectException:
>>>> Connection refused
>>>>
>>>> at
>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:53)
>>>>
>>>> at
>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
>>>>
>>>> at
>>>> org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
>>>>
>>>> at
>>>> org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
>>>>
>>>> at
>>>> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
>>>>
>>>> at
>>>> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
>>>>
>>>> at
>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:138)
>>>>
>>>> at
>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:133)
>>>>
>>>> ... 12 more
>>>>
>>>> Caused by: org.apache.thrift.transport.TTransportException:
>>>> java.net.ConnectException: Connection refused
>>>>
>>>> at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
>>>>
>>>> at
>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
>>>>
>>>> ... 19 more
>>>>
>>>> Caused by: java.net.ConnectException: Connection refused
>>>>
>>>> at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>
>>>> at
>>>> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>>>>
>>>> at
>>>> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>>>>
>>>> at
>>>> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>>>>
>>>> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>>
>>>> at java.net.Socket.connect(Socket.java:579)
>>>>
>>>> at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
>>>>
>>>>  ... 20 more
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
