Thanks for the explanation.
The Zeppelin server daemon creates a remote process and waits up to
5 seconds for the interpreter process's port to become available.
So if your interpreter process is not created and listening on its port
within 5 seconds, you would get a connection refused error.

https://github.com/apache/incubator-zeppelin/blob/master/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterProcess.java#L116

This is the related source code. I think you can try increasing the number
from 5*1000 to something bigger and see how it works.
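
To illustrate the kind of wait described above, here is a minimal sketch of a
poll-until-listening loop with a configurable timeout. It is not the actual
Zeppelin code: the class name PortWait, the method waitForPort, and the
500 ms / 200 ms intervals are made up for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortWait {
  /**
   * Hypothetical helper: repeatedly tries to open a TCP connection to
   * host:port until one succeeds or timeoutMs has elapsed.
   * Returns true if the port accepted a connection in time.
   */
  public static boolean waitForPort(String host, int port, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      try (Socket socket = new Socket()) {
        // Per-attempt connect timeout (assumed value, not from Zeppelin).
        socket.connect(new InetSocketAddress(host, port), 500);
        return true; // something is listening
      } catch (IOException e) {
        // Not listening yet (e.g. connection refused): pause, then retry.
        Thread.sleep(200);
      }
    }
    return false; // gave up: the caller would see "connection refused"
  }
}
```

With a loop like this, a call such as PortWait.waitForPort("localhost", 54930,
30 * 1000) would tolerate an interpreter JVM that takes longer than 5 seconds
to start on a busy server.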

Thanks,
moon



On Sat, Jun 20, 2015 at 7:37 AM John Omernik <j...@omernik.com> wrote:

> Thanks for the email Moon, I have gone through some pretty logical
> troubleshooting steps, but I can't seem to get this bug to occur
> consistently. Like I said, this is an interesting setup in that sometimes
> things work normally and sometimes they don't.
>
> When they don't start, and I check the interpreter logs, they say they are
> starting fine, say on port xyz, when I check xyz (this is all after the
> error) in netstat, I see it listening properly, and I even see a connection
> from localhost to it, but in the interface, I can't run any more paragraphs
> with that interpreter.  Even if I refresh the whole page.
>
> One thought I had, and maybe you could help me on this... what is the
> process/time out to connect to a new interpreter?  I.e.
>
> Step 1:  Paragraph with interpreter that is not running is executed,
> Zeppelin sees it not running and it kicks off the new JVM with the
> interpreter
> Step 2: Interpreter starts
> Step 3: Zeppelin connects to the Interpreter
>
> I guess my question is: what is the process to go from Step 2 to Step 3? Is
> there a delay in connection? Is there a retry? I.e. if the interpreter is
> starting, let's say Zeppelin takes 2 seconds after it starts the interpreter
> before it tries to connect. If the interpreter isn't quite ready, does it
> throw an error? Does it wait until the interpreter is 100% started before
> trying to connect?
>
> Given the inconsistency, I was thinking timing may be an issue.  These are
> servers that have quite a bit going on them, thus perhaps my interpreter
> starting is taking longer than Zeppelin would expect?
>
>
>
> On Fri, Jun 19, 2015 at 12:49 PM, moon soo Lee <m...@apache.org> wrote:
>
>> Hi,
>>
>> Thanks for sharing the problem.
>>
>> Zeppelin runs each interpreter instance as a separate JVM process and
>> communicates with it through Thrift. In a little more detail, the Zeppelin
>> server daemon invokes the interpreter JVM process with a specific port, and
>> the server daemon then connects to that port. Your error is that the
>> Zeppelin server cannot connect to the interpreter JVM process. Do you see
>> any possibility that this process could cause a problem on your system?
>>
>> As for the same variable name in the markdown and hive interpreters, it
>> won't be a problem.
>>
>> Thanks,
>> moon
>>
>>
>>
>> On Fri, Jun 19, 2015 at 9:34 AM John Omernik <j...@omernik.com> wrote:
>>
>>> Another thing that may or may not be related is on the server running
>>> Zeppelin, I have multiple interfaces, it "appears" the interpreter binds on
>>> all interfaces, but what about the connection? Does that come from a
>>> specific interface? Could that be causing the connection refused? (I have
>>> two eth interfaces and a docker0 interface on this node)
>>>
>>> John
>>>
>>>
>>> On Fri, Jun 19, 2015 at 8:02 AM, John Omernik <j...@omernik.com> wrote:
>>>
>>>> I am not an expert in Java, but could there be an issue using the
>>>> markdown and the hive interpreters together because they share a variable
>>>> name (md = markdown object in %markdown and md = metadata in %hive)?
>>>>
>>>>
>>>>
>>>> markdown:
>>>>
>>>> public void open() { md = new Markdown4jProcessor(); }
>>>>
>>>> hive:
>>>>
>>>> try {
>>>>   ResultSetMetaData md = res.getMetaData();
>>>>   for (int i = 1; i < md.getColumnCount() + 1; i++) {
>>>>     if (i == 1) { msg.append(md.getColumnName(i)); }
>>>>     else { msg.append("\t" + md.getColumnName(i)); }
>>>>   }
>>>>
>>>> On Fri, Jun 19, 2015 at 6:56 AM, John Omernik <j...@omernik.com> wrote:
>>>>
>>>>> Hey all,
>>>>>
>>>>> I am working with three primary interpreters: %md, %pyspark, and
>>>>> %hive. What I am noticing is that with my current config, sometimes an
>>>>> interpreter will start, and other times I'll get the errors below. I
>>>>> wish I could say what the rhyme or reason was.
>>>>>
>>>>> If I get the errors, then I have to restart Zeppelin before it will
>>>>> work (or even attempt to work). I've tried clicking "restart interpreter"
>>>>> in the interpreters tab. It seems to work, but when I go back to a
>>>>> notebook I get "Scheduler already terminated".
>>>>>
>>>>> What's interesting here is that, short of a restart, I can run the cells
>>>>> (I have three, one for each interpreter) in different orders and get
>>>>> different results. Sometimes if I run %hive first, it works, then
>>>>> %pyspark will work too, then %md will fail. (Note these are the SAME
>>>>> commands, on the same server, same config, etc.)
>>>>>
>>>>> Other times, I can get them to run no matter what. It's very
>>>>> inconsistent, and once an interpreter fails, there is no getting it
>>>>> back until the whole server is restarted.
>>>>>
>>>>> Also of note here: I am running a recently compiled version of this (I
>>>>> downloaded it on Wed using git clone).
>>>>>
>>>>> Any help would be appreciated in determining how to troubleshoot this!
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Example from %md
>>>>>
>>>>> *In Notebook error*
>>>>>
>>>>>
>>>>>
>>>>> %md
>>>>> #For the Love of Jeezy Pete
>>>>>
>>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:135)
>>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:249)
>>>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:104)
>>>>> org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:202)
>>>>> org.apache.zeppelin.scheduler.Job.run(Job.java:170)
>>>>> org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:296)
>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>> java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> java.lang.Thread.run(Thread.java:745)
>>>>>
>>>>> *In Running Shell Window (where I ran bin/zeppelin.sh)*
>>>>>
>>>>> org.apache.zeppelin.interpreter.InterpreterException:
>>>>> org.apache.zeppelin.interpreter.InterpreterException:
>>>>> org.apache.thrift.transport.TTransportException: 
>>>>> java.net.ConnectException:
>>>>> Connection refused
>>>>>
>>>>> at
>>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:135)
>>>>>
>>>>> at
>>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:249)
>>>>>
>>>>> at
>>>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:104)
>>>>>
>>>>> at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:202)
>>>>>
>>>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
>>>>>
>>>>> at
>>>>> org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:296)
>>>>>
>>>>> at
>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>
>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>
>>>>> at
>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>>>>
>>>>> at
>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>>>>
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>
>>>>> Caused by: org.apache.zeppelin.interpreter.InterpreterException:
>>>>> org.apache.thrift.transport.TTransportException: 
>>>>> java.net.ConnectException:
>>>>> Connection refused
>>>>>
>>>>> at
>>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:53)
>>>>>
>>>>> at
>>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
>>>>>
>>>>> at
>>>>> org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
>>>>>
>>>>> at
>>>>> org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
>>>>>
>>>>> at
>>>>> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
>>>>>
>>>>> at
>>>>> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
>>>>>
>>>>> at
>>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:138)
>>>>>
>>>>> at
>>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:133)
>>>>>
>>>>> ... 12 more
>>>>>
>>>>> Caused by: org.apache.thrift.transport.TTransportException:
>>>>> java.net.ConnectException: Connection refused
>>>>>
>>>>> at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
>>>>>
>>>>> at
>>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
>>>>>
>>>>> ... 19 more
>>>>>
>>>>> Caused by: java.net.ConnectException: Connection refused
>>>>>
>>>>> at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>>
>>>>> at
>>>>> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>>>>>
>>>>> at
>>>>> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>>>>>
>>>>> at
>>>>> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>>>>>
>>>>> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>>>
>>>>> at java.net.Socket.connect(Socket.java:579)
>>>>>
>>>>> at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
>>>>>
>>>>> ... 20 more
>>>>>
>>>>> *from interpreter log file:*
>>>>>
>>>>>  INFO [2015-06-19 06:44:29,134] ({Thread-0}
>>>>> RemoteInterpreterServer.java[run]:95) - Starting remote interpreter server
>>>>> on port 54930
>>>>>
>>>>>
>>>>> *From Zeppelin Log file:*
>>>>>
>>>>>  INFO [2015-06-19 06:44:19,329] ({pool-1-thread-2}
>>>>> SchedulerFactory.java[jobStarted]:132) - Job
>>>>> paragraph_1434713440246_1991176208 started by scheduler
>>>>> remoteinterpreter_328619575
>>>>>
>>>>>  INFO [2015-06-19 06:44:19,331] ({pool-1-thread-2}
>>>>> Paragraph.java[jobRun]:194) - run paragraph 20150619-063040_649381067 
>>>>> using
>>>>> md org.apache.zeppelin.interpreter.LazyOpenInterpreter@38946f29
>>>>>
>>>>>  INFO [2015-06-19 06:44:19,341] ({pool-1-thread-2}
>>>>> RemoteInterpreterProcess.java[reference]:107) - Run interpreter process
>>>>> /mapr/brewpot/mesos/zeppelin/0.5.0-incubating-SNAPSHOT/bin/interpreter.sh
>>>>> -d /mapr/brewpot/mesos/zeppelin/0.5.0-incubating-SNAPSHOT/interpreter/md 
>>>>> -p
>>>>> 54930
>>>>>
>>>>> ERROR [2015-06-19 06:44:24,399] ({Thread-35}
>>>>> RemoteScheduler.java[getStatus]:226) - Can't get status information
>>>>>
>>>>> org.apache.zeppelin.interpreter.InterpreterException:
>>>>> org.apache.thrift.transport.TTransportException: 
>>>>> java.net.ConnectException:
>>>>> Connection refused
>>>>>
>>>>> at
>>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:53)
>>>>>
>>>>> at
>>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
>>>>>
>>>>> at
>>>>> org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
>>>>>
>>>>> at
>>>>> org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
>>>>>
>>>>> at
>>>>> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
>>>>>
>>>>> at
>>>>> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
>>>>>
>>>>> at
>>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:138)
>>>>>
>>>>> at
>>>>> org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller.getStatus(RemoteScheduler.java:224)
>>>>>
>>>>> at
>>>>> org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller.run(RemoteScheduler.java:183)
>>>>>
>>>>> Caused by: org.apache.thrift.transport.TTransportException:
>>>>> java.net.ConnectException: Connection refused
>>>>>
>>>>> at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
>>>>>
>>>>> at
>>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
>>>>>
>>>>> ... 8 more
>>>>>
>>>>> Caused by: java.net.ConnectException: Connection refused
>>>>>
>>>>> at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>>
>>>>> at
>>>>> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>>>>>
>>>>> at
>>>>> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>>>>>
>>>>> at
>>>>> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>>>>>
>>>>> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>>>
>>>>> at java.net.Socket.connect(Socket.java:579)
>>>>>
>>>>> at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
>>>>>
>>>>> ... 9 more
>>>>>
>>>>> ERROR [2015-06-19 06:44:24,399] ({pool-1-thread-2} Job.java[run]:183)
>>>>> - Job failed
>>>>>
>>>>> org.apache.zeppelin.interpreter.InterpreterException:
>>>>> org.apache.zeppelin.interpreter.InterpreterException:
>>>>> org.apache.thrift.transport.TTransportException: 
>>>>> java.net.ConnectException:
>>>>> Connection refused
>>>>>
>>>>> at
>>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:135)
>>>>>
>>>>> at
>>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:249)
>>>>>
>>>>> at
>>>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:104)
>>>>>
>>>>> at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:202)
>>>>>
>>>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
>>>>>
>>>>> at
>>>>> org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:296)
>>>>>
>>>>> at
>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>
>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>
>>>>> at
>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>>>>
>>>>> at
>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>>>>
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>
>>>>> Caused by: org.apache.zeppelin.interpreter.InterpreterException:
>>>>> org.apache.thrift.transport.TTransportException: 
>>>>> java.net.ConnectException:
>>>>> Connection refused
>>>>>
>>>>> at
>>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:53)
>>>>>
>>>>> at
>>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
>>>>>
>>>>> at
>>>>> org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
>>>>>
>>>>> at
>>>>> org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
>>>>>
>>>>> at
>>>>> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
>>>>>
>>>>> at
>>>>> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
>>>>>
>>>>> at
>>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:138)
>>>>>
>>>>> at
>>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:133)
>>>>>
>>>>> ... 12 more
>>>>>
>>>>> Caused by: org.apache.thrift.transport.TTransportException:
>>>>> java.net.ConnectException: Connection refused
>>>>>
>>>>> at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
>>>>>
>>>>> at
>>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
>>>>>
>>>>> ... 19 more
>>>>>
>>>>> Caused by: java.net.ConnectException: Connection refused
>>>>>
>>>>> at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>>
>>>>> at
>>>>> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>>>>>
>>>>> at
>>>>> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>>>>>
>>>>> at
>>>>> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>>>>>
>>>>> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>>>
>>>>> at java.net.Socket.connect(Socket.java:579)
>>>>>
>>>>> at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
>>>>>
>>>>>  ... 20 more
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>
