Thanks for the explanation. The Zeppelin server daemon creates a remote process and waits up to 5 seconds for the interpreter process's port to become available. So if your interpreter process has not been created and started listening on its port within 5 seconds, you would get a connection refused error.
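To make the timing issue concrete, here is a minimal sketch of that kind of wait loop. It is not Zeppelin's actual code: the class name `PortWaiter`, the method `waitForPort`, the 500 ms connect timeout and the 100 ms retry interval are all made up for illustration. It only shows how a fixed deadline can expire before a slow interpreter JVM starts listening.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration only; not part of Zeppelin.
class PortWaiter {

    /**
     * Repeatedly tries to open a TCP connection to host:port.
     * Returns true as soon as one attempt succeeds, or false if
     * timeoutMillis elapses without a successful connection.
     */
    static boolean waitForPort(String host, int port, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            try (Socket socket = new Socket()) {
                // Assumed per-attempt connect timeout of 500 ms.
                socket.connect(new InetSocketAddress(host, port), 500);
                return true; // something is listening on the port
            } catch (IOException e) {
                // Not listening yet (e.g. "Connection refused"); retry shortly.
                Thread.sleep(100);
            }
        }
        return false; // deadline passed, the caller would see a failure
    }
}
```

With a helper like this, `PortWaiter.waitForPort("localhost", 54930, 5 * 1000)` would give up after 5 seconds, while passing a larger value such as `30 * 1000` gives a slow-starting process more time on a busy server.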
https://github.com/apache/incubator-zeppelin/blob/master/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterProcess.java#L116

This is the related source code. I think you can try increasing the number from 5*1000 to something bigger and see how it works.

Thanks,
moon

On Sat, Jun 20, 2015 at 7:37 AM John Omernik <j...@omernik.com> wrote:

> Thanks for the email Moon, I have gone through some pretty logical
> troubleshooting steps, but I can't seem to get this bug to occur
> consistently. Like I said, this is an interesting setup in that sometimes
> things work normally sometimes they don't
>
> When they don't start, and I check the interpreter logs, they say they are
> starting fine, say on port xyz, when I check xyz (this is all after the
> error) in netstat, I see it listening properly, and I even see a connection
> from localhost to it, but in the interface, I can't run any more paragraphs
> with that interpreter. Even if I refresh the whole page.
>
> One thought I had, and maybe you could help me on this... what is the
> process/time out to connect to a new interpreter? I.e.
>
> Step 1: Paragraph with interpreter that is not running is executed,
> Zeppelin sees it not running and it kicks off the new JVM with the
> interpreter
> Step 2: Interpreter starts
> Step 3: Zeppelin connects to the Interpreter
>
> I guess what is the process to go from Step 2 to Step 3? Is there a delay
> in connection? Is there a retry? I.e. If the interpreter is starting, and
> let's say Zeppelin takes 2 seconds after it starts the interpreter and tries
> to connect. If the interpreter isn't quite ready does it throw an error?
> Does it retry? Does it wait until the interpreter is 100% started before
> trying to connect? Is there a retry?
>
> Given the inconsistency, I was thinking timing may be an issue. These are
> servers that have quite a bit going on them, thus perhaps my interpreter
> starting is taking longer than Zeppelin would expect?
>
> On Fri, Jun 19, 2015 at 12:49 PM, moon soo Lee <m...@apache.org> wrote:
>
>> Hi,
>>
>> Thanks for sharing the problem.
>>
>> Zeppelin runs each interpreter instance as a separate JVM process and
>> communicate through thrift. Little detail is, Zeppelin server daemon invoke
>> interpreter JVM process with specific port and server daemon connect to
>> that port. Your error is that Zeppelin server can not connect to the
>> interpreter JVM process. Do you see any possibility that this process can
>> cause problem on your system?
>>
>> About the same variable name in markdown and hive interpreter, it won't
>> be a problem.
>>
>> Thanks,
>> moon
>>
>> On Fri, Jun 19, 2015 at 9:34 AM John Omernik <j...@omernik.com> wrote:
>>
>>> Another thing that may or may not be related is on the server running
>>> Zeppelin, I have multiple interfaces, it "appears" the interpreter binds on
>>> all interfaces, but what about the connection? Does that come from a
>>> specific interface? Could that be causing the connection refused? (I have
>>> two eth interfaces and a docker0 interface on this node)
>>>
>>> John
>>>
>>> On Fri, Jun 19, 2015 at 8:02 AM, John Omernik <j...@omernik.com> wrote:
>>>
>>>> I am not an expert in Java, but could there be an issue using the
>>>> markdown and the hive interpreters together because they share a variable
>>>> name (md = markdown object in %markdown and md = metatdata in %hive)
>>>>
>>>> markdown:
>>>>
>>>> public void open() {
>>>>   md = new Markdown4jProcessor();
>>>> }
>>>>
>>>> hive:
>>>>
>>>> try {
>>>>   ResultSetMetaData md = res.getMetaData();
>>>>   for (int i = 1; i < md.getColumnCount() + 1; i++) {
>>>>     if (i == 1) {
>>>>       msg.append(md.getColumnName(i));
>>>>     } else {
>>>>       msg.append("\t" + md.getColumnName(i));
>>>>     }
>>>>   }
>>>>
>>>> On Fri, Jun 19, 2015 at 6:56 AM, John Omernik <j...@omernik.com> wrote:
>>>>
>>>>> Hey all,
>>>>>
>>>>> I am working with three primary interpreters, %md, %pyspark, and
>>>>> %hive.
>>>>> What I am noticing is with my current config, sometimes an
>>>>> interpreter will start other times, I'll get an errors below. I wish I
>>>>> could say what the rhyme or reason was.
>>>>>
>>>>> If I get the errors, then I have to restart Zeppelin before it will
>>>>> work (or even attempt to work). I've tried clicking "restart interpreter"
>>>>> in the interpreters tab, it seems to work, but when I go back to a
>>>>> notebook I get "Scheduler already terminated"
>>>>>
>>>>> What's interesting here, is other than a restart, I can run the cells
>>>>> (I have three one for each interpreter) in different orders and get
>>>>> different results, sometimes if I run %hive first, it works, then
>>>>> %pyspark, that will work too then %md will fail. (Note these are the
>>>>> SAME commands, on the same server, same config etc).
>>>>>
>>>>> Other times, I can get them to run no matter what, it's very
>>>>> inconsistent, and combined with the fact that once an interpreter fails,
>>>>> there is no getting it back until the whole server is restarted.
>>>>>
>>>>> Also of note here: I am running a recently compiled version of this (I
>>>>> downloaded this on Wed) using git clone)
>>>>>
>>>>> Any help would be appreciated in determining how to troubleshoot this!
>>>>> >>>>> John >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Example from %md >>>>> >>>>> *In Notebook error* >>>>> >>>>> >>>>> >>>>> %md >>>>> #For the Love of Jeezy Pete >>>>> >>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:135) >>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:249) >>>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:104) >>>>> org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:202) >>>>> org.apache.zeppelin.scheduler.Job.run(Job.java:170) >>>>> org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:296) >>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) >>>>> java.util.concurrent.FutureTask.run(FutureTask.java:262) >>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) >>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) >>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>>> java.lang.Thread.run(Thread.java:745) >>>>> >>>>> *In Running Shell Window (where I ran bin/zeppelin.sh)* >>>>> >>>>> org.apache.zeppelin.interpreter.InterpreterException: >>>>> org.apache.zeppelin.interpreter.InterpreterException: >>>>> org.apache.thrift.transport.TTransportException: >>>>> java.net.ConnectException: >>>>> Connection refused >>>>> >>>>> at >>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:135) >>>>> >>>>> at >>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:249) >>>>> >>>>> at >>>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:104) >>>>> >>>>> at 
org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:202) >>>>> >>>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:170) >>>>> >>>>> at >>>>> org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:296) >>>>> >>>>> at >>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) >>>>> >>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262) >>>>> >>>>> at >>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) >>>>> >>>>> at >>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) >>>>> >>>>> at >>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>>> >>>>> at >>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>>> >>>>> at java.lang.Thread.run(Thread.java:745) >>>>> >>>>> Caused by: org.apache.zeppelin.interpreter.InterpreterException: >>>>> org.apache.thrift.transport.TTransportException: >>>>> java.net.ConnectException: >>>>> Connection refused >>>>> >>>>> at >>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:53) >>>>> >>>>> at >>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37) >>>>> >>>>> at >>>>> org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60) >>>>> >>>>> at >>>>> org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861) >>>>> >>>>> at >>>>> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435) >>>>> >>>>> at >>>>> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363) >>>>> >>>>> at >>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:138) >>>>> >>>>> at >>>>> 
org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:133) >>>>> >>>>> ... 12 more >>>>> >>>>> Caused by: org.apache.thrift.transport.TTransportException: >>>>> java.net.ConnectException: Connection refused >>>>> >>>>> at org.apache.thrift.transport.TSocket.open(TSocket.java:185) >>>>> >>>>> at >>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51) >>>>> >>>>> ... 19 more >>>>> >>>>> Caused by: java.net.ConnectException: Connection refused >>>>> >>>>> at java.net.PlainSocketImpl.socketConnect(Native Method) >>>>> >>>>> at >>>>> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) >>>>> >>>>> at >>>>> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) >>>>> >>>>> at >>>>> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) >>>>> >>>>> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) >>>>> >>>>> at java.net.Socket.connect(Socket.java:579) >>>>> >>>>> at org.apache.thrift.transport.TSocket.open(TSocket.java:180) >>>>> >>>>> ... 
20 more >>>>> >>>>> *from interpreter log file:* >>>>> >>>>> INFO [2015-06-19 06:44:29,134] ({Thread-0} >>>>> RemoteInterpreterServer.java[run]:95) - Starting remote interpreter server >>>>> on port 54930 >>>>> >>>>> >>>>> *From Zeppelin Log file:* >>>>> >>>>> INFO [2015-06-19 06:44:19,329] ({pool-1-thread-2} >>>>> SchedulerFactory.java[jobStarted]:132) - Job >>>>> paragraph_1434713440246_1991176208 started by scheduler >>>>> remoteinterpreter_328619575 >>>>> >>>>> INFO [2015-06-19 06:44:19,331] ({pool-1-thread-2} >>>>> Paragraph.java[jobRun]:194) - run paragraph 20150619-063040_649381067 >>>>> using >>>>> md org.apache.zeppelin.interpreter.LazyOpenInterpreter@38946f29 >>>>> >>>>> INFO [2015-06-19 06:44:19,341] ({pool-1-thread-2} >>>>> RemoteInterpreterProcess.java[reference]:107) - Run interpreter process >>>>> /mapr/brewpot/mesos/zeppelin/0.5.0-incubating-SNAPSHOT/bin/interpreter.sh >>>>> -d /mapr/brewpot/mesos/zeppelin/0.5.0-incubating-SNAPSHOT/interpreter/md >>>>> -p >>>>> 54930 >>>>> >>>>> ERROR [2015-06-19 06:44:24,399] ({Thread-35} >>>>> RemoteScheduler.java[getStatus]:226) - Can't get status information >>>>> >>>>> org.apache.zeppelin.interpreter.InterpreterException: >>>>> org.apache.thrift.transport.TTransportException: >>>>> java.net.ConnectException: >>>>> Connection refused >>>>> >>>>> at >>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:53) >>>>> >>>>> at >>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37) >>>>> >>>>> at >>>>> org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60) >>>>> >>>>> at >>>>> org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861) >>>>> >>>>> at >>>>> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435) >>>>> >>>>> at >>>>> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363) >>>>> >>>>> at >>>>> 
org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:138) >>>>> >>>>> at >>>>> org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller.getStatus(RemoteScheduler.java:224) >>>>> >>>>> at >>>>> org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller.run(RemoteScheduler.java:183) >>>>> >>>>> Caused by: org.apache.thrift.transport.TTransportException: >>>>> java.net.ConnectException: Connection refused >>>>> >>>>> at org.apache.thrift.transport.TSocket.open(TSocket.java:185) >>>>> >>>>> at >>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51) >>>>> >>>>> ... 8 more >>>>> >>>>> Caused by: java.net.ConnectException: Connection refused >>>>> >>>>> at java.net.PlainSocketImpl.socketConnect(Native Method) >>>>> >>>>> at >>>>> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) >>>>> >>>>> at >>>>> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) >>>>> >>>>> at >>>>> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) >>>>> >>>>> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) >>>>> >>>>> at java.net.Socket.connect(Socket.java:579) >>>>> >>>>> at org.apache.thrift.transport.TSocket.open(TSocket.java:180) >>>>> >>>>> ... 
9 more >>>>> >>>>> ERROR [2015-06-19 06:44:24,399] ({pool-1-thread-2} Job.java[run]:183) >>>>> - Job failed >>>>> >>>>> org.apache.zeppelin.interpreter.InterpreterException: >>>>> org.apache.zeppelin.interpreter.InterpreterException: >>>>> org.apache.thrift.transport.TTransportException: >>>>> java.net.ConnectException: >>>>> Connection refused >>>>> >>>>> at >>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:135) >>>>> >>>>> at >>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:249) >>>>> >>>>> at >>>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:104) >>>>> >>>>> at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:202) >>>>> >>>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:170) >>>>> >>>>> at >>>>> org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:296) >>>>> >>>>> at >>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) >>>>> >>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262) >>>>> >>>>> at >>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) >>>>> >>>>> at >>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) >>>>> >>>>> at >>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>>> >>>>> at >>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>>> >>>>> at java.lang.Thread.run(Thread.java:745) >>>>> >>>>> Caused by: org.apache.zeppelin.interpreter.InterpreterException: >>>>> org.apache.thrift.transport.TTransportException: >>>>> java.net.ConnectException: >>>>> Connection refused >>>>> >>>>> at >>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:53) >>>>> >>>>> at >>>>> 
org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37) >>>>> >>>>> at >>>>> org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60) >>>>> >>>>> at >>>>> org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861) >>>>> >>>>> at >>>>> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435) >>>>> >>>>> at >>>>> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363) >>>>> >>>>> at >>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:138) >>>>> >>>>> at >>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:133) >>>>> >>>>> ... 12 more >>>>> >>>>> Caused by: org.apache.thrift.transport.TTransportException: >>>>> java.net.ConnectException: Connection refused >>>>> >>>>> at org.apache.thrift.transport.TSocket.open(TSocket.java:185) >>>>> >>>>> at >>>>> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51) >>>>> >>>>> ... 19 more >>>>> >>>>> Caused by: java.net.ConnectException: Connection refused >>>>> >>>>> at java.net.PlainSocketImpl.socketConnect(Native Method) >>>>> >>>>> at >>>>> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) >>>>> >>>>> at >>>>> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) >>>>> >>>>> at >>>>> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) >>>>> >>>>> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) >>>>> >>>>> at java.net.Socket.connect(Socket.java:579) >>>>> >>>>> at org.apache.thrift.transport.TSocket.open(TSocket.java:180) >>>>> >>>>> ... 20 more >>>>> >>>>> >>>>> >>>>> >>>>> >>>> >>> >