GitHub user HeartSaVioR opened a pull request:
https://github.com/apache/incubator-zeppelin/pull/576
ZEPPELIN-539 RemoteInterpreter Heartbeat (WIP)
### What is this PR for?
To help users determine when a remote interpreter is unable to respond.
This is still WIP, and how best to help users can be refined through
discussion.
### What type of PR is it?
Feature (Improvement?)
### Todos
* [ ] - Rebase when #574 is merged to master
* shutting down the heartbeat threads requires the reference count to reach zero
* without #574, some threads could stay alive even though the remote interpreter
is closed
* [ ] - Discuss proper values for the heartbeat send interval and the
timeout check interval
* [ ] - Discuss how to let users know when a remote interpreter has timed out
* [ ] - Discuss possible ways to restore a remote interpreter to normal
### Is there a relevant Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-539
### How should this be tested?
1. run a Spark paragraph to ensure the Spark remote interpreter process is running
2. kill -9 the Spark remote interpreter process
3. run the paragraph again (it may show broken pipe, or connection refused
after #575)
4. wait 30 secs (or the remote interpreter connection timeout value) to let
RemoteInterpreterProcess classify the process as timed out
5. run the paragraph again (it shows
org.apache.zeppelin.interpreter.InterpreterProcessHeartbeatFailedException to
users)
### Screenshots (if appropriate)
### Questions:
* Do the license files need an update? (No)
* Are there breaking changes for older versions? (No)
* Does this need documentation? (Maybe no)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HeartSaVioR/incubator-zeppelin ZEPPELIN-539-WIP-v1
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-zeppelin/pull/576.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #576
----
commit b5a75bd4d4bf2293469d4210c9035575029a8d85
Author: Jungtaek Lim <[email protected]>
Date: 2015-12-28T22:14:46Z
ZEPPELIN-539 RemoteInterpreter Heartbeat
* introduce "ping" function to thrift
* every remote interpreter process will have two additional threads
* send "ping" to check that the remote interpreter process is able to respond
* check the last heartbeat timestamp and determine whether it has timed out
* introduce InterpreterProcessHeartbeatFailedException
* thrown when the remote interpreter process is determined to have timed out
commit d27152ee1400930401a62bba4ce96948e4ebae16
Author: Jungtaek Lim <[email protected]>
Date: 2015-12-28T22:16:52Z
ZEPPELIN-539 Add missing license header
----
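For discussion, the two-thread scheme described in the first commit message could look roughly like the sketch below. This is not the code in this PR: `HeartbeatMonitor`, its methods, and the injected `ping` and `clock` suppliers are hypothetical names, the `BooleanSupplier` stands in for the new thrift "ping" call, and `IllegalStateException` stands in for InterpreterProcessHeartbeatFailedException. It also assumes the timed-out state is sticky until the interpreter is restarted, which is one of the open Todos.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Hypothetical sketch of a heartbeat monitor; not the classes added by this PR.
class HeartbeatMonitor {
    private final BooleanSupplier ping;   // stand-in for the thrift "ping" call
    private final LongSupplier clock;     // millisecond clock, injectable for tests
    private final long timeoutMillis;
    private final AtomicLong lastHeartbeat;
    private volatile boolean timedOut = false;
    private ScheduledExecutorService scheduler;

    HeartbeatMonitor(BooleanSupplier ping, LongSupplier clock, long timeoutMillis) {
        this.ping = ping;
        this.clock = clock;
        this.timeoutMillis = timeoutMillis;
        this.lastHeartbeat = new AtomicLong(clock.getAsLong());
    }

    // Thread 1: send "ping" and record the time of the last successful reply.
    void sendPing() {
        try {
            if (ping.getAsBoolean()) {
                lastHeartbeat.set(clock.getAsLong());
            }
        } catch (RuntimeException e) {
            // A failed call (e.g. broken pipe) counts as a missed heartbeat.
        }
    }

    // Thread 2: compare the last heartbeat timestamp against the timeout.
    void checkTimeout() {
        if (clock.getAsLong() - lastHeartbeat.get() > timeoutMillis) {
            timedOut = true; // assumption: stays set until the process is restarted
        }
    }

    boolean isTimedOut() {
        return timedOut;
    }

    // Called before running a paragraph so the user gets an explicit error.
    void ensureAlive() {
        if (timedOut) {
            throw new IllegalStateException("remote interpreter heartbeat failed");
        }
    }

    synchronized void start(long pingIntervalMillis, long checkIntervalMillis) {
        scheduler = Executors.newScheduledThreadPool(2);
        scheduler.scheduleAtFixedRate(this::sendPing, 0, pingIntervalMillis, TimeUnit.MILLISECONDS);
        scheduler.scheduleAtFixedRate(this::checkTimeout, checkIntervalMillis,
                checkIntervalMillis, TimeUnit.MILLISECONDS);
    }

    // Should be called once the reference count reaches zero, so no threads leak.
    synchronized void stop() {
        if (scheduler != null) {
            scheduler.shutdownNow();
        }
    }
}
```

With the 30 second timeout from the test steps, usage might be `new HeartbeatMonitor(client::ping, System::currentTimeMillis, 30_000L).start(5_000L, 5_000L)`, where `client::ping` and the 5 second intervals are placeholders pending the discussion in the Todos.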