Re: Query about the high availability of Zeppelin

Jeff Zhang Mon, 03 Jul 2017 15:35:02 -0700

Thanks for sharing, Ruslan. This might be useful for some users'
scenarios, but I can see two limitations with this approach:


* Per-user scoped mode would not work, since different users would be routed
  to different Zeppelin instances. That means data cannot be shared across
  users.
* Interpreter recovery is not possible.
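
To make these limitations concrete, here is a minimal Python sketch of the
routing in Ruslan's scenario quoted below. `StickyRouter` and the
`interpreters` dict are hypothetical names for illustration only; this is not
a Zeppelin or F5 API. It assumes interpreter state lives only inside the
server process that started it, so users pinned to different servers cannot
share it, and a user who fails over lands on a server with no interpreter.

```python
import zlib
from typing import Dict, List, Set


class StickyRouter:
    """Hypothetical load-balancer model: sticky sessions plus failover."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = servers
        self.alive: Set[str] = set(servers)
        self.pinned: Dict[str, str] = {}  # user -> server chosen on first connect

    def route(self, user: str) -> str:
        server = self.pinned.get(user)
        if server in self.alive:
            return server  # sticky: the same user keeps hitting the same server
        live = [s for s in self.servers if s in self.alive]
        if not live:
            raise RuntimeError("no Zeppelin server is alive")
        # First connection or failover: pick a live server deterministically
        # (crc32 is stable across runs, unlike Python's salted hash()).
        server = live[zlib.crc32(user.encode()) % len(live)]
        self.pinned[user] = server
        return server

    def mark_down(self, server: str) -> None:
        self.alive.discard(server)


# Assumed model: each server's interpreters exist only inside that server.
interpreters: Dict[str, Set[str]] = {"zs1": set(), "zs2": set()}

router = StickyRouter(["zs1", "zs2"])
first = router.route("alice")
interpreters[first].add("alice:spark")  # alice's scoped interpreter
router.mark_down(first)                 # the server alice was pinned to fails
second = router.route("alice")          # LB reroutes her to the survivor
```

After failover `second` differs from `first`, and the surviving server holds
no interpreter for alice, so a new one has to be started, which is point (b)
in Ruslan's message.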

If we implement HA, it would be better as a full HA feature that works in
all scenarios. That would require a lot of work, and there are other
higher-priority things that need to be done first. Anyway, your approach is
still useful for some users. Thanks again for sharing.
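
Interpreter recovery, as described in the quoted June 30 message, depends on
a newly active instance knowing where the running interpreters live
(host:port). Below is a minimal sketch under stated assumptions:
`InterpreterRegistry`, `Endpoint`, and `recover` are hypothetical names, and a
plain dict stands in for the shared store (the thread suggests ZooKeeper, but
no ZooKeeper client API is used here). The newly active instance looks up each
interpreter and probes its port with a TCP connect before deciding whether to
reuse it or start a new process.

```python
import socket
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Endpoint:
    host: str
    port: int


class InterpreterRegistry:
    """Hypothetical registry of running interpreter processes.

    A dict stands in for a shared store such as ZooKeeper.
    """

    def __init__(self) -> None:
        self._store: Dict[str, Endpoint] = {}

    def register(self, setting_id: str, endpoint: Endpoint) -> None:
        self._store[setting_id] = endpoint

    def lookup(self, setting_id: str) -> Optional[Endpoint]:
        return self._store.get(setting_id)


def is_reachable(ep: Endpoint, timeout: float = 1.0) -> bool:
    """Probe the interpreter's port with a plain TCP connect."""
    try:
        with socket.create_connection((ep.host, ep.port), timeout=timeout):
            return True
    except OSError:
        return False


def recover(registry: InterpreterRegistry, setting_id: str) -> str:
    """Decide what a newly active instance should do for one interpreter."""
    ep = registry.lookup(setting_id)
    if ep is not None and is_reachable(ep):
        return f"reconnect to {ep.host}:{ep.port}"
    return "start new interpreter process"
```

Reusing a reachable endpoint is roughly what reconnecting to an existing
Spark driver after failover would need; a real design would also have to
deal with stale entries and with two instances claiming the same
interpreter, which is part of the further design work mentioned in the thread.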





On Tue, Jul 4, 2017 at 12:41 AM, Ruslan Dautkhanov <[email protected]> wrote:

> Jeff,
>
> Here's the scenario:
> - Zeppelin servers (ZS) are running on the same port on two servers (ZS1
>   and ZS2).
> - A load balancer (LB) always routes an individual user's requests to the
>   same server (either ZS1 or ZS2) through sticky sessions (SS). Different
>   end users may end up on different servers, though; a given user always
>   goes to the ZS chosen the first time they connected.
> - If either ZS becomes unavailable, the LB reroutes all users' connections
>   to the one that is alive.
>   Yes, this may mean that (a) users will have to log in again after they
>   fail over, and (b) new interpreter processes will be spun up once users
>   reopen their notebooks.
>
> So it's possible to have this kind of Zeppelin HA without code changes in
> Zeppelin.
>
> A nice extension to that would be the interpreter recovery you're talking
> about. My understanding is that it's technically possible, for example, to
> reconnect to an existing Spark driver after failover (in deploy-mode=cluster)
> instead of creating a new Spark driver.
>
> Great topic. Thank you!
>
>
>
> --
> Ruslan Dautkhanov
>
> On Fri, Jun 30, 2017 at 7:00 PM, Jeff Zhang <[email protected]> wrote:
>
>>
>> Basically, Zeppelin HA requires two major things:
>>
>> 1. Shared storage (storage for notebooks, interpreter settings,
>> zeppelin-site.xml, zeppelin-env.sh, shiro.ini, credentials.json)
>> 2. Recovery of running interpreters. The standby Zeppelin instance doesn't
>> know where the running interpreters are (host:port), so it cannot recover
>> them when it becomes active. Maybe we can store the runtime info in
>> ZooKeeper; anyway, it requires more design and discussion.
>>
>>
>>
>> On Sat, Jul 1, 2017 at 8:07 AM, Ruslan Dautkhanov <[email protected]> wrote:
>>
>>> I think if you have shared storage for notebooks (for example, NFS
>>> mounted from a third server) and a load balancer that supports sticky
>>> sessions (like F5) on top, it should be possible to have HA without any
>>> code change in Zeppelin. Am I missing something?
>>>
>>>
>>>
>>> --
>>> Ruslan Dautkhanov
>>>
>>> On Fri, Jun 30, 2017 at 5:54 PM, Alexander Filipchik <
>>> [email protected]> wrote:
>>>
>>>> Honestly, HA requires more than just active-standby.
>>>> It should be able to scale without major surgery, which is not
>>>> possible right now. For example, if you start too many interpreters,
>>>> the Zeppelin box will simply run out of memory.
>>>>
>>>> Alex
>>>>
>>>> On Thu, Jun 29, 2017 at 10:59 PM, wenxing zheng <
>>>> [email protected]> wrote:
>>>>
>>>>> At first glance, I would think Git storage is a good option, since we
>>>>> can push and pull the changes regularly.
>>>>>
>>>>> With multiple Zeppelin instances, maybe we need a new component or
>>>>> service to act as a distributed scheduler that dispatches jobs to, and
>>>>> manages jobs on, the Zeppelin instances.
>>>>>
>>>>> On Fri, Jun 30, 2017 at 1:26 PM, Vinay Shukla <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Here is what I think should be part of HA consideration:
>>>>>>
>>>>>>    1. Have multiple Zeppelin Instances
>>>>>>    2. Have the notebooks storage backed by something like an NFS so
>>>>>>    all notebooks are visible across all Zeppelin instances
>>>>>>    3. Put multiple load balancers in front of Zeppelin to route
>>>>>>    requests.
>>>>>>
>>>>>> Consider that HA needs scalability, which depends on which
>>>>>> interpreter you plan to use. So you might need to consider HA at both the
>>>>>> Zeppelin and interpreter levels. For example, if you were using Z + Livy +
>>>>>> Spark, you would need to consider the scalability and HA needs of Z, the
>>>>>> Livy interpreter, Livy Server, and Spark (on the cluster manager).
>>>>>>
>>>>>> On Thu, Jun 29, 2017 at 10:04 PM, wenxing zheng <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> And do we have any architecture doc for reference? We need to add
>>>>>>> HA capability as soon as possible, so I hope we can figure it out.
>>>>>>>
>>>>>>> On Fri, Jun 30, 2017 at 12:33 PM, wenxing zheng <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Thanks to Jeff and Moon.
>>>>>>>>
>>>>>>>> So currently the active-active model doesn't work with Git storage,
>>>>>>>> am I right?
>>>>>>>>
>>>>>>>> On Fri, Jun 30, 2017 at 12:16 PM, moon soo Lee <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Basically, an active-(hot)standby model would work.
>>>>>>>>> Two or more Zeppelin instances can be started pointing to the same
>>>>>>>>> notebook storage, as long as only one Zeppelin instance (the active
>>>>>>>>> one) changes notebooks at any given time.
>>>>>>>>>
>>>>>>>>> If the active instance fails, one of the remaining instances can
>>>>>>>>> take over the role by refreshing the notebook list and starting to
>>>>>>>>> make changes.
>>>>>>>>>
>>>>>>>>> But none of this failover is provided by Zeppelin; it has to rely
>>>>>>>>> on an external script or HA software (like Heartbeat).
>>>>>>>>>
>>>>>>>>> Like Jeff mentioned, the community does not have a concrete plan
>>>>>>>>> for built-in HA at this moment.
>>>>>>>>>
>>>>>>>>> Hope this helps,
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> moon
>>>>>>>>>
>>>>>>>>> On Fri, Jun 30, 2017 at 1:01 PM Jeff Zhang <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> No concrete plan for that. There are other higher-priority things
>>>>>>>>>> that need to be done. It would not be available in 0.8 at least;
>>>>>>>>>> maybe after 1.0.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 30, 2017 at 11:47 AM, wenxing zheng <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks, Jianfeng.
>>>>>>>>>>>
>>>>>>>>>>> Do you know of any plan for this?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jun 30, 2017 at 11:32 AM, Jianfeng (Jeff) Zhang <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> HA is not supported; there are still lots of configuration files
>>>>>>>>>>>> stored in the local file system.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>> Jeff Zhang
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> From: wenxing zheng <[email protected]>
>>>>>>>>>>>> Reply-To: "[email protected]" <
>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>> Date: Friday, June 30, 2017 at 9:40 AM
>>>>>>>>>>>> To: "[email protected]" <[email protected]>
>>>>>>>>>>>> Subject: Query about the high availability of Zeppelin
>>>>>>>>>>>>
>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>
>>>>>>>>>>>> I still haven't found any docs on this topic. I'd appreciate it if
>>>>>>>>>>>> anyone could shed some light on how to run Zeppelin as a cluster
>>>>>>>>>>>> with shared/centralized storage.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards, Wenxing
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
