Thanks all for the expertise. It seems there's a long way to go. If we set the HA issue aside, how is Zeppelin's performance? Mainly I would like to get it running as a job scheduler first.
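For the job-scheduler use case, one option that needs no HA work at all is to drive Zeppelin from an external scheduler (cron, Oozie, etc.) through its REST API. A minimal sketch follows; the endpoint path matches the notebook REST API in the Zeppelin docs of this era, but the host, port, and note ID below are placeholders, so treat this as an assumption to verify against your version's API docs.

```python
# Sketch: using Zeppelin's notebook REST API as a lightweight job scheduler.
# ZEPPELIN_URL and the note ID are placeholders, not real endpoints.
import json
import urllib.request

ZEPPELIN_URL = "http://localhost:8080"  # placeholder Zeppelin server


def run_note_url(base, note_id):
    """Build the URL that triggers an asynchronous run of all paragraphs."""
    return "%s/api/notebook/job/%s" % (base, note_id)


def run_note(base, note_id):
    """POST to the run-note endpoint and return the parsed JSON response."""
    req = urllib.request.Request(
        run_note_url(base, note_id), data=b"", method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

An external cron entry could then call this script on whatever cadence the jobs need, e.g. `*/15 * * * * python run_note.py`, keeping scheduling concerns out of Zeppelin itself.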
On Tue, Jul 4, 2017 at 6:34 AM, Jeff Zhang <zjf...@gmail.com> wrote:

> Thanks for the sharing, Ruslan. This might be useful for some users'
> scenarios, and I can see 2 limitations of this approach:
>
> * Scoped mode per user would not work, since different users would route
> to different Zeppelin instances. That means data cannot be shared across
> users.
> * Interpreter recovery is not possible.
>
> If we want to implement HA, it would be better to build a full HA feature
> that could be used in all scenarios. It would require a lot of work, and
> there are other higher-priority things that need to be done first. Anyway,
> your approach is still useful for some users. Thanks for your sharing
> again.
>
> Ruslan Dautkhanov <dautkha...@gmail.com> wrote on Tue, Jul 4, 2017 at
> 12:41 AM:
>
>> Jeff,
>>
>> Here's the scenario:
>> - Zeppelin servers (ZS) are running on the same port on two servers
>> (ZS1 and ZS2).
>> - A load balancer (LB) always routes an individual user's requests to
>> the same server (either ZS1 or ZS2) through sticky sessions (SS).
>> Different end users may end up on different servers, though. The same
>> user will always go to the one ZS chosen the first time they made a
>> connection.
>> - If either ZS becomes unavailable, the LB reroutes all of its users'
>> connections to the one that is alive. Yes, it may mean that (a) users
>> will have to re-login once they fail over, and (b) new interpreter
>> processes will be spun up once users re-open their notebooks.
>>
>> So it's possible to have this kind of Zeppelin HA without code changes
>> in Zeppelin.
>>
>> A nice extension to that would be the recovery of interpreters that
>> you're talking about. My understanding is that it's technically
>> possible, for example, to reconnect to an existing Spark driver after
>> failover (in deploy-mode=cluster) instead of creating a new Spark
>> driver.
>>
>> Great topic. Thank you!
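The sticky-session routing described in Ruslan's scenario can be sketched in a few lines: pin each user to a backend via a hash of a stable session key, and fall back to a live backend only when the preferred one is down. The backend addresses (ZS1/ZS2) are hypothetical, and a real load balancer (F5, HAProxy, nginx) would do this with cookies rather than hashing, so this is illustration only.

```python
# Sketch of sticky-session routing with failover, as in the scenario above.
# Backend addresses are hypothetical stand-ins for ZS1 and ZS2.
import hashlib

BACKENDS = ["zs1:8080", "zs2:8080"]


def pick_backend(session_id, alive=None):
    """Route the same session to the same backend while it is alive.

    `alive` is the current set of healthy backends; defaults to all.
    """
    alive = BACKENDS if alive is None else alive
    digest = int(hashlib.sha1(session_id.encode()).hexdigest(), 16)
    preferred = BACKENDS[digest % len(BACKENDS)]
    # Fail over to any live backend if the preferred one is down;
    # the user re-logins and gets fresh interpreter processes, as noted.
    return preferred if preferred in alive else alive[0]
```

This also shows why Jeff's first limitation holds: two users whose hashes land on different backends never share an instance, so scoped-per-user interpreter state cannot be shared across them.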
>>
>> --
>> Ruslan Dautkhanov
>>
>> On Fri, Jun 30, 2017 at 7:00 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>
>>> Basically, Zeppelin HA requires 2 major things:
>>>
>>> 1. Shared storage (storage for notebooks, interpreter settings,
>>> zeppelin-site.xml, zeppelin-env.sh, shiro.ini, credentials.json).
>>> 2. Recovery of running interpreters. The standby Zeppelin instance
>>> doesn't know where the running interpreters are (host:port), so it
>>> cannot recover the running interpreters when the standby Zeppelin
>>> becomes active. Maybe we can store the runtime info in ZooKeeper;
>>> anyway, it requires more design and discussion.
>>>
>>> Ruslan Dautkhanov <dautkha...@gmail.com> wrote on Sat, Jul 1, 2017 at
>>> 8:07 AM:
>>>
>>>> I think if you have shared storage for notebooks (for example, NFS
>>>> mounted from a third server), and a load balancer that supports
>>>> sticky sessions (like F5) on top, it should be possible to have HA
>>>> without any code change in Zeppelin. Am I missing something?
>>>>
>>>> --
>>>> Ruslan Dautkhanov
>>>>
>>>> On Fri, Jun 30, 2017 at 5:54 PM, Alexander Filipchik
>>>> <afilipc...@gmail.com> wrote:
>>>>
>>>>> Honestly, HA requires more than just active-standby. It should be
>>>>> able to scale without major surgery, which is not possible right
>>>>> now. For example, if you start too many interpreters, the Zeppelin
>>>>> box will simply run out of memory.
>>>>>
>>>>> Alex
>>>>>
>>>>> On Thu, Jun 29, 2017 at 10:59 PM, wenxing zheng
>>>>> <wenxing.zh...@gmail.com> wrote:
>>>>>
>>>>>> At first, I would think Git storage is a good option, and we can
>>>>>> push and pull the changes regularly.
>>>>>>
>>>>>> With multiple Zeppelin instances, maybe we need a new component or
>>>>>> service to act as a distributed scheduler: dispatch jobs to, and
>>>>>> manage the jobs on, the Zeppelin instances.
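Jeff's second requirement above — recording each interpreter's host:port so a newly active standby can find and reconnect to the processes — amounts to a small runtime registry. The sketch below uses a plain in-process dict purely for illustration; a real implementation would back it with ZooKeeper (e.g. ephemeral znodes per interpreter group), as he suggests, and the group names and addresses here are invented.

```python
# Sketch of the interpreter runtime-info registry Jeff proposes.
# A dict stands in for the shared store (ZooKeeper in a real design);
# all names and addresses below are hypothetical.
class InterpreterRegistry:
    def __init__(self):
        # interpreter group name -> (host, port) of its running process
        self._procs = {}

    def register(self, group, host, port):
        """Active instance records where an interpreter process lives."""
        self._procs[group] = (host, port)

    def unregister(self, group):
        """Remove an entry when the interpreter process exits."""
        self._procs.pop(group, None)

    def recover_targets(self):
        """What a newly active standby reads back to attempt reconnects."""
        return dict(self._procs)
```

The hard part this sketch skips is exactly what needs "more design and discussion": keeping the store consistent when the active instance dies mid-update, which is why a coordination service rather than a flat file is the natural backing store.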
>>>>>>
>>>>>> On Fri, Jun 30, 2017 at 1:26 PM, Vinay Shukla
>>>>>> <vinayshu...@gmail.com> wrote:
>>>>>>
>>>>>>> Here is what I think should be part of the HA consideration:
>>>>>>>
>>>>>>> 1. Have multiple Zeppelin instances.
>>>>>>> 2. Have the notebook storage backed by something like NFS so all
>>>>>>> notebooks are visible across all Zeppelin instances.
>>>>>>> 3. Put multiple load balancers in front of Zeppelin to route
>>>>>>> requests.
>>>>>>>
>>>>>>> Consider that HA needs scalability, which depends on which
>>>>>>> interpreter you plan to use. So you might need to consider HA at
>>>>>>> both the Zeppelin and interpreter levels. For example, if you were
>>>>>>> using Z + Livy + Spark, you would need to consider the scalability
>>>>>>> and HA needs of Z + the Livy interpreter + the Livy server + Spark
>>>>>>> (on a cluster manager).
>>>>>>>
>>>>>>> On Thu, Jun 29, 2017 at 10:04 PM, wenxing zheng
>>>>>>> <wenxing.zh...@gmail.com> wrote:
>>>>>>>
>>>>>>>> And do we have any architecture doc for reference? Because we
>>>>>>>> need to add the HA capability as soon as possible, I hope we can
>>>>>>>> figure it out.
>>>>>>>>
>>>>>>>> On Fri, Jun 30, 2017 at 12:33 PM, wenxing zheng
>>>>>>>> <wenxing.zh...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks to Jeff and Moon.
>>>>>>>>>
>>>>>>>>> So currently the active-active model doesn't work with Git
>>>>>>>>> storage, am I right?
>>>>>>>>>
>>>>>>>>> On Fri, Jun 30, 2017 at 12:16 PM, moon soo Lee <m...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Basically, an active-(hot)standby model would work. Two or more
>>>>>>>>>> Zeppelin instances can be started pointing at the same notebook
>>>>>>>>>> storage, as long as only one Zeppelin instance (the active one)
>>>>>>>>>> changes notebooks at any given time.
>>>>>>>>>>
>>>>>>>>>> In case the active instance fails, one of the remaining
>>>>>>>>>> instances can take over the role by refreshing the notebook
>>>>>>>>>> list and starting to make changes.
>>>>>>>>>>
>>>>>>>>>> But all this failover is not provided by Zeppelin and needs to
>>>>>>>>>> rely on an external script or HA software (like Heartbeat).
>>>>>>>>>>
>>>>>>>>>> As Jeff mentioned, the community does not have a concrete plan
>>>>>>>>>> for having HA built in at this moment.
>>>>>>>>>>
>>>>>>>>>> Hope this helps,
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> moon
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 30, 2017 at 1:01 PM Jeff Zhang <zjf...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> No concrete plan for that. There are other higher-priority
>>>>>>>>>>> things that need to be done. At least it will not be available
>>>>>>>>>>> in 0.8; maybe after 1.0.
>>>>>>>>>>>
>>>>>>>>>>> wenxing zheng <wenxing.zh...@gmail.com> wrote on Fri, Jun 30,
>>>>>>>>>>> 2017 at 11:47 AM:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks to Jianfeng.
>>>>>>>>>>>>
>>>>>>>>>>>> Do you know of any plan for this?
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jun 30, 2017 at 11:32 AM, Jianfeng (Jeff) Zhang
>>>>>>>>>>>> <jzh...@hortonworks.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> HA is not supported; there are still lots of configuration
>>>>>>>>>>>>> files stored in the local file system.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>> Jeff Zhang
>>>>>>>>>>>>>
>>>>>>>>>>>>> From: wenxing zheng <wenxing.zh...@gmail.com>
>>>>>>>>>>>>> Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
>>>>>>>>>>>>> Date: Friday, June 30, 2017 at 9:40 AM
>>>>>>>>>>>>> To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
>>>>>>>>>>>>> Subject: Query about the high availability of Zeppelin
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I still didn't find any docs on this topic.
>>>>>>>>>>>>> I would appreciate it if anyone could shed some light on how
>>>>>>>>>>>>> to get Zeppelin into a cluster with shared/centralized
>>>>>>>>>>>>> storage.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards, Wenxing
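The external failover script moon refers to could be as simple as a loop that polls each instance's health endpoint and decides who should serve traffic. A sketch under stated assumptions: `/api/version` is used here as the health probe because it is an unauthenticated Zeppelin endpoint (verify it exists in your version), and the actual "promote" action (VIP move, LB reconfiguration) is left as a placeholder since it is deployment-specific.

```python
# Sketch of an external health-check/failover helper for an
# active-standby pair; the health endpoint and any promote action
# are assumptions to adapt to the actual deployment.
import urllib.error
import urllib.request


def is_alive(base_url, timeout=5):
    """True if the Zeppelin instance answers its version endpoint."""
    try:
        with urllib.request.urlopen(
            base_url + "/api/version", timeout=timeout
        ) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def failover_decision(active_ok, standby_ok):
    """Decide which instance should serve traffic this polling round."""
    if active_ok:
        return "active"
    return "standby" if standby_ok else "none"
```

A cron job or systemd timer would run this every few seconds and, on a `"standby"` decision, trigger whatever promotes the standby (e.g. repointing the load balancer), mirroring what Heartbeat-style HA software automates.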