Thanks all for the expertise. It seems there's a long way to go. If we set the HA issue aside, how is Zeppelin's performance? Mainly I would like to get it running as a job scheduler first.
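For the job-scheduler use case, one option that needs no HA work at all is to drive Zeppelin from an external scheduler (cron, Oozie, etc.) through its REST API. A minimal sketch follows; the endpoint path matches the notebook REST API in the Zeppelin docs of this era, but the host, port, and note ID below are placeholders, so treat this as an assumption to verify against your version's API docs.

```python
# Sketch: using Zeppelin's notebook REST API as a lightweight job scheduler.
# ZEPPELIN_URL and the note ID are placeholders, not real endpoints.
import json
import urllib.request

ZEPPELIN_URL = "http://localhost:8080"  # placeholder Zeppelin server


def run_note_url(base, note_id):
    """Build the URL that triggers an asynchronous run of all paragraphs."""
    return "%s/api/notebook/job/%s" % (base, note_id)


def run_note(base, note_id):
    """POST to the run-note endpoint and return the parsed JSON response."""
    req = urllib.request.Request(
        run_note_url(base, note_id), data=b"", method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

An external cron entry could then call this script on whatever cadence the jobs need, e.g. `*/15 * * * * python run_note.py`, keeping scheduling concerns out of Zeppelin itself.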
On Tue, Jul 4, 2017 at 6:34 AM, Jeff Zhang <zjf...@gmail.com> wrote:

> Thanks for the sharing, Ruslan. This might be useful for some users'
> scenarios, and I can see 2 limitations of this approach:
>
> * Scoped mode per user would not work, since different users would route
> to different Zeppelin instances. That means data cannot be shared across
> users.
> * Interpreter recovery is not possible.
>
> If we want to implement HA, it would be better to build a full HA feature
> that could be used in all scenarios. It would require a lot of work, and
> there are other higher-priority things that need to be done first. Anyway,
> your approach is still useful for some users. Thanks for your sharing
> again.
>
> Ruslan Dautkhanov <dautkha...@gmail.com> wrote on Tue, Jul 4, 2017 at
> 12:41 AM:
>
>> Jeff,
>>
>> Here's the scenario:
>> - Zeppelin servers (ZS) are running on the same port on two servers
>> (ZS1 and ZS2).
>> - A load balancer (LB) always routes an individual user's requests to
>> the same server (either ZS1 or ZS2) through sticky sessions (SS).
>> Different end users may end up on different servers, though. The same
>> user will always go to the one ZS chosen the first time they made a
>> connection.
>> - If either ZS becomes unavailable, the LB reroutes all of its users'
>> connections to the one that is alive. Yes, it may mean that (a) users
>> will have to re-login once they fail over, and (b) new interpreter
>> processes will be spun up once users re-open their notebooks.
>>
>> So it's possible to have this kind of Zeppelin HA without code changes
>> in Zeppelin.
>>
>> A nice extension to that would be the recovery of interpreters that
>> you're talking about. My understanding is that it's technically
>> possible, for example, to reconnect to an existing Spark driver after
>> failover (in deploy-mode=cluster) instead of creating a new Spark
>> driver.
>>
>> Great topic. Thank you!
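The sticky-session routing described in Ruslan's scenario can be sketched in a few lines: pin each user to a backend via a hash of a stable session key, and fall back to a live backend only when the preferred one is down. The backend addresses (ZS1/ZS2) are hypothetical, and a real load balancer (F5, HAProxy, nginx) would do this with cookies rather than hashing, so this is illustration only.

```python
# Sketch of sticky-session routing with failover, as in the scenario above.
# Backend addresses are hypothetical stand-ins for ZS1 and ZS2.
import hashlib

BACKENDS = ["zs1:8080", "zs2:8080"]


def pick_backend(session_id, alive=None):
    """Route the same session to the same backend while it is alive.

    `alive` is the current set of healthy backends; defaults to all.
    """
    alive = BACKENDS if alive is None else alive
    digest = int(hashlib.sha1(session_id.encode()).hexdigest(), 16)
    preferred = BACKENDS[digest % len(BACKENDS)]
    # Fail over to any live backend if the preferred one is down;
    # the user re-logins and gets fresh interpreter processes, as noted.
    return preferred if preferred in alive else alive[0]
```

This also shows why Jeff's first limitation holds: two users whose hashes land on different backends never share an instance, so scoped-per-user interpreter state cannot be shared across them.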
>>
>> --
>> Ruslan Dautkhanov
>>
>> On Fri, Jun 30, 2017 at 7:00 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>
>>> Basically, Zeppelin HA requires 2 major things:
>>>
>>> 1. Shared storage (storage for notebooks, interpreter settings,
>>> zeppelin-site.xml, zeppelin-env.sh, shiro.ini, credentials.json).
>>> 2. Recovery of running interpreters. The standby Zeppelin instance
>>> doesn't know where the running interpreters are (host:port), so it
>>> cannot recover the running interpreters when the standby Zeppelin
>>> becomes active. Maybe we can store the runtime info in ZooKeeper;
>>> anyway, it requires more design and discussion.
>>>
>>> Ruslan Dautkhanov <dautkha...@gmail.com> wrote on Sat, Jul 1, 2017 at
>>> 8:07 AM:
>>>
>>>> I think if you have shared storage for notebooks (for example, NFS
>>>> mounted from a third server), and a load balancer that supports
>>>> sticky sessions (like F5) on top, it should be possible to have HA
>>>> without any code change in Zeppelin. Am I missing something?
>>>>
>>>> --
>>>> Ruslan Dautkhanov
>>>>
>>>> On Fri, Jun 30, 2017 at 5:54 PM, Alexander Filipchik
>>>> <afilipc...@gmail.com> wrote:
>>>>
>>>>> Honestly, HA requires more than just active-standby. It should be
>>>>> able to scale without major surgery, which is not possible right
>>>>> now. For example, if you start too many interpreters, the Zeppelin
>>>>> box will simply run out of memory.
>>>>>
>>>>> Alex
>>>>>
>>>>> On Thu, Jun 29, 2017 at 10:59 PM, wenxing zheng
>>>>> <wenxing.zh...@gmail.com> wrote:
>>>>>
>>>>>> At first, I would think Git storage is a good option, and we can
>>>>>> push and pull the changes regularly.
>>>>>>
>>>>>> With multiple Zeppelin instances, maybe we need a new component or
>>>>>> service to act as a distributed scheduler: dispatch jobs to, and
>>>>>> manage the jobs on, the Zeppelin instances.
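Jeff's second requirement above — recording each interpreter's host:port so a newly active standby can find and reconnect to the processes — amounts to a small runtime registry. The sketch below uses a plain in-process dict purely for illustration; a real implementation would back it with ZooKeeper (e.g. ephemeral znodes per interpreter group), as he suggests, and the group names and addresses here are invented.

```python
# Sketch of the interpreter runtime-info registry Jeff proposes.
# A dict stands in for the shared store (ZooKeeper in a real design);
# all names and addresses below are hypothetical.
class InterpreterRegistry:
    def __init__(self):
        # interpreter group name -> (host, port) of its running process
        self._procs = {}

    def register(self, group, host, port):
        """Active instance records where an interpreter process lives."""
        self._procs[group] = (host, port)

    def unregister(self, group):
        """Remove an entry when the interpreter process exits."""
        self._procs.pop(group, None)

    def recover_targets(self):
        """What a newly active standby reads back to attempt reconnects."""
        return dict(self._procs)
```

The hard part this sketch skips is exactly what needs "more design and discussion": keeping the store consistent when the active instance dies mid-update, which is why a coordination service rather than a flat file is the natural backing store.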
>>>>>>
>>>>>> On Fri, Jun 30, 2017 at 1:26 PM, Vinay Shukla
>>>>>> <vinayshu...@gmail.com> wrote:
>>>>>>
>>>>>>> Here is what I think should be part of the HA consideration:
>>>>>>>
>>>>>>> 1. Have multiple Zeppelin instances.
>>>>>>> 2. Have the notebook storage backed by something like NFS so all
>>>>>>> notebooks are visible across all Zeppelin instances.
>>>>>>> 3. Put multiple load balancers in front of Zeppelin to route
>>>>>>> requests.
>>>>>>>
>>>>>>> Consider that HA needs scalability, which depends on which
>>>>>>> interpreter you plan to use. So you might need to consider HA at
>>>>>>> both the Zeppelin and interpreter levels. For example, if you were
>>>>>>> using Z + Livy + Spark, you would need to consider the scalability
>>>>>>> and HA needs of Z + the Livy interpreter + the Livy server + Spark
>>>>>>> (on a cluster manager).
>>>>>>>
>>>>>>> On Thu, Jun 29, 2017 at 10:04 PM, wenxing zheng
>>>>>>> <wenxing.zh...@gmail.com> wrote:
>>>>>>>
>>>>>>>> And do we have any architecture doc for reference? Because we
>>>>>>>> need to add the HA capability as soon as possible, I hope we can
>>>>>>>> figure it out.
>>>>>>>>
>>>>>>>> On Fri, Jun 30, 2017 at 12:33 PM, wenxing zheng
>>>>>>>> <wenxing.zh...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks to Jeff and Moon.
>>>>>>>>>
>>>>>>>>> So currently the active-active model doesn't work with Git
>>>>>>>>> storage, am I right?
>>>>>>>>>
>>>>>>>>> On Fri, Jun 30, 2017 at 12:16 PM, moon soo Lee <m...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Basically, an active-(hot)standby model would work. Two or more
>>>>>>>>>> Zeppelin instances can be started pointing at the same notebook
>>>>>>>>>> storage, as long as only one Zeppelin instance (the active one)
>>>>>>>>>> changes notebooks at any given time.
>>>>>>>>>>
>>>>>>>>>> In case the active instance fails, one of the remaining
>>>>>>>>>> instances can take over the role by refreshing the notebook
>>>>>>>>>> list and starting to make changes.
>>>>>>>>>>
>>>>>>>>>> But all this failover is not provided by Zeppelin and needs to
>>>>>>>>>> rely on an external script or HA software (like Heartbeat).
>>>>>>>>>>
>>>>>>>>>> As Jeff mentioned, the community does not have a concrete plan
>>>>>>>>>> for having HA built in at this moment.
>>>>>>>>>>
>>>>>>>>>> Hope this helps,
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> moon
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 30, 2017 at 1:01 PM Jeff Zhang <zjf...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> No concrete plan for that. There are other higher-priority
>>>>>>>>>>> things that need to be done. At least it will not be available
>>>>>>>>>>> in 0.8; maybe after 1.0.
>>>>>>>>>>>
>>>>>>>>>>> wenxing zheng <wenxing.zh...@gmail.com> wrote on Fri, Jun 30,
>>>>>>>>>>> 2017 at 11:47 AM:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks to Jianfeng.
>>>>>>>>>>>>
>>>>>>>>>>>> Do you know of any plan for this?
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jun 30, 2017 at 11:32 AM, Jianfeng (Jeff) Zhang
>>>>>>>>>>>> <jzh...@hortonworks.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> HA is not supported; there are still lots of configuration
>>>>>>>>>>>>> files stored in the local file system.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>> Jeff Zhang
>>>>>>>>>>>>>
>>>>>>>>>>>>> From: wenxing zheng <wenxing.zh...@gmail.com>
>>>>>>>>>>>>> Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
>>>>>>>>>>>>> Date: Friday, June 30, 2017 at 9:40 AM
>>>>>>>>>>>>> To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
>>>>>>>>>>>>> Subject: Query about the high availability of Zeppelin
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I still didn't find any docs on this topic.
>>>>>>>>>>>>> I would appreciate it if anyone could shed some light on how
>>>>>>>>>>>>> to get Zeppelin into a cluster with shared/centralized
>>>>>>>>>>>>> storage.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards, Wenxing
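The external failover script moon refers to could be as simple as a loop that polls each instance's health endpoint and decides who should serve traffic. A sketch under stated assumptions: `/api/version` is used here as the health probe because it is an unauthenticated Zeppelin endpoint (verify it exists in your version), and the actual "promote" action (VIP move, LB reconfiguration) is left as a placeholder since it is deployment-specific.

```python
# Sketch of an external health-check/failover helper for an
# active-standby pair; the health endpoint and any promote action
# are assumptions to adapt to the actual deployment.
import urllib.error
import urllib.request


def is_alive(base_url, timeout=5):
    """True if the Zeppelin instance answers its version endpoint."""
    try:
        with urllib.request.urlopen(
            base_url + "/api/version", timeout=timeout
        ) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def failover_decision(active_ok, standby_ok):
    """Decide which instance should serve traffic this polling round."""
    if active_ok:
        return "active"
    return "standby" if standby_ok else "none"
```

A cron job or systemd timer would run this every few seconds and, on a `"standby"` decision, trigger whatever promotes the standby (e.g. repointing the load balancer), mirroring what Heartbeat-style HA software automates.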