Hi guys, I created an umbrella ticket for multi-user support in Zeppelin, as I see more and more demand for this from the community. Feel free to add any sub-tasks that I missed.
https://issues.apache.org/jira/browse/ZEPPELIN-1337

On Tue, Aug 9, 2016 at 9:20 AM, Alexander Bezzubov <b...@apache.org> wrote:
> Hi Egor,
>
> let me share two approaches that we used to get Apache Zeppelin
> working in a multi-tenant environment with Apache Spark:
>
> - Run a separate container with Zeppelin per user, on a small cluster
> of Docker machines (so a single machine runs just 2-3 containers with
> SparkContexts). This works well and supports a Spark standalone cluster,
> but requires central external auth, a small "resource manager" to
> allocate the containers to the Docker cluster, and a reverse proxy as a
> single point of entry for the user. We implemented all of this in one
> binary under an open-source project called Z-Manager Multitenancy; you
> can get more details here [1]. It is beta and we haven't had the
> capacity to support it recently.
>
> - Run a single Zeppelin with auth enabled + the Livy interpreter +
> Spark on YARN. This is a more generic solution, but it requires a
> particular cluster configuration. Here YARN is used as a resource
> manager to handle multiple Spark contexts/driver processes on the same
> cluster as the tasks themselves. AFAIK a Mesos cluster could be used
> instead, but I do not have first-hand experience with it. You can read
> more about it here [2].
>
> Hope this helps!
>
> --
> Alex
>
> 1. https://github.com/NFLabs/z-manager/blob/master/multitenancy/README.md
> 2. http://zeppelin.apache.org/docs/0.6.0/interpreter/livy.html
>
> On Sat, Aug 6, 2016, 06:12 vincent gromakowski <
> vincent.gromakow...@gmail.com> wrote:
>
>> One Zeppelin per user in a Mesos container on datanode-type servers is
>> fine for me. An Ansible script configures each instance with the
>> user's specifics and launches it in Marathon. A service discovery (a
>> basic shell script) updates an Apache server with basic auth and
>> routes each user to his instance. Mesos also runs the SMACK stack on
>> which Zeppelin relies.
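The "basic shell script" service discovery vincent mentions is not shown in the thread; a minimal sketch of the idea might look like the following (the function name, the user→target input format, and the per-user path scheme are all assumptions, not from the thread):

```shell
#!/bin/sh
# Hypothetical service-discovery helper: read "user host:port" pairs on
# stdin and emit Apache mod_proxy stanzas routing /<user>/ to that
# user's Zeppelin container. The output would be appended to an httpd
# vhost (behind basic auth) and the server reloaded.
emit_routes() {
  while read -r user target; do
    printf 'ProxyPass "/%s/" "http://%s/"\n' "$user" "$target"
    printf 'ProxyPassReverse "/%s/" "http://%s/"\n' "$user" "$target"
  done
}
```

For example, `echo "egor 10.0.0.5:8080" | emit_routes` prints the ProxyPass/ProxyPassReverse pair for that user.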
>>
>> On Aug 5, 2016 at 11:01 PM, "Egor Pahomov" <pahomov.e...@gmail.com> wrote:
>>
>> I need to build a chart over 10 days for all countries (200) for
>> several products, by some dimensions. I would need at least 4-6 GB per
>> Zeppelin for it.
>>
>> 2016-08-05 12:31 GMT-07:00 Mohit Jaggi <mohitja...@gmail.com>:
>>
>>> Put your big results somewhere else, not in Z's memory?
>>>
>>> On Aug 5, 2016, at 12:26 PM, Egor Pahomov <pahomov.e...@gmail.com>
>>> wrote:
>>>
>>>> - Use the Spark driver in "cluster mode", where the driver runs on a
>>>> worker instead of the node running Z
>>>
>>> Even without the driver, Z is a heavy process. You need a lot of RAM
>>> to keep big results from jobs. And most of all, Zeppelin 0.5.6 does
>>> not support cluster mode, and I'm not ready to move to 0.6.
>>>
>>> 2016-08-05 12:03 GMT-07:00 Mohit Jaggi <mohitja...@gmail.com>:
>>>
>>>> Egor,
>>>> Running a scale-out system like Spark with multiple users is always
>>>> tricky. Operating systems are designed to let multiple users share a
>>>> single machine, but for "big data" a single user requires several
>>>> machines, which is the exact opposite. Having said that, I would
>>>> suggest the following:
>>>>
>>>> - Use the Spark driver in "cluster mode", where the driver runs on a
>>>> worker instead of the node running Z
>>>> - Set appropriate limits/sizes in the Spark master configuration
>>>> - Run separate instances of Z per user; but then you will have a
>>>> tough time collaborating and sharing notebooks. Maybe they can be
>>>> stored in a shared space that all Z instances can read, but I am
>>>> afraid that shared access might clobber the files. Z developers can
>>>> tell us if that is true.
>>>>
>>>> Another alternative is virtualization using containers, but I think
>>>> that will not be easy either.
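Mohit's first two suggestions (cluster-mode driver, limits in the master configuration) could be illustrated with a hypothetical `spark-defaults.conf` fragment. The specific values are assumptions; note that `spark.cores.max` applies to standalone/Mesos masters rather than YARN, and that, as Egor points out, Zeppelin 0.5.6 does not support cluster mode:

```
# Hypothetical spark-defaults.conf fragment, values are illustrative only.
spark.submit.deployMode          cluster   # driver runs on a worker, not the Zeppelin host
spark.cores.max                  8         # per-application core cap (standalone/Mesos)
spark.executor.memory            4g
spark.driver.memory              4g
spark.dynamicAllocation.enabled  true      # release idle executors (needs the shuffle service)
```

Capping per-application resources is what keeps one user from monopolizing the cluster; moving the driver off the Zeppelin host addresses the driver-memory pressure discussed above.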
>>>>
>>>> Mohit
>>>> Founder,
>>>> Data Orchard LLC
>>>> www.dataorchardllc.com
>>>>
>>>> On Aug 5, 2016, at 11:45 AM, Egor Pahomov <pahomov.e...@gmail.com>
>>>> wrote:
>>>>
>>>> Hi, I'd like to discuss best practices for using Zeppelin in a
>>>> multi-user environment. There are several naive approaches; I've
>>>> tried each for at least a couple of months, and not a single one
>>>> worked:
>>>>
>>>> *All users on one Zeppelin.*
>>>>
>>>> - One Spark context: people really do break the sc, and when they
>>>> are all in the same boat, a single person can stop many from working.
>>>> - No resource management support: one person can allocate all
>>>> resources for a long time.
>>>> - The number of notebooks is enormous; it's hard to find anything.
>>>> - No security separation: everyone sees everything. I do not care
>>>> about security, but I care about fool-proofing, and people can
>>>> accidentally delete each other's notebooks.
>>>>
>>>> *Every user has his own Zeppelin on one machine.*
>>>>
>>>> - Every Zeppelin instance eats memory for Zeppelin itself; at some
>>>> point there isn't enough memory.
>>>> - Every Spark driver (I use yarn-client mode) eats memory. Same
>>>> issue.
>>>> - Single point of failure.
>>>> - Cores might not be enough.
>>>> - I cannot prove it, but even with enough memory and cores, Zeppelin
>>>> has problems when there are >10 instances on one machine. I do not
>>>> know the reason; maybe it's a Spark driver issue.
>>>>
>>>> Our current approach:
>>>> *Every department has its own VM, with its own Zeppelin in it.*
>>>>
>>>> - I'm not a DevOps engineer; I do not have experience supporting
>>>> multiple VMs.
>>>> - It's expensive to have hardware for a lot of VMs.
>>>> - Most of this hardware isn't busy even 20% of the time.
>>>>
>>>> How are you dealing with this situation?
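The "every user has his own Zeppelin on one machine" variant is typically wired up with a per-user `zeppelin-env.sh`; a hypothetical sketch follows (the user name, port, paths, and memory caps are all assumptions, not from the thread):

```shell
# Hypothetical per-user zeppelin-env.sh: give each instance its own
# port, notebook/log dirs, and JVM memory caps so instances do not
# starve each other.
export ZEPPELIN_PORT=8181                                # unique port per user
export ZEPPELIN_NOTEBOOK_DIR=/srv/zeppelin/alice/notebook
export ZEPPELIN_LOG_DIR=/srv/zeppelin/alice/logs
export ZEPPELIN_PID_DIR=/srv/zeppelin/alice/run
export ZEPPELIN_MEM="-Xmx1024m"                          # cap the Zeppelin server JVM
export ZEPPELIN_INTP_MEM="-Xmx2048m"                     # cap each interpreter JVM
```

Capping `ZEPPELIN_MEM` and `ZEPPELIN_INTP_MEM` bounds the per-instance footprint Egor describes, though it does not remove the single point of failure or the yarn-client driver memory on the host.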
>>>>
>>>> --
>>>> *Sincerely yours, Egor Pakhomov*
>>

--
Best Regards

Jeff Zhang