Egor,

Running a scale-out system like Spark with multiple users is always tricky. Operating systems are designed to let multiple users share a single machine, but "big data" is the exact opposite: a single user requires several machines. Having said that, I would suggest the following:
- Use the Spark driver in "cluster mode", where the driver runs on a worker instead of on the node running Zeppelin.
- Set appropriate limits/sizes in the Spark master configuration.
- Run separate instances of Zeppelin per user, but then you will have a tough time collaborating and sharing notebooks. Maybe they can be stored in a shared space that all Zeppelin instances read, but I am afraid that shared access might clobber the files. The Zeppelin developers can tell us if that is true.

Another alternative is virtualization using containers, but I think that will not be easy either.

Mohit
Founder, Data Orchard LLC
www.dataorchardllc.com

> On Aug 5, 2016, at 11:45 AM, Egor Pahomov <pahomov.e...@gmail.com> wrote:
>
> Hi, I'd like to discuss best practices for using Zeppelin in a multi-user
> environment. There are several naive approaches; I tried each for at least a
> couple of months, and not a single one worked:
>
> All users on one Zeppelin:
> - One Spark context: people really do break the sc, and when everyone is in the
>   same boat, a single person can stop many others from working.
> - No resource management support: one person can allocate all resources for a
>   long time.
> - The number of notebooks is enormous; it's hard to find anything.
> - No security separation: everyone sees everything. I do not care about
>   security, but I care about foolproofing, and people can accidentally delete
>   each other's notebooks.
>
> Every user has his own Zeppelin on one machine:
> - Every Zeppelin instance eats memory for Zeppelin itself; at some point there
>   is not enough memory.
> - Every Spark driver (I use yarn-client mode) eats memory. Same issue.
> - Single point of failure.
> - Cores might not be enough.
> - I cannot prove it, but even if memory and cores are enough, Zeppelin has
>   problems when there are more than 10 Zeppelin instances on one machine. I do
>   not know why; maybe it's a Spark driver issue.
>
> Our current approach:
> - Every department has its own VM, with its own Zeppelin in it.
> - I'm not a DevOps person; I have no experience supporting multiple VMs.
> - It's expensive to have hardware for a lot of VMs.
> - Most of this hardware is not used even 20% of the time.
>
> How are you dealing with this situation?
>
>
> --
> Sincerely yours,
> Egor Pakhomov
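The first two suggestions in the reply (a cluster-mode driver and per-application resource caps) can be sketched as Spark configuration. The property names below are standard Spark settings, but the values and the queue name are illustrative assumptions, not a tested setup. One caveat: as of 2016 Zeppelin's Spark interpreter generally ran in yarn-client mode, so the cluster-mode setting may apply more to jobs submitted outside Zeppelin.

```properties
# spark-defaults.conf -- a minimal sketch; values are assumptions, tune per cluster

# Run the driver on a worker (YARN cluster mode) instead of the Zeppelin host
spark.master                          yarn
spark.submit.deployMode               cluster

# Cap what any one application can take, so one user cannot starve the rest
spark.executor.memory                 4g
spark.executor.cores                  2
spark.dynamicAllocation.enabled       true
spark.dynamicAllocation.maxExecutors  10

# Route each team's jobs to its own YARN queue with its own capacity limit
# ("analytics" is a hypothetical queue name)
spark.yarn.queue                      analytics
```

Pairing per-application caps like these with capacity limits on the YARN queues themselves would address the "one person can allocate all resources" complaint without requiring a separate Zeppelin per user.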