Egor,
Running a scale-out system like Spark with multiple users is always tricky. 
Operating systems are designed to let multiple users share a single machine, 
but with “big data” a single user needs several machines, which is the 
opposite problem. Having said that, I would suggest the following:

- Run the Spark driver in “cluster mode”, so the driver runs on a worker node 
instead of the node running Z (see the sketch after this list)
- Set appropriate resource limits/sizes in the Spark master configuration
- Run separate instances of Z per user, but then you will have a tough time 
collaborating and sharing notebooks. Maybe the notebooks can be stored in a 
shared space that all Z instances read from, but I am afraid that concurrent 
access might clobber the files; the Z developers can tell us if that is true
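
To make the first two points concrete, here is a rough sketch of what I mean. 
The property names are standard Spark settings, but the values are only 
examples; they would go in spark-defaults.conf or in Z's Spark interpreter 
settings, and the spark-submit line just illustrates cluster-mode driver 
placement (whether Z can launch its interpreter that way depends on the 
version you run):

    # Cap what a single driver/interpreter can take from the cluster
    spark.executor.memory                   4g
    spark.cores.max                         8     # standalone master only
    spark.dynamicAllocation.enabled         true
    spark.dynamicAllocation.maxExecutors    10

    # Cluster deploy mode puts the driver on a worker, not on the Z host
    spark-submit --master yarn --deploy-mode cluster --driver-memory 2g ...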

Another alternative is virtualization using containers, but I do not think 
that will be easy either.
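
If you do try containers, the per-user setup could look roughly like this. It 
is only a sketch: I am assuming a Zeppelin Docker image is available 
(apache/zeppelin on Docker Hub is one option), and the container name, port 
mapping, memory cap, and mount path below are made up for the example (the 
notebook path inside the container depends on the image):

    docker run -d --name zeppelin-egor \
        --memory 4g \
        -p 8081:8080 \
        -v /shared/notebooks/egor:/zeppelin/notebook \
        apache/zeppelin:<version>

Each user would get their own port and memory cap, and mounting a per-user 
notebook directory sidesteps the shared-file clobbering I mentioned above.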

Mohit
Founder,
Data Orchard LLC
www.dataorchardllc.com


> On Aug 5, 2016, at 11:45 AM, Egor Pahomov <pahomov.e...@gmail.com> wrote:
> 
> Hi, I'd like to discuss best practices for using Zeppelin in a multi-user 
> environment. There are several naive approaches; I've tried each for at 
> least a couple of months and not a single one worked:
> 
> All users on one Zeppelin:
> - One Spark context: people really do break the sc, and when everyone is in 
> the same boat a single person can stop many others from working.
> - No resource management support: one person can hold all the resources for 
> a long time.
> - The number of notebooks is enormous; it's hard to find anything.
> - No security separation: everyone sees everything. I do not care about 
> security, but I do care about fool-proofing, and people can accidentally 
> delete each other's notebooks.
> Every user has his own Zeppelin on one machine:
> - Every Zeppelin instance eats memory for Zeppelin itself; at some point 
> there is not enough memory.
> - Every Spark driver (I use yarn-client mode) eats memory as well. Same issue.
> - Single point of failure.
> - Cores might not be enough.
> - I cannot prove it, but even with enough memory and cores, Zeppelin runs 
> into problems when there are more than 10 instances on one machine. I do not 
> know the reason; maybe it is a Spark driver issue.
> Our current approach: every department has its own VM with its own Zeppelin 
> in it.
> - I'm not a DevOps person and have no experience supporting multiple VMs.
> - It's expensive to have hardware for a lot of VMs.
> - Most of this hardware is not in use even 20% of the time.
> 
> How are you dealing with this situation? 
> 
> 
> -- 
> Sincerely yours
> Egor Pakhomov