Hi, I'd like to discuss best practices for using Zeppelin in a
multi-user environment. There are several naive approaches; I've tried each
of them for at least a couple of months, and not a single one worked:

*All users on one Zeppelin*

   - One Spark context. People really do break sc, and since everyone is in
   the same boat, a single person can stop many others from working.
   - No resource management support. One person can allocate all the
   resources for a long time.
   - The number of notebooks is enormous, so it's hard to find anything
   among them.
   - No security separation: everyone sees everything. I do not care about
   security, but I do care about foolproofing, and people can accidentally
   delete each other's notebooks.

*Every user has his own Zeppelin on one machine*

   - Every Zeppelin instance consumes memory for Zeppelin itself. At some
   point there is not enough memory.
   - Every Spark driver (I use yarn-client mode) consumes memory. Same issue.
   - Single point of failure.
   - There might not be enough cores.
   - I cannot prove it, but even when there are enough memory and cores,
   Zeppelin runs into problems once there are more than 10 Zeppelin instances
   on one machine. I do not know the reason; maybe it's a Spark driver issue.
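
To make the memory concern concrete, here is a rough back-of-the-envelope
sketch. The function names and all the numbers in the usage example are
hypothetical, not measurements from my setup; it only assumes that each user
costs one Zeppelin JVM plus one Spark driver on the same host.

```python
def total_memory_gb(instances, zeppelin_gb, driver_gb):
    """Memory used on the host by `instances` per-user Zeppelin processes,
    each paired with one Spark driver running in yarn-client mode."""
    return instances * (zeppelin_gb + driver_gb)


def max_instances(host_gb, zeppelin_gb, driver_gb, reserved_gb=0.0):
    """How many Zeppelin + driver pairs fit on a host, after setting aside
    `reserved_gb` for the OS and other services."""
    per_user_gb = zeppelin_gb + driver_gb
    return int((host_gb - reserved_gb) // per_user_gb)


if __name__ == "__main__":
    # Hypothetical sizes: 1 GB per Zeppelin, 2 GB per driver, 64 GB host.
    print(total_memory_gb(10, 1, 2))
    print(max_instances(64, 1, 2, reserved_gb=4))
```

This ignores cores and any per-process overhead beyond the configured heap
sizes, so the real limit is probably lower.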

Our current approach:
*Every department has its own VM, with its own Zeppelin in it*

   - I'm not a DevOps engineer, and I have no experience supporting
   multiple VMs.
   - It's expensive to have hardware for a lot of VMs.
   - Most of this hardware is in use less than 20% of the time.


How are you dealing with this situation?


-- 


*Sincerely yours, Egor Pakhomov*
