Hi, I'd like to discuss best practices for using Zeppelin in a multi-user environment. There are several naive approaches; I've tried each of them for at least a couple of months, and not a single one worked:

*All users on one Zeppelin.*
- One Spark context: people really do break sc, and when everyone is in the same boat, a single person can stop many others from working.
- No resource management support. One person can allocate all the resources for a long time.
- The number of notebooks is enormous; it's hard to find anything among them.
- No security separation: everyone sees everything. I do not care about security, but I do care about being foolproof, and people can accidentally delete each other's notebooks.

*Every user has his own Zeppelin on one machine.*
- Every Zeppelin instance eats memory for Zeppelin itself. At some point there is not enough memory.
- Every Spark driver (I use YARN client mode) eats memory. Same issue.
- Single point of failure.
- There might not be enough cores.
- I cannot prove it, but even when memory and cores are sufficient, Zeppelin experiences problems when there are more than 10 Zeppelin instances on one machine. I do not know the reason; maybe it's a Spark driver issue.

Our current approach:

*Every department has its own VM, with its own Zeppelin in it.*
- I'm not a DevOps engineer, and I do not have experience supporting multiple VMs.
- It's expensive to have hardware for a lot of VMs.
- Most of this hardware is busy less than 20% of the time.

How are you dealing with this situation?

--
*Sincerely yours, Egor Pakhomov*