I am glad to see this thread about multi-user support in Zeppelin. I think this is a very important and urgent feature for Zeppelin's next step.
Here are the issues I see for multi-user support. I'm not sure whether there's an umbrella ticket for multi-user support; if not, I think we should create one and start there.

1. Interpreter settings - user-level interpreter settings. For now, interpreter settings are applied globally. That means if user A changes a Spark interpreter setting, the change applies to everyone else, which is a pretty bad user experience.

2. Interpreter instances - although there are several options for this, the default behavior is to share the interpreter instance. ZEPPELIN-1210 is about creating an interpreter per user; I think this should be the default behavior. There is also a performance issue, as Zeppelin only supports yarn-client mode (I think the yarn-cluster mode in the previous reply refers to Livy). Supporting yarn-cluster mode for the native Spark interpreter is also necessary.

3. Note management - for now, there is no concept of a per-user workspace. Each user can see all the notes, which makes them pretty hard to manage and organize. I think there should be a module for managing and organizing notes per user.

4. Secured clusters - in a kerberized environment, all interpreters start with the same keytab/principal, which is pretty dangerous. E.g., user A can use the shell interpreter, which runs as user B, to delete all the files owned by user B.

Best Regards,
Jeff Zhang

From: vincent gromakowski <vincent.gromakow...@gmail.com>
Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Date: Saturday, August 6, 2016 at 5:11 AM
To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Subject: Re: Multiuser support of Zeppelin

One Zeppelin per user, in a Mesos container on a datanode-type server, works fine for me. An Ansible script configures each instance with user specifics and launches it in Marathon.
A service discovery mechanism (a basic shell script) updates an Apache server with basic auth and routes each user to his instance. Mesos also runs a SMACK stack that Zeppelin relies on.

On Aug 5, 2016 at 11:01 PM, "Egor Pahomov" <pahomov.e...@gmail.com> wrote:

I need to build a chart over 10 days for all countries (200) for several products by some dimensions. I would need at least 4-6 GB per Zeppelin for it.

2016-08-05 12:31 GMT-07:00 Mohit Jaggi <mohitja...@gmail.com>:

Put your big results somewhere else, not in Z's memory?

On Aug 5, 2016, at 12:26 PM, Egor Pahomov <pahomov.e...@gmail.com> wrote:

> Use the Spark driver in "cluster mode", where the driver runs on a worker instead of the node running Z

Even without the driver, Z is a heavy process. You need a lot of RAM to keep big results from jobs. And most of all, Zeppelin 0.5.6 does not support cluster mode and I'm not ready to move to 0.6.

2016-08-05 12:03 GMT-07:00 Mohit Jaggi <mohitja...@gmail.com>:

Egor,
Running a scale-out system like Spark with multiple users is always tricky. Operating systems are designed to let multiple users share a single machine, but for "big data" a single user requires several machines, which is the exact opposite. Having said that, I would suggest the following:
- Use the Spark driver in "cluster mode", where the driver runs on a worker instead of the node running Z
- Set appropriate limits/sizes in the Spark master configuration
- Run separate instances of Z per user, but then you will have a tough time collaborating and sharing notebooks... maybe they can be stored in a shared space and all Z instances can read them, but I am afraid that shared access might clobber the files. Z developers can tell us if that is true.
Another alternative is virtualization using containers, but I think that will not be easy either.
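Mohit's first two suggestions (a cluster-mode driver plus master-side limits) can be sketched as caps passed at submit time. This is a dry-run sketch under assumptions: the limit values and the jar name are made up, and note that `spark.cores.max` applies to standalone/Mesos masters while `spark.dynamicAllocation.maxExecutors` is the YARN-side analogue.

```shell
#!/bin/sh
# Dry-run sketch: build (but do not run) a spark-submit invocation that
# puts the driver on the cluster and caps one user's resource usage.
# The numbers are illustrative assumptions, not tuned recommendations.
submit_with_limits() {
    app_jar="$1"
    echo spark-submit \
        --deploy-mode cluster \
        --conf spark.cores.max=8 \
        --conf spark.executor.memory=4g \
        --conf spark.dynamicAllocation.maxExecutors=4 \
        "$app_jar"
}

submit_with_limits notebook-job.jar
```

With the driver in cluster mode, the Zeppelin host only pays for the Zeppelin JVM itself, which addresses part of Egor's memory concern.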
Mohit
Founder, Data Orchard LLC
www.dataorchardllc.com

On Aug 5, 2016, at 11:45 AM, Egor Pahomov <pahomov.e...@gmail.com> wrote:

Hi, I'd like to discuss best practices for using Zeppelin in a multi-user environment. There are several naive approaches; I've tried each for at least a couple of months, and not a single one worked.

All users on one Zeppelin:
* One Spark context - people really do break the sc, and when everyone is in the same boat a single person can stop many from working.
* No resource management support - one person can allocate all resources for a long time.
* The number of notebooks is enormous - it's hard to find anything.
* No security separation - everyone sees everything. I do not care about security, but I care about foolproofing, and people can accidentally delete each other's notebooks.

Every user has his own Zeppelin on one machine:
* Every Zeppelin instance eats memory for Zeppelin itself; at some point there is not enough memory.
* Every Spark driver (I use yarn-client mode) eats memory - same issue.
* Single point of failure.
* There might not be enough cores.
* I cannot prove it, but even when memory and cores are enough, Zeppelin experiences problems with >10 Zeppelin instances on one machine. I don't know the reason; maybe it's a Spark driver issue.

Our current approach: every department has its own VM with its own Zeppelin in it.
* I'm not a DevOps engineer; I don't have experience supporting multiple VMs.
* It's expensive to have hardware for a lot of VMs.
* Most of this hardware isn't used even 20% of the time.

How are you dealing with this situation?

--
Sincerely yours
Egor Pakhomov
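On Jeff Zhang's fourth point earlier in the thread (every interpreter sharing one keytab/principal): a hypothetical mitigation is to launch each user's interpreter process with that user's own Kerberos credentials. The keytab path layout, the EXAMPLE.COM realm, and the jar name below are assumptions for illustration; `--principal` and `--keytab` are the standard spark-submit flags for kerberized YARN.

```shell
#!/bin/sh
# Hypothetical per-user launcher sketch: each user's Spark interpreter
# is submitted with that user's own keytab instead of a shared service
# principal, so user A's jobs cannot act as user B.
launch_interpreter() {
    user="$1"
    keytab="/etc/security/keytabs/${user}.keytab"  # assumed per-user keytab path
    principal="${user}@EXAMPLE.COM"                # assumed realm
    # echo instead of exec, so this sketch is a dry run;
    # interpreter-server.jar is a placeholder name
    echo spark-submit --master yarn --deploy-mode client \
        --principal "$principal" --keytab "$keytab" \
        interpreter-server.jar
}

launch_interpreter alice
```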