I am glad to see this thread about multi-user support in Zeppelin. I think
this is a very important and urgent feature for Zeppelin's next step.
Here are the issues that I see for multi-user support. I am not sure whether
there is an umbrella ticket for it; if not, I think we should create one and
start from there.
1. Interpreter setting.
- User-level interpreter settings. For now the interpreter settings are applied
globally. That means if user A changes a Spark interpreter setting, the change
applies to everyone else, which is a pretty bad user experience.
2. Interpreter Instance.
- Although there are several options for this, the default behavior
is to share the interpreter instance. ZEPPELIN-1210 is about creating an
interpreter per user; I think this should be the default behavior.
- There is also a performance issue, as Zeppelin only supports yarn-client mode
(I think the yarn-cluster mode in the previous reply refers to Livy). Supporting
yarn-cluster mode for the native Spark interpreter is also necessary (a sketch
follows after this list).
3. Note management
- For now, there is no concept of a per-user workspace. Every user can
see all the notes, which makes them pretty hard to manage and organize. I think
there should be a module for managing and organizing notes per user.
4. Secured cluster.
- In a kerberized environment, all the interpreters share the same
keytab/principal, which is pretty dangerous. E.g., user A could use the shell
interpreter, which runs as user B, to delete all the files owned by user B.
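For illustration only, here is a rough Scala sketch of what a per-user,
yarn-cluster interpreter configuration could look like. The spark.* keys
(master yarn-cluster, spark.yarn.keytab, spark.yarn.principal,
spark.executor.memory) are standard Spark properties, but the per-user lookup,
keytab path and realm are made-up assumptions; Zeppelin does not provide this
wiring today:

  import org.apache.spark.SparkConf

  // Sketch only: assemble per-user launch properties for an isolated interpreter process.
  // The keytab path and Kerberos realm below are placeholders.
  def confForUser(user: String): SparkConf = new SparkConf()
    .setAppName(s"zeppelin-$user")
    .setMaster("yarn-cluster") // the driver would run inside YARN, not on the Zeppelin host
    .set("spark.yarn.keytab", s"/etc/security/keytabs/$user.keytab")
    .set("spark.yarn.principal", s"$user@EXAMPLE.COM")
    .set("spark.executor.memory", "4g")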
Best Regards,
Jeff Zhang
From: vincent gromakowski <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Saturday, August 6, 2016 at 5:11 AM
To: "[email protected]" <[email protected]>
Subject: Re: Multiuser support of Zeppelin.
One Zeppelin per user in a Mesos container on a datanode-type server is fine for
me. An Ansible script configures each instance with user-specific settings and
launches it in Marathon. A service-discovery job (a basic shell script) updates
an Apache server with basic auth and routes each user to his instance. Mesos
also runs a SMACK stack on which Zeppelin relies.
On Aug 5, 2016 at 11:01 PM, "Egor Pahomov" <[email protected]> wrote:
I need to build a chart for 10 days for all countries (200) for several products
by some dimensions. I would need at least 4-6 GB per Zeppelin for it.
2016-08-05 12:31 GMT-07:00 Mohit Jaggi
<[email protected]<mailto:[email protected]>>:
put your big results somewhere else not in Z’s memory?
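For example, a rough sketch of that idea (this assumes the sqlContext that
Zeppelin's Spark interpreter provides; the "events" table and output path are
made up): aggregate with Spark, write the full result to HDFS, and pull back
only a small preview for the chart.

  // Sketch: keep the heavy result out of the Zeppelin/driver JVM.
  val byCountry = sqlContext.table("events")   // "events" is a placeholder table name
    .groupBy("country", "product")
    .count()

  byCountry.write.mode("overwrite").parquet("/tmp/by_country")  // full result stays on HDFS
  val preview = byCountry.limit(1000).collect()                 // only a small sample returns to the driver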
On Aug 5, 2016, at 12:26 PM, Egor Pahomov
<[email protected]<mailto:[email protected]>> wrote:
- Use the Spark driver in “cluster mode”, where the driver runs on a worker instead of
the node running Z
Even without the driver, Z is a heavy process. You need a lot of RAM to keep big
results from jobs. And most of all, Zeppelin 0.5.6 does not support cluster
mode and I'm not ready to move to 0.6.
2016-08-05 12:03 GMT-07:00 Mohit Jaggi
<[email protected]<mailto:[email protected]>>:
Egor,
Running a scale-out system like Spark with multiple users is always tricky.
Operating systems are designed to let multiple users share a single machine,
but for “big data” a single user requires several machines, which is the exact
opposite. Having said that, I would suggest the following:
- Use the Spark driver in “cluster mode”, where the driver runs on a worker instead of
the node running Z
- Set appropriate limits/sizes in the Spark master configuration (see the sketch after this list)
- Run separate instances of Z per user, but then you will have a tough time
collaborating and sharing notebooks… maybe they can be stored in a shared space
and all Z instances can read them, but I am afraid that shared access might
clobber the files. Z developers can tell us if that is true.
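For the second point, a minimal sketch of capping a single user's footprint
with standard Spark properties; the values are placeholders, spark.cores.max
applies to a standalone master and the dynamic-allocation cap to YARN:

  import org.apache.spark.SparkConf

  // Sketch: per-user resource caps via standard Spark properties (placeholder values).
  val limitedConf = new SparkConf()
    .set("spark.cores.max", "8")                     // total-core cap on a standalone master
    .set("spark.executor.memory", "2g")
    .set("spark.dynamicAllocation.enabled", "true")  // on YARN, cap the executor count instead
    .set("spark.dynamicAllocation.maxExecutors", "4")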
Another alternative is virtualization using containers but I think that will
not be easy either.
Mohit
Founder,
Data Orchard LLC
www.dataorchardllc.com
On Aug 5, 2016, at 11:45 AM, Egor Pahomov
<[email protected]<mailto:[email protected]>> wrote:
Hi, I'd like to discuss best practices for using Zeppelin in a multi-user
environment. There are several naive approaches; I've tried each for at least a
couple of months and not a single one worked:
All users on one Zeppelin.
* One Spark context - people really do break sc, and when they are all in the
same boat, a single person can stop many others from working.
* No resource management support. One person can allocate all resources for
a long time.
* The number of notebooks is enormous - it's hard to find anything in them.
* No security separation - everyone sees everything. I do not care about
security, but I care about foolproofing, and people can accidentally delete
each other's notebooks.
Every user has his own Zeppelin on one machine.
* Every Zeppelin instance eats memory for Zeppelin itself. At some point there
is not enough memory.
* Every Spark driver (I use yarn-client mode) eats memory. Same issue.
* Single point of failure.
* Cores might not be enough.
* I cannot prove it, but even if memory and cores are enough, Zeppelin has
problems when there are more than 10 Zeppelin instances on one machine. I do
not know the reason; maybe it's Spark driver issues.
Our current approach:
Every department has its own VM, with its own Zeppelin in it.
* I'm not DevOps; I do not have experience supporting multiple VMs.
* It's expensive to have hardware for a lot of VMs.
* Most of this hardware is not doing work even 20% of the time.
How are you dealing with this situation?
--
Sincerely yours
Egor Pakhomov