Hi guys,

I have created an umbrella ticket for multi-user support in Zeppelin, as I
see more and more demand for this from the community. Feel free to add any
subtasks I have missed.

https://issues.apache.org/jira/browse/ZEPPELIN-1337



On Tue, Aug 9, 2016 at 9:20 AM, Alexander Bezzubov <b...@apache.org> wrote:

> Hi Egor,
>
> let me share two approaches that we have used to get Apache Zeppelin
> working in a multi-tenant environment with Apache Spark:
>
> - Run a separate Zeppelin container per user, on a small cluster of
> Docker machines (so a single machine runs just 2-3 containers with
> SparkContexts).
> This works well and supports a Spark standalone cluster, but it requires
> central external auth, a small "resource manager" to allocate the
> containers across the Docker cluster, and a reverse proxy as a single
> point of entry for the user.
> We have implemented all of this in one binary under an open-source project
> called Z-Manager Multitenancy; you can get more details here [1]. It is
> beta and we haven't had the capacity to support it recently.
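The container-allocation logic Alex describes could be sketched roughly like this. This is a minimal illustration only; the host-selection rule, container cap, image name, and port scheme are assumptions, not Z-Manager's actual implementation:

```python
def pick_host(hosts, allocations, max_per_host=3):
    """Pick the Docker host with the fewest allocated containers
    (mirrors the '2-3 containers per machine' constraint above)."""
    host = min(hosts, key=lambda h: len(allocations.get(h, [])))
    if len(allocations.get(host, [])) >= max_per_host:
        raise RuntimeError("Docker cluster is full")
    return host

def zeppelin_run_command(user, port):
    """Build a `docker run` command for one user's Zeppelin container.
    The reverse proxy would route this user's traffic to `port`."""
    return [
        "docker", "run", "-d",
        "--name", f"zeppelin-{user}",
        "-p", f"{port}:8080",           # Zeppelin's default web port
        "apache/zeppelin:0.6.0",        # image name is an assumption
    ]

hosts = ["docker-1", "docker-2"]
allocations = {"docker-1": ["zeppelin-alice", "zeppelin-bob"], "docker-2": []}
host = pick_host(hosts, allocations)
cmd = zeppelin_run_command("carol", 8081)
```

The central auth service and reverse-proxy wiring would sit in front of this, mapping each authenticated user to their container's host and port.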
>
> - Run a single Zeppelin with auth enabled + the Livy interpreter + Spark
> on YARN.
>
> This is a more generic solution, but it requires a particular cluster
> configuration. Here YARN is used as the resource manager, handling the
> multiple Spark context/driver processes on the same cluster as the tasks
> themselves. AFAIK a Mesos cluster could be used instead, but I do not have
> first-hand experience with it. You can read more about it here [2].
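The key mechanism in this second approach is that each user gets their own Livy session, and therefore their own Spark driver managed by YARN rather than by the Zeppelin host. A rough sketch of the session request Livy's REST API expects (the Livy host/port here is hypothetical):

```python
import json

# One interactive Spark session per user; Livy launches the driver on the
# cluster, so the Zeppelin machine carries none of the driver memory.
LIVY_URL = "http://livy-server:8998"   # hypothetical host

def new_session_request(user):
    """URL and JSON body for Livy's POST /sessions endpoint."""
    return f"{LIVY_URL}/sessions", json.dumps({
        "kind": "spark",       # a Scala Spark session
        "proxyUser": user,     # run the YARN application as this user
    })

url, body = new_session_request("alice")
```

With `proxyUser` set, YARN queues and quotas can then throttle each user's resource consumption, which addresses the "one person allocates everything" problem discussed further down this thread.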
>
> Hope this helps!
>
> --
>
> Alex
>
> 1. https://github.com/NFLabs/z-manager/blob/master/multitenancy/README.md
> 2. http://zeppelin.apache.org/docs/0.6.0/interpreter/livy.html
>
> On Sat, Aug 6, 2016, 06:12 vincent gromakowski <
> vincent.gromakow...@gmail.com> wrote:
>
>> One Zeppelin per user, in a Mesos container on a datanode-type server,
>> works fine for me. An Ansible script configures each instance with
>> user-specific settings and launches it in Marathon. A service-discovery
>> job (a basic shell script) updates an Apache server with basic auth and
>> routes each user to his instance. Mesos also runs the SMACK stack that
>> Zeppelin relies on.
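The routing half of Vincent's setup could look roughly like the following, here sketched in Python rather than shell. The path layout and addresses are made up; the idea is just to regenerate one Apache mod_proxy rule per user from whatever Marathon reports:

```python
def proxy_rules(instances):
    """instances: {user: "host:port"} as discovered from Marathon.
    Emits a ProxyPass/ProxyPassReverse pair per user, so each user's
    path on the front-end Apache maps to their own Zeppelin."""
    lines = []
    for user, addr in sorted(instances.items()):
        lines.append(f"ProxyPass        /{user}/ http://{addr}/")
        lines.append(f"ProxyPassReverse /{user}/ http://{addr}/")
    return "\n".join(lines)

conf = proxy_rules({
    "alice": "10.0.0.5:31001",   # Marathon-assigned host:port (example)
    "bob":   "10.0.0.6:31002",
})
```

Basic auth on the front-end Apache (e.g. a `Require user` directive per `<Location>`) would then keep each user out of the others' instances.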
>>
>> On Aug 5, 2016 at 11:01 PM, "Egor Pahomov" <pahomov.e...@gmail.com>
>> wrote:
>>
>> I need to build a chart covering 10 days for all countries (200) for
>> several products, broken down by some dimensions. I would need at least
>> 4-6 GB per Zeppelin instance for it.
>>
>> 2016-08-05 12:31 GMT-07:00 Mohit Jaggi <mohitja...@gmail.com>:
>>
>>> Put your big results somewhere else, not in Z’s memory?
>>>
>>> On Aug 5, 2016, at 12:26 PM, Egor Pahomov <pahomov.e...@gmail.com>
>>> wrote:
>>>
>>>> - Use the Spark driver in “cluster mode”, where the driver runs on a
>>>> worker instead of the node running Z
>>>
>>>
>>> Even without the driver, Z is a heavy process. You need a lot of RAM to
>>> keep big results from jobs. And most of all, Zeppelin 0.5.6 does not
>>> support cluster mode, and I'm not ready to move to 0.6.
>>>
>>> 2016-08-05 12:03 GMT-07:00 Mohit Jaggi <mohitja...@gmail.com>:
>>>
>>>> Egor,
>>>> Running a scale-out system like Spark with multiple users is always
>>>> tricky. Operating systems are designed to let multiple users share a
>>>> single machine, but for “big data” a single user requires several
>>>> machines, which is the exact opposite. Having said that, I would
>>>> suggest the following:
>>>>
>>>> - Use the Spark driver in “cluster mode”, where the driver runs on a
>>>> worker instead of the node running Z
>>>> - Set appropriate limits/sizes in the Spark master configuration
>>>> - Run separate instances of Z per user, but then you will have a tough
>>>> time collaborating and sharing notebooks… maybe they can be stored in a
>>>> shared space and all Z instances can read them, but I am afraid that
>>>> shared access might clobber the files. Z developers can tell us if that
>>>> is true.
>>>>
>>>> Another alternative is virtualization using containers, but I think
>>>> that will not be easy either.
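The shared-notebook clobbering risk Mohit raises could be mitigated with per-notebook advisory locking on the shared storage. A crude sketch of the idea, using exclusive lock-file creation (the notebook-directory layout and note id are made up, and this is not anything Zeppelin actually does):

```python
import os
import tempfile

def try_lock(notebook_dir, note_id):
    """Try to acquire an exclusive per-notebook lock file.
    Returns the lock path on success, or None if another Zeppelin
    instance already holds the lock for this note."""
    path = os.path.join(notebook_dir, f"{note_id}.lock")
    try:
        # O_CREAT|O_EXCL makes creation atomic: exactly one caller wins.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return path
    except FileExistsError:
        return None

def unlock(lock_path):
    os.remove(lock_path)

shared_dir = tempfile.mkdtemp()
first = try_lock(shared_dir, "2A94M5J1Z")    # note id is made up
second = try_lock(shared_dir, "2A94M5J1Z")   # same note: should fail
```

Advisory locking only works if every instance cooperates, and stale lock files need cleanup after crashes, which hints at why Mohit expects shared access to be fragile.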
>>>>
>>>> Mohit
>>>> Founder,
>>>> Data Orchard LLC
>>>> www.dataorchardllc.com
>>>>
>>>>
>>>> On Aug 5, 2016, at 11:45 AM, Egor Pahomov <pahomov.e...@gmail.com>
>>>> wrote:
>>>>
>>>> Hi, I'd like to discuss best practices for using Zeppelin in a
>>>> multi-user environment. There are several naive approaches; I've tried
>>>> each for at least a couple of months, and not a single one worked:
>>>>
>>>> *All users on one Zeppelin.*
>>>>
>>>>    - One Spark context - people really do break the sc, and when they
>>>>    are all in the same boat a single person can stop many from working.
>>>>    - No resource-management support. One person can allocate all
>>>>    resources for a long time.
>>>>    - The number of notebooks is enormous - it's hard to find anything.
>>>>    - No security separation - everyone sees everything. I do not care
>>>>    about security, but I do care about foolproofing, and people can
>>>>    accidentally delete each other's notebooks.
>>>>
>>>> *Every user has his own Zeppelin on one machine.*
>>>>
>>>>    - Every Zeppelin instance consumes memory for Zeppelin itself;
>>>>    at some point there is not enough memory.
>>>>    - Every Spark driver (I use YARN client mode) consumes memory.
>>>>    Same issue.
>>>>    - Single point of failure.
>>>>    - There might not be enough cores.
>>>>    - I cannot prove it, but even when memory and cores are sufficient,
>>>>    Zeppelin experiences problems when there are >10 Zeppelin instances
>>>>    on one machine. I do not know the reason; maybe it's a Spark driver
>>>>    issue.
>>>>
>>>> Our current approach:
>>>> *Every department has its own VM, with its own Zeppelin in it.*
>>>>
>>>>    - I'm not a DevOps engineer; I do not have experience supporting
>>>>    multiple VMs.
>>>>    - It's expensive to have hardware for a lot of VMs.
>>>>    - Most of this hardware isn't in use even 20% of the time.
>>>>
>>>>
>>>> How are you dealing with this situation?
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>> *Sincerely yours,*
>>>> *Egor Pakhomov*
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>>
>>>
>>> *Sincerely yours,*
>>> *Egor Pakhomov*
>>>
>>>
>>>
>>
>>
>> --
>>
>>
>> *Sincerely yours,*
>> *Egor Pakhomov*
>>
>>
>>


-- 
Best Regards

Jeff Zhang
