One Zeppelin per user, in a Mesos container on a datanode-type server,
works fine for me. An Ansible script configures each instance with the
user's specific settings and launches it in Marathon. A service-discovery
mechanism (a basic shell script) updates an Apache server with basic auth
and routes each user to his own instance. Mesos also runs the SMACK stack
that Zeppelin relies on.
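
Roughly, the service-discovery step looks like this (a sketch in Python
rather than shell; the Marathon URL, the "/zeppelin-<user>" app-id
convention, the htpasswd file and the vhost path are illustrative
assumptions, not my real setup):

    #!/usr/bin/env python
    # Sketch only: poll Marathon for the per-user Zeppelin tasks and
    # regenerate an Apache reverse-proxy config with basic auth.
    import json
    import subprocess
    import urllib2

    MARATHON = "http://marathon.example.com:8080"    # assumption
    VHOST = "/etc/httpd/conf.d/zeppelin-users.conf"  # assumption

    tasks = json.load(urllib2.urlopen(MARATHON + "/v2/tasks"))["tasks"]

    out = []
    for t in tasks:
        app_id = t["appId"]  # e.g. "/zeppelin-alice" (assumed convention)
        if not app_id.startswith("/zeppelin-"):
            continue
        user = app_id[len("/zeppelin-"):]
        host, port = t["host"], t["ports"][0]
        out += [
            '<Location "/%s/">' % user,
            '  AuthType Basic',
            '  AuthName "Zeppelin"',
            '  AuthUserFile /etc/httpd/htpasswd',  # assumption
            '  Require user %s' % user,   # each user reaches only his instance
            '  ProxyPass "http://%s:%d/"' % (host, port),
            '  ProxyPassReverse "http://%s:%d/"' % (host, port),
            '</Location>',
        ]

    with open(VHOST, "w") as f:
        f.write("\n".join(out) + "\n")

    subprocess.call(["apachectl", "graceful"])  # reload, keep connections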

On Aug 5, 2016 at 11:01 PM, "Egor Pahomov" <pahomov.e...@gmail.com> wrote:

I need to build a chart covering 10 days, for all countries (200), for
several products, by some dimensions. I would need at least 4-6 GB per
Zeppelin for that.

2016-08-05 12:31 GMT-07:00 Mohit Jaggi <mohitja...@gmail.com>:

> put your big results somewhere else, not in Z’s memory?
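>
> A minimal sketch of that idea in PySpark (the HDFS path and the
> "country" column are made up for illustration): write the full result
> to HDFS and collect only a small aggregate, so the driver never holds
> the big result set.
>
>     # Instead of df.collect(), which pulls every row into the driver
>     # process (the same process Z shares in yarn-client mode):
>     df.write.parquet("hdfs:///tmp/big_result")       # spill to HDFS
>     small = df.groupBy("country").count().collect()  # ~200 rows only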
>
> On Aug 5, 2016, at 12:26 PM, Egor Pahomov <pahomov.e...@gmail.com> wrote:
>
>> - Use the Spark driver in “cluster mode”, where the driver runs on a
>> worker instead of the node running Z
>
>
> Even without the driver, Z is a heavy process. You need a lot of RAM to
> keep big results from jobs. And most of all, Zeppelin 0.5.6 does not
> support cluster mode, and I'm not ready to move to 0.6.
>
> 2016-08-05 12:03 GMT-07:00 Mohit Jaggi <mohitja...@gmail.com>:
>
>> Egor,
>> Running a scale-out system like Spark with multiple users is always
>> tricky. Operating systems are designed to let multiple users share a
>> single machine, but for “big data” a single user requires several
>> machines, which is the exact opposite. Having said that, I would
>> suggest the following:
>>
>> - Use the Spark driver in “cluster mode”, where the driver runs on a
>> worker instead of the node running Z
>> - Set appropriate limits/sizes in the Spark master configuration (a
>> sketch follows below)
>> - Run separate instances of Z per user, but then you will have a tough
>> time collaborating and sharing notebooks…maybe they can be stored in a
>> shared space that all Z instances read from, but I am afraid that
>> shared access might clobber the files. Z developers can tell us if
>> that is true.
>>
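>> For the second point, a sketch of the kind of caps I have in mind, in
>> spark-defaults.conf (the values are illustrative, not recommendations;
>> dynamic allocation also needs spark.shuffle.service.enabled=true):
>>
>>     spark.driver.memory                   4g
>>     spark.executor.memory                 4g
>>     spark.dynamicAllocation.enabled       true
>>     spark.dynamicAllocation.maxExecutors  10
>>     spark.yarn.queue                      analysts
>>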
>> Another alternative is virtualization using containers, but I think
>> that will not be easy either.
>>
>> Mohit
>> Founder,
>> Data Orchard LLC
>> www.dataorchardllc.com
>>
>>
>> On Aug 5, 2016, at 11:45 AM, Egor Pahomov <pahomov.e...@gmail.com> wrote:
>>
>> Hi, I'd like to discuss best practices for using Zeppelin in a
>> multi-user environment. There are several naive approaches; I've tried
>> each for at least a couple of months, and not a single one worked:
>>
>> *All users on one Zeppelin.*
>>
>>    - One Spark context - people really do break the sc, and when they
>>    are all in the same boat, a single person can stop many others from
>>    working.
>>    - No resource management support. One person can hold all the
>>    resources for a long time.
>>    - The number of notebooks is enormous - it's hard to find anything
>>    among them.
>>    - No security separation - everyone sees everything. I do not care
>>    about security, but I do care about fool-proofing, and people can
>>    accidentally delete each other's notebooks.
>>
>> *Every user has his own Zeppelin on one machine*
>>
>>    - Every Zeppelin instance eats memory for Zeppelin itself; at some
>>    point there is not enough memory.
>>    - Every Spark driver (I use yarn-client mode) eats memory. Same
>>    issue.
>>    - Single point of failure.
>>    - Cores might not be enough.
>>    - I cannot prove it, but even when memory and cores are enough,
>>    Zeppelin experiences problems when there are more than 10 instances
>>    on one machine. I do not know the reason; maybe it's a Spark driver
>>    issue.
>>
>> Our current approach:
>> *Every department has its own VM, with its own Zeppelin in it.*
>>
>>    - I'm not DevOps; I do not have experience supporting multiple VMs.
>>    - It's expensive to have hardware for a lot of VMs.
>>    - Most of this hardware is not in use even 20% of the time.
>>
>>
>> How are you dealing with this situation?
>>
>>
>> --
>>
>>
>> *Sincerely yours,*
>> *Egor Pakhomov*
>>
>>
>>
>
>
> --
>
>
> *Sincerely yours,*
> *Egor Pakhomov*
>
>
>


-- 


*Sincerely yours,*
*Egor Pakhomov*
