Could you put your big results somewhere else, rather than in Z's memory?
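The idea above can be sketched in a notebook paragraph, assuming a running SparkSession `spark`; `bigJob` and the HDFS path are illustrative names, not from this thread:

```scala
// Sketch: instead of collect() pulling the full result into the
// Zeppelin/driver JVM, persist it to shared storage and read back
// only a small slice for display.
val results = bigJob(spark)  // some expensive DataFrame (hypothetical)
results.write.mode("overwrite").parquet("hdfs:///tmp/user1/results")

// Show only 100 rows in the notebook; the full result stays on HDFS.
spark.read.parquet("hdfs:///tmp/user1/results").limit(100).show()
```

This keeps the Zeppelin process small regardless of result size, since only the displayed rows ever reach the driver.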

> On Aug 5, 2016, at 12:26 PM, Egor Pahomov <pahomov.e...@gmail.com> wrote:
> 
> - Use spark driver in “cluster mode” where driver runs on a worker instead of 
> the node running Z
> 
> Even without the driver, Z is a heavy process. You need a lot of RAM to keep 
> big results from a job. And most of all, Zeppelin 0.5.6 does not support 
> cluster mode, and I'm not ready to move to 0.6. 
> 
> 2016-08-05 12:03 GMT-07:00 Mohit Jaggi <mohitja...@gmail.com 
> <mailto:mohitja...@gmail.com>>:
> Egor,
> Running a scale-out system like Spark with multiple users is always tricky. 
> Operating systems are designed to let multiple users share a single machine, 
> but for “big data” a single user requires several machines, which is the 
> exact opposite. Having said that, I would suggest the following:
> 
> - Use spark driver in “cluster mode” where driver runs on a worker instead of 
> the node running Z
> - Set appropriate limits/sizes in spark master configuration
> - Run separate instances of Z per user, but then you will have a tough time 
> collaborating and sharing notebooks… maybe they can be stored in a shared 
> space that all Z instances can read, but I am afraid that shared access 
> might clobber the files. Z developers can tell us if that is true.
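The first two suggestions above can be sketched as follows, assuming YARN as the cluster manager; the jar name and all limit values are illustrative:

```shell
# Run the driver on a cluster node instead of the Zeppelin host.
spark-submit --master yarn --deploy-mode cluster my-job.jar

# spark-defaults.conf: cap what any one application can take
# (illustrative values, tune to your cluster):
#   spark.executor.memory                  4g
#   spark.dynamicAllocation.enabled        true
#   spark.dynamicAllocation.maxExecutors   8
```

With such caps in spark-defaults.conf, even a careless notebook paragraph cannot claim the whole cluster indefinitely.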
> 
> Another alternative is virtualization using containers but I think that will 
> not be easy either.
> 
> Mohit
> Founder,
> Data Orchard LLC
> www.dataorchardllc.com <http://www.dataorchardllc.com/>
> 
> 
>> On Aug 5, 2016, at 11:45 AM, Egor Pahomov <pahomov.e...@gmail.com 
>> <mailto:pahomov.e...@gmail.com>> wrote:
>> 
>> Hi, I'd like to discuss best practices for using Zeppelin in a multi-user 
>> environment. There are several naive approaches; I've tried each for at 
>> least a couple of months, and not a single one worked:
>> 
>> All users on one Zeppelin:
>> - One Spark context: people really do break the sc, and when they are all 
>> in the same boat, a single person can stop many others from working.
>> - No resource-management support: one person can allocate all resources 
>> for a long time.
>> - The number of notebooks is enormous; it's hard to find anything.
>> - No security separation: everyone sees everything. I do not care about 
>> security, but I care about foolproofing, and people can accidentally 
>> delete each other's notebooks.
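One common mitigation for the resource-hogging problem described above, assuming YARN, is to split users across CapacityScheduler queues so no one queue can take the whole cluster; queue names and percentages here are examples only:

```xml
<!-- capacity-scheduler.xml sketch: two illustrative queues. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>analytics,etl</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.analytics.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.etl.capacity</name>
  <value>50</value>
</property>
```

Spark applications then target a queue via `spark.yarn.queue`, so one team's long job degrades only its own queue.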
>> Every user has his own Zeppelin on one machine:
>> - Every Zeppelin instance eats memory for Zeppelin itself; at some point 
>> there is not enough memory.
>> - Every Spark driver (I use yarn-client mode) eats memory. Same issue.
>> - Single point of failure.
>> - Cores might not be enough.
>> - I cannot prove it, but even when memory and cores are sufficient, 
>> Zeppelin has problems with more than 10 instances on one machine. I don't 
>> know the reason; maybe it's a Spark driver issue.
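One way to at least bound the per-instance memory cost described above is to cap each Zeppelin JVM in its zeppelin-env.sh; the sizes below are illustrative, not a recommendation:

```shell
# zeppelin-env.sh sketch: cap the Zeppelin server and interpreter JVMs
# so N instances on one host have a predictable memory footprint.
export ZEPPELIN_MEM="-Xmx1024m"
export ZEPPELIN_INTP_MEM="-Xmx2048m"
```

This does not fix the single-point-of-failure or core-count problems, but it makes the memory exhaustion deterministic rather than a surprise.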
>> Our current approach: every department has its own VM with its own 
>> Zeppelin in it.
>> - I'm not a DevOps engineer; I don't have experience supporting multiple VMs.
>> - It's expensive to have hardware for a lot of VMs.
>> - Most of this hardware isn't used even 20% of the time.
>> 
>> How are you dealing with this situation? 
>> 
>> 
>> -- 
>> Sincerely yours
>> Egor Pakhomov
> 
> 
> 
> 
> -- 
> Sincerely yours
> Egor Pakhomov
