Can you put your big results somewhere else, not in Z's memory?
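That idea could look roughly like this in a notebook paragraph (a sketch only — the `results` DataFrame and the HDFS path are hypothetical, and it assumes the Spark 1.x `sqlContext` that Zeppelin exposes):

```scala
// Instead of results.collect(), which pulls everything into the driver
// (and therefore into Z's JVM), persist the full result set to HDFS:
results.write.mode("overwrite").parquet("hdfs:///user/shared/big_results")

// Pull back only a small, display-sized slice into the notebook:
val preview = sqlContext.read.parquet("hdfs:///user/shared/big_results").limit(100)
preview.show()
```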
> On Aug 5, 2016, at 12:26 PM, Egor Pahomov <pahomov.e...@gmail.com> wrote:
>
> - Use spark driver in “cluster mode” where driver runs on a worker instead of
> the node running Z
>
> Even without the driver, Z is a heavy process. You need a lot of RAM to keep
> big results from jobs. And most of all, Zeppelin 0.5.6 does not support
> cluster mode, and I'm not ready to move to 0.6.
>
> 2016-08-05 12:03 GMT-07:00 Mohit Jaggi <mohitja...@gmail.com
> <mailto:mohitja...@gmail.com>>:
> Egor,
> Running a scale-out system like Spark with multiple users is always tricky.
> Operating systems are designed to let multiple users share a single machine,
> but for “big data” a single user requires several machines, which is the
> exact opposite. Having said that, I would suggest the following:
>
> - Use spark driver in “cluster mode” where driver runs on a worker instead of
> the node running Z
> - Set appropriate limits/sizes in spark master configuration
> - Run separate instances of Z per user, but then you will have a tough time
> collaborating and sharing notebooks. Maybe they can be stored in a shared
> space that all Z instances read from, but I am afraid that shared access
> might clobber the files. The Z developers can tell us whether that is true.
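> The first two suggestions could be sketched like this (illustrative values
> only — the right caps depend on your cluster, and `spark.submit.deployMode`
> assumes Spark 1.5+):
>
> ```
> # spark-defaults.conf (sketch)
> # Run the driver on a worker, not on the Z host:
> spark.submit.deployMode   cluster
> # Per-application caps (the standalone master honors cores.max):
> spark.cores.max           8
> spark.executor.memory     4g
> spark.driver.memory       2g
> ```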
>
> Another alternative is virtualization using containers, but I think that
> will not be easy either.
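> A container per user could look roughly like this (a sketch — the image
> name, tag, port mapping, and volume layout are all assumptions, and the
> available resource flags depend on your Docker version):
>
> ```shell
> # One Zeppelin container per user, with a memory cap and a private
> # notebook directory, so one user's driver cannot take down the rest.
> docker run -d --name zeppelin-alice \
>   --memory 4g --cpu-shares 512 \
>   -p 8081:8080 \
>   -v /data/notebooks/alice:/zeppelin/notebook \
>   apache/zeppelin:0.6.0
> ```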
>
> Mohit
> Founder,
> Data Orchard LLC
> www.dataorchardllc.com <http://www.dataorchardllc.com/>
>
>
>> On Aug 5, 2016, at 11:45 AM, Egor Pahomov <pahomov.e...@gmail.com
>> <mailto:pahomov.e...@gmail.com>> wrote:
>>
>> Hi, I'd like to discuss best practices for using Zeppelin in a multi-user
>> environment. There are several naive approaches; I've tried each for at
>> least a couple of months, and not a single one worked:
>>
>> All users on one Zeppelin:
>> - One Spark context: people really do break the sc, and when everyone is in
>>   the same boat, a single person can stop many from working.
>> - No resource management support: one person can allocate all resources for
>>   a long time.
>> - The number of notebooks is enormous; it's hard to find anything.
>> - No security separation: everyone sees everything. I do not care about
>>   security, but I care about foolproofing, and people can accidentally
>>   delete each other's notebooks.
>>
>> Every user has his own Zeppelin on one machine:
>> - Every Zeppelin instance eats memory for Zeppelin itself; at some point
>>   there is not enough memory.
>> - Every Spark driver (I use yarn-client mode) eats memory. Same issue.
>> - Single point of failure.
>> - Cores might not be enough.
>> - I cannot prove it, but even with enough memory and cores, Zeppelin runs
>>   into problems when there are more than 10 instances on one machine. I do
>>   not know the reason; maybe it's a Spark driver issue.
>>
>> Our current approach: every department has its own VM with its own Zeppelin
>> in it:
>> - I'm not a DevOps engineer; I have no experience supporting multiple VMs.
>> - It's expensive to have hardware for a lot of VMs.
>> - Most of this hardware is busy less than 20% of the time.
>>
>> How are you dealing with this situation?
>>
>>
>> --
>> Sincerely yours
>> Egor Pakhomov
>
>
>
>
> --
> Sincerely yours
> Egor Pakhomov