I need to build a chart covering 10 days, for all countries (200), for several products, broken down by some dimensions. I would need at least 4-6 GB per Zeppelin for that.
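As a rough back-of-envelope for that estimate (only the 10 days and 200 countries come from the thread; the product count, dimension count, row size, and JVM overhead factor below are assumed placeholders), the raw result set is modest, and it is mainly the cost of holding it as JVM objects in the Zeppelin/driver heap that pushes it into the gigabyte range:

```python
# Rough sizing for the result set described above. Only the 10 days and
# 200 countries are from the thread; every other number is an assumption.
days = 10
countries = 200
products = 50            # assumed
dim_combinations = 20    # assumed dimension combinations per product
bytes_per_row = 200      # assumed serialized row size

rows = days * countries * products * dim_combinations
raw_gb = rows * bytes_per_row / 1024**3
# JVM objects (headers, boxed fields, collections) commonly inflate
# in-heap size by roughly 5-10x over the raw serialized bytes.
heap_gb = raw_gb * 10

print(f"{rows} rows, ~{raw_gb:.2f} GB raw, ~{heap_gb:.1f} GB in heap")
```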
2016-08-05 12:31 GMT-07:00 Mohit Jaggi <mohitja...@gmail.com>:

> Put your big results somewhere else, not in Z's memory?
>
> On Aug 5, 2016, at 12:26 PM, Egor Pahomov <pahomov.e...@gmail.com> wrote:
>
>> - Use spark driver in "cluster mode" where driver runs on a worker instead
>> of the node running Z
>
> Even without the driver, Z is a heavy process. You need a lot of RAM to
> keep big results from jobs. And most of all, Zeppelin 0.5.6 does not
> support cluster mode, and I'm not ready to move to 0.6.
>
> 2016-08-05 12:03 GMT-07:00 Mohit Jaggi <mohitja...@gmail.com>:
>
>> Egor,
>> Running a scale-out system like Spark with multiple users is always
>> tricky. Operating systems are designed to let multiple users share a
>> single machine, but for "big data" a single user requires several
>> machines, which is the exact opposite. Having said that, I would suggest
>> the following:
>>
>> - Use the Spark driver in "cluster mode", where the driver runs on a
>> worker instead of the node running Z
>> - Set appropriate limits/sizes in the Spark master configuration
>> - Run separate instances of Z per user; but then you will have a tough
>> time collaborating and sharing notebooks... maybe they can be stored in a
>> shared space and all Z instances can read them, but I am afraid that
>> shared access might clobber the files. Z developers can tell us if that
>> is true.
>>
>> Another alternative is virtualization using containers, but I think that
>> will not be easy either.
>>
>> Mohit
>> Founder,
>> Data Orchard LLC
>> www.dataorchardllc.com
>>
>>
>> On Aug 5, 2016, at 11:45 AM, Egor Pahomov <pahomov.e...@gmail.com> wrote:
>>
>> Hi, I'd like to discuss best practices for using Zeppelin in a
>> multi-user environment.
>> There are several naive approaches. I've tried each for at least a
>> couple of months, and not a single one worked:
>>
>> *All users on one Zeppelin.*
>>
>> - One Spark context: people really do break the sc, and when they are
>> all in the same boat, a single person can stop many others from working.
>> - No resource management support: one person can allocate all resources
>> for a long time.
>> - The number of notebooks is enormous: it's hard to find anything.
>> - No security separation: everyone sees everything. I do not care about
>> security, but I do care about foolproofing, and people can accidentally
>> delete each other's notebooks.
>>
>> *Every user has his own Zeppelin on one machine.*
>>
>> - Every Zeppelin instance eats memory for Zeppelin itself. At some point
>> there is not enough memory.
>> - Every Spark driver (I use yarn-client mode) eats memory. Same issue.
>> - Single point of failure.
>> - There might not be enough cores.
>> - I cannot prove it, but even when memory and cores are enough, Zeppelin
>> has problems when there are more than 10 instances on one machine. I do
>> not know the reason; maybe it's a Spark driver issue.
>>
>> Our current approach:
>> *Every department has its own VM, with its own Zeppelin in it.*
>>
>> - I'm not DevOps, and I have no experience supporting multiple VMs.
>> - It's expensive to have hardware for a lot of VMs.
>> - Most of this hardware is not used even 20% of the time.
>>
>> How are you dealing with this situation?
>>
>> --
>> Sincerely yours,
>> Egor Pakhomov

--
Sincerely yours,
Egor Pakhomov
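Mohit's first two suggestions (run the driver in cluster mode, set limits in the scheduler configuration) look roughly like the sketch below when done outside Zeppelin with a plain spark-submit against YARN; as noted in the thread, Zeppelin 0.5.6 itself cannot use cluster mode. The job class, jar name, queue name, and sizes here are hypothetical placeholders:

```shell
# Sketch: run the Spark driver on a YARN worker ("cluster" deploy mode)
# so it does not consume memory on the Zeppelin host, and bound its
# resources via a YARN queue. Class, jar, queue, and sizes are assumed.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --queue analytics \
  --driver-memory 4g \
  --executor-memory 2g \
  --num-executors 10 \
  --class com.example.ChartJob \
  chart-job.jar
```

With `--queue`, the per-user or per-team caps then live in the YARN scheduler configuration rather than on the Zeppelin host, which addresses the "one person can allocate all resources" complaint.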
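The "one Zeppelin per user" option discussed above is usually set up by giving each instance its own port, notebook directory, and capped heap via the standard zeppelin-env.sh variables. A minimal sketch, where the user name, paths, port, and heap size are all placeholder assumptions:

```shell
# Sketch: start an isolated Zeppelin instance for one user ("alice").
# Paths, port, and heap size are placeholder assumptions.
export ZEPPELIN_PORT=8081                        # unique port per user
export ZEPPELIN_NOTEBOOK_DIR=/data/zeppelin/alice/notebook
export ZEPPELIN_LOG_DIR=/data/zeppelin/alice/logs
export ZEPPELIN_PID_DIR=/data/zeppelin/alice/run
export ZEPPELIN_MEM="-Xmx1024m"                  # cap Zeppelin's own heap
/opt/zeppelin/bin/zeppelin-daemon.sh start
```

Each instance gets its own notebook directory here because, as Mohit points out, pointing several instances at one shared directory risks clobbering the notebook files.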