Running Zeppelin in a VM, we’ve noticed that it consumes a lot of memory, and we have encountered out-of-memory and GC issues with just a couple of users.
In our case, I attributed it to the use case: the Spark interpreter connected to a Postgres DB and loaded a few tables with a large number of rows into data frames. We registered these DFs as temp tables (createOrReplaceTempView) while the notebook ran. When we ran joins across these temp tables using Spark SQL, we ran out of memory.

While we’re on AWS, we were not in a position to provision scalable clusters (using YARN etc.), nor did we want to provision very large EC2 instances. To keep memory in check, we changed our approach: we now use JDBC to push the predicates down to Postgres, and use Spark data frames only to gather the (much smaller) result sets and pass them to the Zeppelin UI/pages for display. Effectively, we ended up dividing the workload: Postgres provided the right balance of memory/disk/CPU for querying, and Spark interfaced well with Zeppelin for analytics (especially across pages for the drill-down use case).

It’d be great to hear experiences from others who use Zeppelin on a single VM/host or in _non_-clustered environments.

> On Jan 24, 2017, at 12:26 PM, Paul Brenner <pbren...@placeiq.com> wrote:
>
> We are using zeppelin with multiple users on the same server and often run
> out of memory on the dedicated VM that zeppelin runs on. It isn’t uncommon to
> run out of memory on the VM zeppelin is running on when just 3-4 users are
> using zeppelin. Is this normal behavior? Does each user’s spark interpreter
> usually consume 500mb - 1gb of memory on the vm?
>
> I thought I saw reports of companies using zeppelin with 10s or 100 users, is
> there something we are doing wrong?
>
> The VM zeppelin runs on currently only has 3gb of ram so we can raise that
> number a little, but I don’t see how we could ever get to 10 simultaneous
> users.
>
> Paul Brenner
> DATA SCIENTIST
> <http://www.placeiq.com/>
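P.S. For anyone who wants to try the JDBC pushdown approach from my reply, here is a rough sketch rather than our actual notebook code. It assumes PySpark with the Postgres JDBC driver on the interpreter classpath. The helper names, connection URL, credentials, and the `orders`/`customers` tables are all made up for illustration. The idea is to pass the whole join as a parenthesised subquery in the JDBC `dbtable` option, so Postgres does the join and filtering and Spark only receives the small result.

```python
def pushdown_options(url, user, password, sql, alias="pushed"):
    """Build Spark JDBC reader options so that `sql` runs inside Postgres."""
    return {
        "url": url,
        "driver": "org.postgresql.Driver",
        "user": user,
        "password": password,
        # The JDBC source accepts a parenthesised subquery with an alias
        # in place of a table name; Postgres executes it and returns rows.
        "dbtable": "({}) AS {}".format(sql, alias),
    }


def load_small_result(spark, options):
    """Read the already-joined, already-filtered result into a DataFrame."""
    return spark.read.format("jdbc").options(**options).load()


# Hypothetical connection details and schema, for illustration only.
opts = pushdown_options(
    url="jdbc:postgresql://localhost:5432/analytics",
    user="zeppelin",
    password="secret",
    sql=(
        "SELECT o.region, count(*) AS n "
        "FROM orders o JOIN customers c ON c.id = o.customer_id "
        "WHERE o.created >= DATE '2017-01-01' "
        "GROUP BY o.region"
    ),
)

# In a %pyspark paragraph, where Zeppelin provides `spark` and `z`:
# df = load_small_result(spark, opts)
# z.show(df)
```

Compared with loading each table in full and joining temp views in Spark SQL, only the aggregated rows ever reach the interpreter JVM, which is what kept memory on a small VM in check for us.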