Running in a VM, we’ve noticed Zeppelin consume a lot of memory, and we have 
encountered out-of-memory and GC issues with just a couple of users.

In our case, I attributed it to the use case: the Spark interpreter connected 
to a Postgres DB and loaded a few tables with a large number of rows into data 
frames. We registered these DFs as temp tables (createOrReplaceTempView) while 
the notebook ran. When we ran joins across these temp tables using Spark SQL, 
we ran out of memory.

Although we’re on AWS, we were not in a position to provision scalable 
clusters (using YARN etc.), nor did we want to provision very large EC2 
instances.

To keep memory in check, we’ve changed our approach: we use JDBC+Postgres to 
push the predicates down to Postgres and use Spark data frames to gather the 
(much smaller) result sets and hand them to the Zeppelin UI/pages for display. 
Effectively, we ended up dividing the workload: Postgres provided the right 
balance of memory/disk/CPU for querying, and Spark interfaced well with 
Zeppelin for analytics (especially across pages for the drill-down use case).
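
For anyone wanting to try the same split, here is a minimal sketch of the idea 
in Python, not our exact code. It assumes Spark's built-in JDBC data source and 
the Postgres JDBC driver on the interpreter classpath; the helper names, 
connection URL, credentials, tables and columns are all hypothetical. The join 
and filter are wrapped as a subquery in the `dbtable` option, so Postgres 
executes them and Spark only receives the result rows.

```python
def pushdown_jdbc_options(url, query, user, password, fetchsize=1000):
    """Build Spark JDBC reader options so that Postgres executes `query`."""
    return {
        "url": url,
        # An aliased subquery in "dbtable" is run by the database itself;
        # Spark only sees the rows the query returns.
        "dbtable": "({}) AS pushed_down".format(query.strip().rstrip(";")),
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
        # Number of rows fetched per round trip by the JDBC driver.
        "fetchsize": str(fetchsize),
    }


def load_result(spark, options):
    """Read the result set into a DataFrame (needs a live SparkSession)."""
    return spark.read.format("jdbc").options(**options).load()


# Hypothetical tables and columns, for illustration only.
QUERY = """
    SELECT c.region, count(*) AS order_count
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.created_at >= DATE '2017-01-01'
    GROUP BY c.region;
"""

opts = pushdown_jdbc_options(
    "jdbc:postgresql://localhost:5432/analytics",  # hypothetical URL
    QUERY,
    "zeppelin",  # hypothetical user
    "secret",    # hypothetical password
)
```

In a %pyspark paragraph, something like `load_result(spark, opts)` followed by 
`createOrReplaceTempView` on the returned data frame would then register only 
the small aggregated result, rather than the full source tables.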

It’d be great to hear experiences from others who use Zeppelin on a single 
VM/host or in _non_-clustered environments.



> On Jan 24, 2017, at 12:26 PM, Paul Brenner <pbren...@placeiq.com> wrote:
> 
> 
> We are using zeppelin with multiple users on the same server and often run 
> out of memory on the dedicated VM that zeppelin runs on. It isn’t uncommon to 
> run out of memory on the VM zeppelin is running on when just 3-4 users are 
> using zeppelin. Is this normal behavior? Does each user’s spark interpreter 
> usually consume 500mb - 1gb of memory on the vm?
> 
> I thought I saw reports of companies using zeppelin with 10s or 100s of 
> users. Is there something we are doing wrong?
> 
> The VM zeppelin runs on currently only has 3gb of ram so we can raise that 
> number a little, but I don’t see how we could ever get to 10 simultaneous 
> users.
> 
> Paul Brenner
> DATA SCIENTIST
> (217) 390-3033  
