It looks like one of our big problems is that Zeppelin doesn’t always kill all 
completed processes. Is there an accepted way to kill a Spark instance? Most of 
the time, executing sys.exit in a paragraph kills the Spark instance in YARN 
and, I believe, the corresponding Zeppelin process as well… but not 100% of the 
time.

It’s similar with restarting the interpreter… most of the time that kills the 
Spark instance in YARN and the corresponding Zeppelin process, but not always.

Currently we have to check YARN and use “yarn application -kill” on anything 
that doesn’t get cleaned up properly there. We also have to log in to the 
Zeppelin VM and manually hunt down old processes that are still running days 
after their interpreters were stopped.
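
For illustration, the manual YARN check described above could be scripted roughly as follows. This is only a sketch under assumptions: it assumes the output of “yarn application -list” has tab-separated columns with the application ID first and the application name second, and the helper names (parse_app_list, kill_commands, kill_leftovers) are hypothetical, not part of Zeppelin or YARN.

```python
import re
import subprocess

# An application ID at the start of a line, followed by the remaining columns.
APP_LINE = re.compile(r"^(application_\d+_\d+)\s+(.*)$")


def parse_app_list(listing, name_contains):
    """Return (app_id, name) pairs whose name contains name_contains.

    Assumption: columns are tab-separated and the application name is
    the second column of each row.
    """
    matches = []
    for line in listing.splitlines():
        m = APP_LINE.match(line.strip())
        if not m:
            continue  # header or summary line, not an application row
        app_id, rest = m.groups()
        name = rest.split("\t")[0].strip()
        if name_contains.lower() in name.lower():
            matches.append((app_id, name))
    return matches


def kill_commands(app_ids):
    """Build one 'yarn application -kill <id>' command per application ID."""
    return [["yarn", "application", "-kill", app_id] for app_id in app_ids]


def kill_leftovers(name_contains, dry_run=True):
    """List running applications and kill those matching name_contains."""
    listing = subprocess.run(
        ["yarn", "application", "-list", "-appStates", "RUNNING"],
        capture_output=True, text=True, check=True,
    ).stdout
    app_ids = [app_id for app_id, _ in parse_app_list(listing, name_contains)]
    for cmd in kill_commands(app_ids):
        if dry_run:
            print("would run:", " ".join(cmd))  # review before killing anything
        else:
            subprocess.run(cmd, check=True)
```

Running kill_leftovers("zeppelin") with the default dry_run=True only prints the commands, which seems the safer default since a name match could also catch applications that are still in use.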

Paul Brenner

DATA SCIENTIST

On Tue, Jan 24, 2017 at 12:53 PM t p <tauis2...@gmail.com> wrote:

Running in a VM, we’ve noticed Zeppelin consume a lot of memory and have 
encountered out-of-memory and GC issues, even with only a couple of users.

In our case, I attributed it to the use case: the Spark interpreter connected 
to a Postgres DB to load a few tables with a large number of rows into data 
frames. We registered these data frames as temp tables 
(createOrReplaceTempView) while the notebook ran. When we ran joins across 
these temp tables using Spark SQL, we ran out of memory.

While we’re on AWS, we were not in a position to run/provision scalable 
clusters (using YARN etc.), nor did we want to provision very large EC2 
instances.

To keep memory in check, we’ve changed our approach to use JDBC+Postgres to 
push the predicates down to Postgres and use Spark data frames only for 
gathering the results (much smaller result sets), which then feed the Zeppelin 
UI/pages that display them. Effectively, we ended up dividing the workload: 
Postgres provided the right balance of memory/disk/CPU for querying, and Spark 
interfaced well with Zeppelin for analytics (especially across pages for the 
drill-down use case).
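
A rough sketch of what this kind of pushdown can look like from PySpark, for anyone who wants to try it. It assumes Spark’s built-in JDBC data source and the Postgres JDBC driver on the classpath; the table, columns, predicate and helper names (pushdown_options, load_filtered) are made up for illustration and are not from the setup described above.

```python
def pushdown_options(url, table, columns, predicate, user, password):
    """Build JDBC reader options that make Postgres run the filter.

    Wrapping the query as an aliased subquery in 'dbtable' means Postgres
    evaluates the WHERE clause and Spark only receives the reduced result.
    The predicate is inserted as-is, so it must come from trusted input.
    """
    subquery = "(SELECT {cols} FROM {table} WHERE {pred}) AS pushed".format(
        cols=", ".join(columns), table=table, pred=predicate
    )
    return {
        "url": url,
        "dbtable": subquery,
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }


def load_filtered(spark, options):
    """Load the pre-filtered rows as a data frame via an existing SparkSession."""
    return spark.read.format("jdbc").options(**options).load()
```

In a notebook paragraph this would be used along the lines of load_filtered(spark, pushdown_options(...)).createOrReplaceTempView("recent_events"), so that only the small result set is held by the interpreter.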

It’d be great to hear experiences from others who use Zeppelin on a single 
VM/host or in _non_ clustered environments.

On Jan 24, 2017, at 12:26 PM, Paul Brenner <pbren...@placeiq.com> wrote:

We are using Zeppelin with multiple users on the same server and often run out 
of memory on the dedicated VM that Zeppelin runs on, sometimes when just 3-4 
users are using Zeppelin. Is this normal behavior? Does each user’s Spark 
interpreter usually consume 500 MB - 1 GB of memory on the VM?

I thought I saw reports of companies using Zeppelin with 10s or even 100 
users. Is there something we are doing wrong?

The VM Zeppelin runs on currently only has 3 GB of RAM, so we can raise that 
number a little, but I don’t see how we could ever get to 10 simultaneous users.
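
As a back-of-the-envelope check on those numbers, here is a small sketch. The 1 GB reserved for the OS and the Zeppelin server itself is an assumption for illustration, not a measured value, and max_interpreters is a hypothetical helper.

```python
def max_interpreters(vm_mb, per_interpreter_mb, reserved_mb=0):
    """How many interpreter processes fit in the VM's memory.

    reserved_mb is memory set aside for the OS and the Zeppelin server
    (an assumed figure; measure it on the actual VM).
    """
    usable_mb = max(vm_mb - reserved_mb, 0)
    return usable_mb // per_interpreter_mb


# 3 GB VM, 500 MB - 1 GB per interpreter, nothing reserved
best_case = max_interpreters(3072, 512)
worst_case = max_interpreters(3072, 1024)

# Same VM with an assumed 1 GB reserved for the OS and Zeppelin server
best_case_reserved = max_interpreters(3072, 512, reserved_mb=1024)
worst_case_reserved = max_interpreters(3072, 1024, reserved_mb=1024)
```

With nothing reserved, the range is 3 to 6 interpreters; with the assumed 1 GB reserved it drops to 2 to 4, which is roughly in line with running out of memory at 3-4 users.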
