Re: Heap Memory in Spark 2.3.0

2018-07-17 Thread Imran Rashid
Perhaps this is https://issues.apache.org/jira/browse/SPARK-24578? That was reported as a performance issue, not OOMs, but it's in the exact same part of the code and the change was to reduce the memory pressure significantly. On Mon, Jul 16, 2018 at 1:43 PM, Bryan Jeffrey wrote: > Hello. > > I

Heap Memory in Spark 2.3.0

2018-07-16 Thread Bryan Jeffrey
Hello. I am working to move our system from Spark 2.1.0 to Spark 2.3.0. Our system is running on Spark managed via Yarn. During the course of the move I mirrored the settings to our new cluster. However, on the Spark 2.3.0 cluster with the same resource allocation I am seeing a number of execut

Re: Off heap memory settings and Tungsten

2017-04-24 Thread Saisai Shao
AFAIK, I don't think the off-heap memory setting is enabled automatically; there are two configurations that control Tungsten's off-heap memory usage: 1. spark.memory.offHeap.enabled. 2. spark.memory.offHeap.size. On Sat, Apr 22, 2017 at 7:44 PM, geoHeil wrote: > Hi, > I wonde
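A minimal sketch of setting those two configurations together (the app name and the 2g figure are illustrative, not from the thread):

import org.apache.spark.sql.SparkSession

// Off-heap storage is disabled by default; when enabling it, a positive size must also be given.
val spark = SparkSession.builder()
  .appName("offheap-sketch")                      // placeholder name
  .config("spark.memory.offHeap.enabled", "true")
  .config("spark.memory.offHeap.size", "2g")      // illustrative value
  .getOrCreate()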

Off heap memory settings and Tungsten

2017-04-22 Thread geoHeil
Hi, I wonder when to enable Spark's off-heap settings. Shouldn't Tungsten enable these automatically in 2.1? http://stackoverflow.com/questions/43330902/spark-off-heap-memory-config-and-tungsten Regards, Georg

spark off heap memory

2017-04-09 Thread Georg Heiler
Hi, I thought that with the integration of project Tungsten, Spark would automatically use off-heap memory. What are spark.memory.offHeap.size and spark.memory.offHeap.enabled for? Do I manually need to specify the amount of off-heap memory for Tungsten here? Regards, Georg

Spark executor memory and jvm heap memory usage metric

2017-02-15 Thread satishl
We have been measuring JVM heap memory usage in our Spark app by taking periodic samples of JVM heap usage and saving them in our metrics DB. We do this by spawning a thread in the Spark app and measuring the JVM heap memory usage every 1 min. Is it a fair assumption to conclude that if the
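A rough sketch of such a sampler (the recordMetric callback stands in for whatever writes to the metrics DB; it is an assumption about the setup, not code from the post):

import java.lang.management.ManagementFactory
import java.util.concurrent.{Executors, TimeUnit}

object HeapSampler {
  private val memoryBean = ManagementFactory.getMemoryMXBean
  private val scheduler  = Executors.newSingleThreadScheduledExecutor()

  // Sample the heap of the JVM this code runs in once a minute and forward the readings.
  def start(recordMetric: (String, Long) => Unit): Unit = {
    scheduler.scheduleAtFixedRate(new Runnable {
      def run(): Unit = {
        val heap = memoryBean.getHeapMemoryUsage
        recordMetric("heap.used", heap.getUsed)
        recordMetric("heap.committed", heap.getCommitted)
        recordMetric("heap.max", heap.getMax)
      }
    }, 0, 1, TimeUnit.MINUTES)
  }
}

Note that a thread spawned in the driver only sees the driver JVM; each executor is a separate JVM, so executor heap has to be sampled on the executors themselves or taken from Spark's own executor metrics.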

Re: OS killing Executor due to high (possibly off heap) memory usage

2017-01-03 Thread Koert Kuipers
> I agree that off-heap memory usage is unpredictable. When we used RDDs the memory was mostly on heap and total usage predictable, and we almost never had YARN killing executors. Now with DataFrames the memory usage is both on and off heap, and we h

Re: OS killing Executor due to high (possibly off heap) memory usage

2016-12-08 Thread Aniket Bhatnagar
> ... executors. Now with DataFrames the memory usage is both on and off heap, and we have no way of limiting the off-heap memory usage by Spark, yet YARN requires a maximum total memory usage and if you go over it YARN kills the executor. On Fri, Nov 25, 2016 at 12:14 PM, Anik

Re: OS killing Executor due to high (possibly off heap) memory usage

2016-11-26 Thread Koert Kuipers
I agree that off-heap memory usage is unpredictable. When we used RDDs the memory was mostly on heap and total usage predictable, and we almost never had YARN killing executors. Now with DataFrames the memory usage is both on and off heap, and we have no way of limiting the off-heap memory usage

Re: OS killing Executor due to high (possibly off heap) memory usage

2016-11-25 Thread Aniket Bhatnagar
higher spark.yarn.executor.memoryOverhead and lower executor memory size. I had to trade off performance for reliability. Unfortunately, Spark does a poor job reporting off-heap memory usage. From the profiler, it seems that the job's heap usage is pretty static but the off-heap memory fluctuates quite a lot. It looks li
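As a sketch of that trade-off (values purely illustrative, not the poster's actual settings): shift a few gigabytes from the executor heap into the YARN overhead so the total container request stays roughly the same.

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.memory", "8g")                 // lowered from, say, 10g
  .set("spark.yarn.executor.memoryOverhead", "4096")  // raised; the value is in MiB in Spark 1.x/2.x
// The container request is roughly executor memory + overhead, so about 12g either way.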

RE: OS killing Executor due to high (possibly off heap) memory usage

2016-11-24 Thread Shreya Agarwal
Try setting spark.yarn.executor.memoryOverhead 1 On Thu, Nov 24, 2016 at 11:16 AM, Aniket Bhatnagar wrote: Hi Spark users, I am

Re: OS killing Executor due to high (possibly off heap) memory usage

2016-11-24 Thread Rodrick Brown
> This leads me to believe that something triggers out of memory during shuffle read. Is there a configuration to completely disable usage of off-heap memory? I have tried setting spark.shuffle.io.preferDirectBufs=false but the executor is still getting killed by the same error. C

Re: spark sql jobs heap memory

2016-11-24 Thread Rohit Karlupia
Datasets/DataFrames will use direct/raw/off-heap memory in the most efficient columnar fashion. Trying to fit the same amount of data in heap memory would likely increase your memory requirement and decrease the speed. So, in short, don't worry about it and increase overhead. You can also

OS killing Executor due to high (possibly off heap) memory usage

2016-11-24 Thread Aniket Bhatnagar
(org.apache.spark.network.server.TransportRequestHandler) logs a lot of channel closed exceptions in YARN node manager logs. This leads me to believe that something triggers out of memory during shuffle read. Is there a configuration to completely disable usage of off-heap memory? I have tried setting spark.shuffle.io.preferDirectBufs=false
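For reference, a sketch of the two knobs usually reached for here (values illustrative): spark.shuffle.io.preferDirectBufs only asks Netty to favour heap buffers for shuffle I/O rather than eliminating off-heap use, so extra YARN overhead is typically still needed.

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.shuffle.io.preferDirectBufs", "false")   // prefer on-heap Netty buffers for shuffle
  .set("spark.yarn.executor.memoryOverhead", "2048")   // MiB of headroom for remaining off-heap use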

spark sql jobs heap memory

2016-11-23 Thread Koert Kuipers
We are testing Dataset/DataFrame jobs instead of RDD jobs. One thing we keep running into is containers getting killed by YARN. I realize this has to do with off-heap memory, and the suggestion is to increase spark.yarn.executor.memoryOverhead. At times our memoryOverhead is as large as the

Re: Constantly increasing Spark streaming heap memory

2016-02-22 Thread Robin East
> ... monitor it: I noticed that the heap memory is constantly increasing until the GC triggers, and then it restarts increasing again and so on. I tried to use a profiler to understand what is happening in the heap. All I found is a byte[] object that is constantly increasin

Constantly increasing Spark streaming heap memory

2016-02-20 Thread Walid LEZZAR
Hi, I'm running a Spark Streaming job that pulls data from Kafka (using the direct approach method, without a receiver) and pushes it into Elasticsearch. The job is running fine but I was surprised once I opened jconsole to monitor it: I noticed that the heap memory is constantly increasing

Re: Off-heap memory usage of Spark Executors keeps increasing

2016-01-26 Thread nir
Are you having this issue with Spark 1.5 as well? We had a similar OOM issue and were told by Databricks to upgrade to 1.5 to resolve it. I guess they are trying to sell Tachyon :)

Re: Regarding Off-heap memory

2016-01-26 Thread Nirav Patel
From my experience with Spark 1.3.1 you can also set spark.yarn.executor.memoryOverhead to about 7-10% of your spark.executor.memory; the total of the two will be requested for a YARN container. On Tue, Jan 26, 2016 at 4:20 AM, Xiaoyu Ma wrote: > Hi all, > I saw Spark 1.6 has new off heap settings: spark.
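A back-of-the-envelope version of that rule (numbers illustrative; the 384 MiB floor mirrors Spark's default minimum overhead, and the 10% is the rule of thumb from the post):

val executorMemoryMb = 10 * 1024                                       // spark.executor.memory = 10g
val overheadMb       = math.max(384, (executorMemoryMb * 0.10).toInt)  // roughly 10%, floored at 384 MiB
val containerMb      = executorMemoryMb + overheadMb                   // what YARN is asked to allocate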

Regarding Off-heap memory

2016-01-26 Thread Xiaoyu Ma
Hi all, I saw Spark 1.6 has new off-heap settings: spark.memory.offHeap.size. The doc said we need to shrink the on-heap size accordingly. But on YARN the on-heap size and the YARN limit are set together via spark.executor.memory (JVM opts for memory are not allowed according to the doc), so how can we set the executor JVM
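One way to reason about it in the Spark 1.6/2.x era, as a sketch under the assumption that off-heap use has to be covered by the YARN overhead (spark.memory.offHeap.size was not folded into the container request until later releases); the numbers are illustrative:

val heapMb      = 8 * 1024                         // spark.executor.memory = 8g (the JVM -Xmx)
val offHeapMb   = 2 * 1024                         // spark.memory.offHeap.size = 2g
val overheadMb  = math.max(384, offHeapMb + 512)   // spark.yarn.executor.memoryOverhead, with some slack
val containerMb = heapMb + overheadMb              // the total YARN allocation per executor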

Off-heap memory usage of Spark Executors keeps increasing

2015-11-17 Thread b.schopman
Hi, the off-heap memory usage of the 3 Spark executor processes keeps increasing until the boundaries of the physical RAM are hit. This happened two weeks ago, at which point the system came to a grinding halt because it was unable to spawn new processes. At such a moment resta

Spark Executors off-heap memory usage keeps increasing

2015-11-13 Thread Balthasar Schopman
Hi, the off-heap memory usage of the 3 Spark executor processes keeps increasing until the boundaries of the physical RAM are hit. This happened two weeks ago, at which point the system came to a grinding halt because it was unable to spawn new processes. At such a moment resta

Re: heap memory

2015-11-09 Thread Akhil Das
ounes.nag...@tritondigital.com> wrote: > Hi all, I'm running a spark shell: bin/spark-shell --executor-memory 32G --driver-memory 8G. I keep getting: 15/10/30 13:41:59 WARN MemoryManager: Total allocation exceeds 95.00% (2,147,4

heap memory

2015-10-30 Thread Younes Naguib
Hi all, I'm running a Spark shell: bin/spark-shell --executor-memory 32G --driver-memory 8G. I keep getting: 15/10/30 13:41:59 WARN MemoryManager: Total allocation exceeds 95.00% (2,147,483,647 bytes) of heap memory. Any help? Thanks, Younes Naguib Triton Digital | 144

Re: Spark off heap memory leak on Yarn with Kafka direct stream

2015-07-13 Thread Apoorva Sareen
> ... gradually increasing the physical memory usage till a point where YARN kills the container. I have configured up to 192M heap and 384 off-heap space in my driver but it eventually runs out of it. The heap memory appears to be fine with regular GC cycles. There is no OutOfMemory enc

Re: Spark off heap memory leak on Yarn with Kafka direct stream

2015-07-13 Thread Cody Koeninger
> ... 384 off-heap space in my driver but it eventually runs out of it. The heap memory appears to be fine with regular GC cycles. There is no OutOfMemory encountered ever in any such runs. In fact I am not generating any traffic on the Kafka queues, still this happens. Here i

Spark off heap memory leak on Yarn with Kafka direct stream

2015-07-13 Thread Apoorva Sareen
a point where YARN kills the container. I have configured up to 192M heap and 384 off-heap space in my driver but it eventually runs out of it. The heap memory appears to be fine with regular GC cycles. There is no OutOfMemory encountered ever in any such runs. In fact I am not generating any

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-06-18 Thread Tathagata Das
> (quoted heap histogram) ... java.lang.ref.Finalizer 19: 25725 617400 java.lang.String 20: 320 570368 [Lio.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry; 21: 16066

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-06-18 Thread Ji ZHANG
> (quoted heap histogram) ... 617400 java.lang.String 20: 320 570368 [Lio.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry; 21: 16066 514112 java.util.concurrent.ConcurrentHashMap$HashEntry ...

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-06-04 Thread Ji ZHANG
> (quoted heap histogram) ... [Lio.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry; 21: 16066 514112 java.util.concurrent.ConcurrentHashMap$HashEntry 22: 12288 491520 org.jboss.netty.util.internal.ConcurrentIdentityHashMap$Segment ...

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-06-02 Thread Shixiong Zhu
> (quoted heap histogram) ... org.jboss.netty.util.internal.ConcurrentIdentityHashMap$Segment 23: 13343 426976 java.util.concurrent.locks.ReentrantLock$NonfairSync 24: 12288 396416 [Lorg.jboss.netty.util.internal.ConcurrentIdentityHashMap$HashEntry; 25:

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-06-02 Thread Ji ZHANG
> (quoted heap histogram) ... java.util.concurrent.locks.ReentrantLock$NonfairSync 24: 12288 396416 [Lorg.jboss.netty.util.internal.ConcurrentIdentityHashMap$HashEntry; 25: 16447 394728 java.util.zip.ZStreamRef 26: 565 370080 [I 27: 508

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-06-02 Thread Tathagata Das
> (quoted heap histogram) ... 209232 [Ljava.util.concurrent.ConcurrentHashMap$HashEntry; 30: 2524 192312 [Ljava.lang.Object; But as I mentioned above, the heap memory seems OK; the extra memory is consumed by some off-heap data. I can't find a way to figure out what is in

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-06-02 Thread Ji ZHANG
29: 771 209232 [Ljava.util.concurrent.ConcurrentHashMap$HashEntry; 30: 2524 192312 [Ljava.lang.Object; But as I mentioned above, the heap memory seems OK; the extra memory is consumed by some off-heap data. I can't find a way to figure out what is in there.

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-28 Thread Ji ZHANG
> Are you using the createStream or createDirectStream API? If it's the former, you can try setting the StorageLevel to MEMORY_AND_DISK (it might slow things down though). Another way would be to try the latter

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-28 Thread Akhil Das
> ... you can try setting the StorageLevel to MEMORY_AND_DISK (it might slow things down though). Another way would be to try the latter one. Thanks, Best Regards. On Wed, May 27, 2015 at 1:00 PM, Ji ZHANG
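A sketch of the two options being discussed, using the Spark 1.3-era spark-streaming-kafka (Kafka 0.8) API; the broker and ZooKeeper addresses, group id and topic name are placeholders:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("kafka-ingest-sketch")
val ssc  = new StreamingContext(conf, Seconds(10))

// Receiver-based stream, storing received blocks as MEMORY_AND_DISK so they can spill to disk:
val receiverStream = KafkaUtils.createStream(
  ssc, "zk-host:2181", "example-group", Map("example-topic" -> 1), StorageLevel.MEMORY_AND_DISK)

// Direct (receiver-less) stream, the alternative suggested above:
val directStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, Map("metadata.broker.list" -> "broker-host:9092"), Set("example-topic"))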

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-28 Thread Ji ZHANG
> ... would be to try the latter one. Thanks, Best Regards. On Wed, May 27, 2015 at 1:00 PM, Ji ZHANG wrote: > Hi Akhil, Thanks for your reply. According to the Streaming tab of the Web UI,

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-28 Thread Akhil Das
> Thanks for your reply. According to the Streaming tab of the Web UI, the Processing Time is around 400ms, and there's no Scheduling Delay, so I suppose it's not the Kafka messages that eat up the off-heap memory. Or maybe it is, but how to t

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-27 Thread Ji ZHANG
> Hi Akhil, Thanks for your reply. According to the Streaming tab of the Web UI, the Processing Time is around 400ms, and there's no Scheduling Delay, so I suppose it's not the Kafka messages that eat up the off-heap memory. Or maybe it is,

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-27 Thread Akhil Das
> Hi Akhil, Thanks for your reply. According to the Streaming tab of the Web UI, the Processing Time is around 400ms, and there's no Scheduling Delay, so I suppose it's not the Kafka messages that eat up the off-heap memory. Or maybe it is, but how to tell? I googl

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-27 Thread Ji ZHANG
Hi Akhil, Thanks for your reply. According to the Streaming tab of the Web UI, the Processing Time is around 400ms, and there's no Scheduling Delay, so I suppose it's not the Kafka messages that eat up the off-heap memory. Or maybe it is, but how to tell? I googled about how to check th

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-27 Thread Akhil Das
After submitting the job, if you do a ps aux | grep spark-submit you can see all the JVM params. Are you using the high-level consumer (receiver-based) for receiving data from Kafka? In that case, if your throughput is high and the processing delay exceeds the batch interval then you will hit this memor

Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-26 Thread Ji ZHANG
Hi, I'm using Spark Streaming 1.3 on CDH5.1 with yarn-cluster mode. I found out that YARN is killing the driver and executor processes because of excessive use of memory. Here's something I tried: 1. Xmx is set to 512M and the GC looks fine (one ygc per 10s), so the extra memory is not used by the heap.
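One mitigation sometimes tried for this symptom, as a sketch rather than anything from the thread itself: cap the JVM's direct-buffer allocations and give YARN a bit more off-heap headroom (values illustrative).

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.extraJavaOptions", "-XX:MaxDirectMemorySize=256m")  // cap NIO direct buffers
  .set("spark.yarn.executor.memoryOverhead", "512")                        // MiB of extra container headroom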

Re: Why is parsing a CSV incredibly wasteful with Java Heap memory?

2014-10-13 Thread Aris
Thank you Sean. Moving over my data types from Double to Float was an (obvious) big win, and I discovered one more good optimization from the Tuning section -- I modified my original code to call .persist(MEMORY_ONLY_SER) from the FIRST import of the data, and I pass in "--conf spark.rdd.compress=
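Roughly what those changes look like in RDD-era code (a sketch; the input path, the CSV layout and the compress flag's value are assumptions, since the message is cut off before the value):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val conf = new SparkConf()
  .setAppName("csv-float-sketch")
  .set("spark.rdd.compress", "true")              // compress serialized RDD partitions
val sc = new SparkContext(conf)

val rows = sc.textFile("hdfs:///data/input.csv")  // placeholder path
  .map(_.split(',').map(_.toFloat))               // Float instead of Double halves the numeric footprint
  .persist(StorageLevel.MEMORY_ONLY_SER)          // store serialized from the first materialization

rows.count()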

Re: Why is parsing a CSV incredibly wasteful with Java Heap memory?

2014-10-13 Thread Sean Owen
A CSV element like "3.2," takes 4 bytes as text on disk, but as a Double it will always take 8 bytes. Is your input like this? That could explain it. You can map to Float in this case to halve the memory, if that works for your use case. This is just kind of how Strings and floating-point work in th

Why is parsing a CSV incredibly wasteful with Java Heap memory?

2014-10-13 Thread Aris
Hi guys, I am trying to just parse out values from a CSV; everything is a numeric (Double) value, and the input text CSV data is about 1.3 GB in size. When I inspect the Java heap space used by SparkSubmit using JVisualVM, I end up eating up 8GB of memory! Moreover, by inspecting the BlockManager

Re: Give more Java Heap Memory on Standalone mode

2014-07-21 Thread Andrew Or
> SPARK_WORKER_CORES=1; export SPARK_WORKER_CORES=1; SPARK_WORKER_MEMORY=2g; export SPARK_WORKER_MEMORY=2g; SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.spark.executor.memory=4g"; export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.spark.executor.memory=4g" During runtime I receive a Java OutOfMemory exception and a Core dump. My dataset is less than 1 GB and I want to make sure that I cache it all in memory for my ML task. Am I increasing the JVM Heap Memory correctly? Am I doing something wrong? Thank you, Nick
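For context, a sketch of the more conventional way to size executor heap on a standalone cluster (not necessarily what the truncated reply advised): request it through spark.executor.memory, kept at or below each worker's SPARK_WORKER_MEMORY, rather than through SPARK_DAEMON_JAVA_OPTS, which configures the master and worker daemons themselves. The master URL and sizes are placeholders.

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("spark://master-host:7077")     // placeholder master URL
  .setAppName("standalone-heap-sketch")
  .set("spark.executor.memory", "2g")        // must fit within SPARK_WORKER_MEMORY on each worker
val sc = new SparkContext(conf)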

Re: Give more Java Heap Memory on Standalone mode

2014-07-21 Thread Nick R. Katsipoulakis
> SPARK_WORKER_CORES=1; export SPARK_WORKER_CORES=1; SPARK_WORKER_MEMORY=2g; export SPARK_WORKER_MEMORY=2g; SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.spark.executor.memory=4g"; export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.spark.executor.memory=4g"

Re: Give more Java Heap Memory on Standalone mode

2014-07-21 Thread Abel Coronado Iruegas
> SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.spark.executor.memory=4g" During runtime I receive a Java OutOfMemory exception and a Core dump. My dataset is less than 1 GB and I want to make sure that I cache it all in memory for my ML task. Am I increasing the JVM Heap Memory correctly? Am I doing something wrong? Thank you, Nick

Give more Java Heap Memory on Standalone mode

2014-07-21 Thread Nick R. Katsipoulakis
SPARK_WORKER_MEMORY=2g SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.spark.executor.memory=4g" export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.spark.executor.memory=4g" During runtime I receive a Java OutOfMemory exception and a Core dump. My dataset is less than 1 GB and I want to make sure that I cache it all in memory fo