Perhaps this is https://issues.apache.org/jira/browse/SPARK-24578? That was
reported as a performance issue, not OOMs, but it's in the exact same part of
the code and the change was to reduce the memory pressure significantly.
On Mon, Jul 16, 2018 at 1:43 PM, Bryan Jeffrey wrote:
Hello.
I am working to move our system from Spark 2.1.0 to Spark 2.3.0. Our system is
running on Spark managed via YARN. During the course of the move I mirrored the
settings to our new cluster. However, on the Spark 2.3.0 cluster with the same
resource allocation I am seeing a number of executor OOM failures.
AFAIK, the off-heap memory setting is not enabled automatically; there are two
configurations that control the Tungsten off-heap memory usage:
1. spark.memory.offHeap.enabled
2. spark.memory.offHeap.size
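For reference, a minimal sketch of turning both on when building a session (the
2g value is only a placeholder, not a recommendation):

  import org.apache.spark.sql.SparkSession

  // Off-heap use by Tungsten is opt-in: both settings have to be supplied, and
  // spark.memory.offHeap.size must be a positive size when enabled is true.
  val spark = SparkSession.builder()
    .appName("offheap-sketch")
    .config("spark.memory.offHeap.enabled", "true")
    .config("spark.memory.offHeap.size", "2g")   // placeholder size
    .getOrCreate()

The same two properties can equally be passed as --conf options to spark-submit.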
On Sat, Apr 22, 2017 at 7:44 PM, geoHeil wrote:
Hi,
I wonder when to enable Spark's off-heap settings. Shouldn't Tungsten enable
these automatically in 2.1?
http://stackoverflow.com/questions/43330902/spark-off-heap-memory-config-and-tungsten
Regards,
Georg
Hi,
I thought that with the integration of project Tungsten, Spark would
automatically use off-heap memory.
What are spark.memory.offHeap.size and spark.memory.offHeap.enabled for? Do I
manually need to specify the amount of off-heap memory for Tungsten here?
Regards,
Georg
We have been measuring JVM heap memory usage in our Spark app by taking
periodic samples of the JVM heap usage and saving them in our metrics DB. We do
this by spawning a thread in the Spark app and measuring the JVM heap usage
every minute.
Is it a fair assumption to conclude that if the
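For what it's worth, a minimal sketch of that kind of sampler; recordMetric is a
made-up placeholder for whatever metrics-DB client is actually used:

  import java.util.concurrent.{Executors, TimeUnit}

  // Hypothetical sink; replace with the real metrics-DB client.
  def recordMetric(name: String, value: Long): Unit = println(s"$name=$value")

  // Sample used JVM heap once a minute from a background thread, as described above.
  val scheduler = Executors.newSingleThreadScheduledExecutor()
  scheduler.scheduleAtFixedRate(new Runnable {
    override def run(): Unit = {
      val rt = Runtime.getRuntime
      recordMetric("jvm.heap.used.bytes", rt.totalMemory() - rt.freeMemory())
    }
  }, 0, 1, TimeUnit.MINUTES)

Keep in mind this only sees the heap of the JVM the thread runs in (normally the
driver), not the executors.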
I agree that off-heap memory usage is unpredictable.
When we used RDDs the memory was mostly on heap and total usage predictable,
and we almost never had YARN killing executors.
Now with DataFrames the memory usage is both on and off heap, and we have no
way of limiting the off-heap memory usage by Spark, yet YARN requires a maximum
total memory usage and if you go over it YARN kills the executor.
I ended up with a higher spark.yarn.executor.memoryOverhead and a lower
executor memory size; I had to trade off performance for reliability.
Unfortunately, Spark does a poor job reporting off-heap memory usage. From the
profiler, it seems that the job's heap usage is pretty static but the off-heap
memory fluctuates quite a lot.
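For illustration, a hedged sketch of that trade-off with placeholder numbers:
shrink the executor heap and grow the overhead allowance so their sum still
fits the YARN container limit (in the 1.x/2.x line, spark.yarn.executor.memoryOverhead
is given in megabytes):

  import org.apache.spark.SparkConf

  // Placeholder numbers only: less JVM heap, more room for off-heap/overhead,
  // roughly the same total ask to YARN (10g + 3g instead of, say, 12g + 1g).
  val conf = new SparkConf()
    .set("spark.executor.memory", "10g")                // executor JVM heap
    .set("spark.yarn.executor.memoryOverhead", "3072")  // MB of non-heap headroom

The same settings are more commonly passed as --conf flags to spark-submit; the
point is only that YARN enforces the sum of the two.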
Subject: Re: OS killing Executor due to high (possibly off heap) memory usage
Try setting spark.yarn.executor.memoryOverhead 1
Datasets/DataFrames will use direct/raw/off-heap memory in the most efficient
columnar fashion. Trying to fit the same amount of data in heap memory would
likely increase your memory requirement and decrease the speed.
So, in short, don't worry about it and increase the overhead.
(org.apache.spark.network.server.TransportRequestHandler) logs a lot of channel
closed exceptions in the YARN node manager logs. This leads me to believe that
something triggers out of memory during shuffle read. Is there a configuration
to completely disable usage of off-heap memory? I have tried setting
spark.shuffle.io.preferDirectBufs=false but the executor is still getting
killed by the same error.
We are testing Dataset/DataFrame jobs instead of RDD jobs. One thing we keep
running into is containers getting killed by YARN. I realize this has to do
with off-heap memory, and the suggestion is to increase
spark.yarn.executor.memoryOverhead.
At times our memoryOverhead is as large as the
Hi,
I'm running a Spark Streaming job that pulls data from Kafka (using the direct
approach method, without a receiver) and pushes it into Elasticsearch. The job
is running fine but I was surprised once I opened jconsole to monitor it: I
noticed that the heap memory is constantly increasing until the GC triggers,
and then it starts increasing again, and so on.
I tried to use a profiler to understand what is happening in the heap. All I
found is a byte[] object that is constantly increasing.
Are you having this issue with Spark 1.5 as well? We had a similar OOM issue
and were told by Databricks to upgrade to 1.5 to resolve it. I guess they are
trying to sell Tachyon :)
From my experience with Spark 1.3.1 you can also set spark.executor.memoryOverhead
to about 7-10% of your spark.executor.memory; the total of the two is what will
be requested for a YARN container.
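As a worked example of that rule of thumb (numbers are illustrative only):

  // Illustrative sizing following the 7-10% advice above.
  val executorMemoryGb = 16.0
  val overheadGb       = executorMemoryGb * 0.10          // ~1.6 GB of overhead
  val containerGb      = executorMemoryGb + overheadGb    // ~17.6 GB asked of YARN
  // YARN then rounds the request up to its own allocation increment.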
On Tue, Jan 26, 2016 at 4:20 AM, Xiaoyu Ma wrote:
Hi all,
I saw Spark 1.6 has new off-heap settings: spark.memory.offHeap.size. The doc
said we need to shrink the on-heap size accordingly. But on YARN the on-heap
size and the YARN limit are set together via spark.executor.memory (JVM opts
for memory are not allowed, according to the doc), so how can we set the
executor JVM heap?
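As far as I understand it (my reading, not something stated in this thread):
spark.executor.memory becomes the executor's JVM heap (-Xmx), the YARN
container limit is roughly that heap plus spark.yarn.executor.memoryOverhead,
and in the 1.6/2.x era the Tungsten off-heap pool was not added to the
container request automatically, so the overhead had to be sized to cover it.
A sketch with placeholder values:

  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    .set("spark.executor.memory", "8g")                  // executor JVM heap (-Xmx)
    .set("spark.memory.offHeap.enabled", "true")
    .set("spark.memory.offHeap.size", "2g")              // Tungsten off-heap pool
    // MB; sized to cover the 2g off-heap pool plus other native/JVM overhead.
    .set("spark.yarn.executor.memoryOverhead", "3072")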
Hi,
The off-heap memory usage of the 3 Spark executor processes keeps increasing
constantly until the boundaries of the physical RAM are hit. This happened two
weeks ago, at which point the system came to a grinding halt, because it was
unable to spawn new processes.
Hi all,
I'm running a spark shell: bin/spark-shell --executor-memory 32G --driver-memory 8G
I keep getting:
15/10/30 13:41:59 WARN MemoryManager: Total allocation exceeds 95.00% (2,147,483,647 bytes) of heap memory
Any help?
Thanks,
Younes Naguib
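2,147,483,647 bytes is Integer.MAX_VALUE, so one hedged first step is to check,
from inside the same shell, what heap the JVM actually received and which
memory settings took effect (plain JVM and SparkConf calls, nothing specific to
this thread):

  // Run inside the spark-shell showing the warning.
  val maxHeapGb = Runtime.getRuntime.maxMemory / (1024.0 * 1024 * 1024)
  println(f"JVM max heap: $maxHeapGb%.1f GB")
  println("spark.driver.memory   = " + sc.getConf.getOption("spark.driver.memory"))
  println("spark.executor.memory = " + sc.getConf.getOption("spark.executor.memory"))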
The physical memory usage keeps gradually increasing till a point where YARN
kills the container. I have configured up to 192M heap and 384M off-heap space
in my driver but it eventually runs out of it.
The heap memory appears to be fine with regular GC cycles. There is no
OutOfMemory encountered ever in any such runs.
In fact I am not generating any traffic on the Kafka queues, still this happens.
From the heap histogram:

         java.lang.ref.Finalizer
 19:  25725   617400  java.lang.String
 20:    320   570368  [Lio.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry;
 21:  16066   514112  java.util.concurrent.ConcurrentHashMap$HashEntry
 22:  12288   491520  org.jboss.netty.util.internal.ConcurrentIdentityHashMap$Segment
 23:  13343   426976  java.util.concurrent.locks.ReentrantLock$NonfairSync
 24:  12288   396416  [Lorg.jboss.netty.util.internal.ConcurrentIdentityHashMap$HashEntry;
 25:  16447   394728  java.util.zip.ZStreamRef
 26:    565   370080  [I
 29:    771   209232  [Ljava.util.concurrent.ConcurrentHashMap$HashEntry;
 30:   2524   192312  [Ljava.lang.Object;

But as I mentioned above, the heap memory seems OK; the extra memory is
consumed by some off-heap data. I can't find a way to figure out what is in
there.
t;> > wrote:
>>>>
>>>>> Are you using the createStream or createDirectStream api? If its the
>>>>> former, you can try setting the StorageLevel to MEMORY_AND_DISK (it might
>>>>> slow things down though). Another way would be to try the later
you can try setting the StorageLevel to MEMORY_AND_DISK (it might
>>>> slow things down though). Another way would be to try the later one.
>>>>
>>>> Thanks
>>>> Best Regards
>>>>
>>>> On Wed, May 27, 2015 at 1:00 PM, Ji ZHANG
ld be to try the later one.
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Wed, May 27, 2015 at 1:00 PM, Ji ZHANG wrote:
>>>
>>>> Hi Akhil,
>>>>
>>>> Thanks for your reply. Accoding to the Streaming tab of Web UI,
>
>>> Thanks for your reply. Accoding to the Streaming tab of Web UI, the
>>> Processing Time is around 400ms, and there's no Scheduling Delay, so I
>>> suppose it's not the Kafka messages that eat up the off-heap memory. Or
>>> maybe it is, but how to t
;
>> Hi Akhil,
>>
>> Thanks for your reply. Accoding to the Streaming tab of Web UI, the
>> Processing Time is around 400ms, and there's no Scheduling Delay, so I
>> suppose it's not the Kafka messages that eat up the off-heap memory. Or
>> maybe it is,
hil,
>
> Thanks for your reply. Accoding to the Streaming tab of Web UI, the
> Processing Time is around 400ms, and there's no Scheduling Delay, so I
> suppose it's not the Kafka messages that eat up the off-heap memory. Or
> maybe it is, but how to tell?
>
> I googl
Hi Akhil,
Thanks for your reply. Accoding to the Streaming tab of Web UI, the
Processing Time is around 400ms, and there's no Scheduling Delay, so I
suppose it's not the Kafka messages that eat up the off-heap memory. Or
maybe it is, but how to tell?
I googled about how to check th
After submitting the job, if you do a ps aux | grep spark-submit you can see
all the JVM params. Are you using the high-level consumer (receiver based) for
receiving data from Kafka? In that case, if your throughput is high and the
processing delay exceeds the batch interval, then you will hit this memory
issue.
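An in-process variant of the same check, in case shell access to the node is
awkward (standard JVM management and SparkConf APIs; shown as a sketch only,
and it assumes a SparkContext named sc is in scope):

  import java.lang.management.ManagementFactory
  import scala.collection.JavaConverters._

  // JVM flags the current process was actually started with (-Xmx, GC options, ...).
  ManagementFactory.getRuntimeMXBean.getInputArguments.asScala.foreach(println)

  // All Spark properties visible to the running application.
  sc.getConf.getAll.sortBy(_._1).foreach { case (k, v) => println(s"$k=$v") }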
Hi,
I'm using Spark Streaming 1.3 on CDH 5.1 with yarn-cluster mode. I found out
that YARN is killing the driver and executor processes because of excessive use
of memory. Here's what I tried:
1. Xmx is set to 512M and the GC looks fine (one YGC per 10s), so the extra
memory is not used by the heap.
Thank you Sean. Moving my data types over from Double to Float was an (obvious)
big win, and I discovered one more good optimization from the Tuning section:
I modified my original code to call .persist(MEMORY_ONLY_SER) from the FIRST
import of the data, and I pass in "--conf spark.rdd.compress=true".
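A minimal sketch of that combination; the path, parsing, and app name are
placeholders rather than the original code:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.storage.StorageLevel

  val conf = new SparkConf()
    .setAppName("csv-floats")                 // placeholder name
    .set("spark.rdd.compress", "true")        // compress serialized partitions
  val sc = new SparkContext(conf)

  // Parse to Float instead of Double and persist serialized from the first load.
  val rows = sc.textFile("hdfs:///path/to/data.csv")   // placeholder path
    .map(_.split(",").map(_.toFloat))
    .persist(StorageLevel.MEMORY_ONLY_SER)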
A CSV element like "3.2," takes 4 bytes as text on disk, but as a Double it
will always take 8 bytes. Is your input like this? That could explain it.
You can map to Float in this case to halve the memory, if that works for your
use case. This is just kind of how strings and floating-point values work.
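The rough arithmetic behind that, spelled out (illustrative, and it ignores
object headers and boxing):

  // "3.2," is 4 bytes of text on disk, but:
  val bytesAsDouble = java.lang.Double.BYTES   // 8 bytes per parsed value
  val bytesAsFloat  = java.lang.Float.BYTES    // 4 bytes per parsed value
  // So ~1.3 GB of short numeric text can roughly double when parsed to Double,
  // and mapping to Float brings it back to about the on-disk size, before any
  // per-object or boxing overhead is counted.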
Hi guys,
I am just trying to parse out values from a CSV; everything is a numeric
(Double) value, and the input CSV text data is about 1.3 GB in size.
When inspecting the Java heap space used by SparkSubmit with JVisualVM, I end
up eating up 8 GB of memory! Moreover, by inspecting the BlockManager
SPARK_WORKER_CORES=1
export SPARK_WORKER_CORES=1
SPARK_WORKER_MEMORY=2g
export SPARK_WORKER_MEMORY=2g
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.spark.executor.memory=4g"
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.spark.executor.memory=4g"
During runtime I receive a Java OutOfMemory exception and a core dump. My
dataset is less than 1 GB and I want to make sure that I cache it all in memory
for my ML task.
Am I increasing the JVM heap memory correctly? Am I doing something wrong?
Thank you,
Nick
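For reference, a hedged sketch of the usual way to size executor heap in
standalone mode (not taken from this thread): the executor heap is requested
via spark.executor.memory (or --executor-memory) and has to fit within what the
worker advertises through SPARK_WORKER_MEMORY, while SPARK_DAEMON_JAVA_OPTS
only configures the master/worker daemons themselves:

  import org.apache.spark.{SparkConf, SparkContext}

  // Sketch only: ask for an executor heap that fits inside SPARK_WORKER_MEMORY
  // (2g above), otherwise the standalone master cannot place any executor.
  val conf = new SparkConf()
    .setAppName("ml-task")                    // placeholder name
    .set("spark.executor.memory", "2g")
  val sc = new SparkContext(conf)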