That would be great. It would be best if you posted the link directly in the
JIRA issue.

Cheers,
Till
On Mon, Jun 20, 2016 at 12:55 PM, CPC <acha...@gmail.com> wrote:

> Hi Till,
>
> I saw the JIRA issue. Do you want me to upload the input dataset as well?
> If you want, I can prepare a GitHub repo if that would be easier.
>
> On Jun 20, 2016 1:10 PM, "Till Rohrmann" <trohrm...@apache.org> wrote:
>
> > Hi,
> >
> > your observation sounds like a bug to me and we have to investigate it
> > further. I assume that you're running a batch job, right? Could you
> > maybe share your complete configuration and the job to reproduce the
> > problem with us?
> >
> > I think your finding that direct buffers are not properly freed and
> > garbage collected may be right. I will open a JIRA issue to further
> > investigate and solve the problem. Thanks for reporting :-)
> >
> > At the moment, one way to work around this problem is, as you've already
> > stated, to set taskmanager.memory.preallocate: true in your
> > configuration. For batch jobs, this should actually improve the runtime
> > performance at the cost of a slightly longer start-up time for your
> > TaskManagers.
> >
> > Cheers,
> > Till
> >
> > On Sun, Jun 19, 2016 at 6:16 PM, CPC <acha...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I think I found some information regarding this behavior. In the JVM
> > > it is almost impossible to free memory that was allocated via
> > > ByteBuffer.allocateDirect: there is no explicit way to tell the JVM
> > > "free this direct ByteBuffer". In some forums it is suggested that the
> > > memory can be freed with the method below:
> > >
> > >> import java.nio.ByteBuffer
> > >>
> > >> def releaseBuffers(buffers: List[ByteBuffer]): List[ByteBuffer] = {
> > >>   if (buffers.nonEmpty) {
> > >>     val cleanerMethod = buffers.head.getClass.getMethod("cleaner")
> > >>     cleanerMethod.setAccessible(true)
> > >>     buffers.foreach { buffer =>
> > >>       val cleaner = cleanerMethod.invoke(buffer)
> > >>       val cleanMethod = cleaner.getClass.getMethod("clean")
> > >>       cleanMethod.setAccessible(true)
> > >>       cleanMethod.invoke(cleaner)
> > >>     }
> > >>   }
> > >>   List.empty[ByteBuffer]
> > >> }
> > >
> > > But since the cleaner method is an internal API, this approach is not
> > > recommended, does not work on every JVM, and is not supported in
> > > Java 9 either. I also ran some tests with the method above and the
> > > behavior is not predictable. If the memory was allocated by some other
> > > thread and that thread has exited, then the memory is released. In
> > > practice the GC controls direct memory buffers: if there is no GC
> > > activity and memory is allocated and then dereferenced by different
> > > threads, memory usage goes beyond the intended limit, the machine
> > > starts swapping, and the OS kills the TaskManager. In my tests I saw
> > > the following behavior:
> > >
> > > Suppose thread A allocates 8 GB, exits, and there is no reference to
> > > the allocated memory; then thread B allocates 8 GB, exits, and there is
> > > no reference to the allocated memory.
> > >
> > > When I look at direct memory usage from jvisualvm it looks like the
> > > image below (-Xmx512m -XX:MaxDirectMemorySize=12G):
> > >
> > > [image: Inline images 1]
> > >
> > > But the RSS of the process is 16 GB. If I call System.gc() at that
> > > point, RSS drops to 8 GB, but not to the expected level.
> > >
> > > This is why the Apache Cassandra folks chose sun.misc.Unsafe (
> > > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Off-heap-caching-through-ByteBuffer-allocateDirect-when-JNA-not-available-td6977711.html
> > > ).
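> > >
> > > Roughly, the Unsafe-based approach looks like the sketch below. This
> > > is only an illustration (not Flink's or Cassandra's actual code), but
> > > allocateMemory/freeMemory let you release native memory
> > > deterministically, without depending on a GC cycle or a Cleaner:
> > >
> > >> import sun.misc.Unsafe
> > >>
> > >> // Obtain the Unsafe instance via reflection; the "theUnsafe" field
> > >> // is not publicly accessible.
> > >> val unsafeField = classOf[Unsafe].getDeclaredField("theUnsafe")
> > >> unsafeField.setAccessible(true)
> > >> val unsafe = unsafeField.get(null).asInstanceOf[Unsafe]
> > >>
> > >> // Allocate 8 GB of native memory outside the GC-managed heap and
> > >> // touch it so the pages are actually committed (visible in RSS).
> > >> val size = 8L * 1024 * 1024 * 1024
> > >> val address = unsafe.allocateMemory(size)
> > >> unsafe.setMemory(address, size, 0.toByte)
> > >>
> > >> // Release it explicitly, without waiting for any GC activity.
> > >> unsafe.freeMemory(address)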
> > >
> > > I think currently the only way to limit memory usage in Flink, if you
> > > want to reuse the same TaskManager across jobs, is via
> > > "taskmanager.memory.preallocate: true". Since the memory is allocated
> > > at start-up and never freed, the memory usage stays constant.
> > >
> > > PS: Sorry for my English, I am not a native speaker. I hope I managed
> > > to explain what I intended to :)
> > >
> > > On 18 June 2016 at 16:36, CPC <acha...@gmail.com> wrote:
> > >
> > >> Hello,
> > >>
> > >> I repeated the same test with the following conf values:
> > >>
> > >>> taskmanager.heap.mb: 6500
> > >>>
> > >>> taskmanager.memory.off-heap: true
> > >>>
> > >>> taskmanager.memory.fraction: 0.9
> > >>
> > >> I set TM_MAX_OFFHEAP_SIZE="6G" in taskmanager.sh. The TaskManager
> > >> started with:
> > >>
> > >>> capacman 14543 323 56.0 17014744 13731328 pts/1 Sl 16:23 35:25
> > >>> /home/capacman/programlama/java/jdk1.7.0_75/bin/java
> > >>> -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -Xms650M -Xmx650M
> > >>> -XX:MaxDirectMemorySize=6G -XX:MaxPermSize=256m
> > >>> -Dlog.file=/home/capacman/Data/programlama/flink-1.0.3/log/flink-capacman-taskmanager-0-capacman-Aspire-V3-771.log
> > >>> -Dlog4j.configuration=file:/home/capacman/Data/programlama/flink-1.0.3/conf/log4j.properties
> > >>> -Dlogback.configurationFile=file:/home/capacman/Data/programlama/flink-1.0.3/conf/logback.xml
> > >>> -classpath
> > >>> /home/capacman/Data/programlama/flink-1.0.3/lib/flink-dist_2.11-1.0.3.jar:/home/capacman/Data/programlama/flink-1.0.3/lib/flink-python_2.11-1.0.3.jar:/home/capacman/Data/programlama/flink-1.0.3/lib/log4j-1.2.17.jar:/home/capacman/Data/programlama/flink-1.0.3/lib/slf4j-log4j12-1.7.7.jar:::
> > >>> org.apache.flink.runtime.taskmanager.TaskManager --configDir
> > >>> /home/capacman/Data/programlama/flink-1.0.3/conf
> > >>
> > >> but memory usage reached up to 13 GB. Could somebody explain to me why
> > >> the memory usage is so high? I expected it to be at most 8 GB with some
> > >> JVM internal overhead.
> > >>
> > >> [image: Inline images 1]
> > >>
> > >> [image: Inline images 2]
> > >>
> > >> On 17 June 2016 at 20:26, CPC <acha...@gmail.com> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> I am doing some tests on off-heap memory usage and I am seeing some
> > >>> odd behavior. My TaskManager heap limit is 12288 MB, and when I set
> > >>> "taskmanager.memory.off-heap: true", every job allocates at most
> > >>> 11673 MB of off-heap memory, which is heapsize * 0.95 (the value of
> > >>> taskmanager.memory.fraction). But when I submit a second job it
> > >>> allocates another 11 GB and does not free the memory, since
> > >>> MaxDirectMemorySize is set via
> > >>> -XX:MaxDirectMemorySize=${TM_MAX_OFFHEAP_SIZE} with
> > >>> TM_MAX_OFFHEAP_SIZE="8388607T"; my laptop goes into swap and the
> > >>> kernel OOM killer kills the TaskManager. If I trigger a GC from
> > >>> VisualVM between jobs, the direct memory is released, but the memory
> > >>> usage of the TaskManager reported by ps is still around 20 GB (RSS)
> > >>> and 27 GB (virtual size). In that case I could submit my test job a
> > >>> few times without the TaskManager being OOM-killed, but after about
> > >>> 10 submissions it was killed again. I don't understand why the JVM
> > >>> memory usage is still high even though all the direct memory was
> > >>> released. Do you have any idea?
> > >>> Then I set MaxDirectMemorySize to 12 GB. In this case the direct
> > >>> memory was freed without any explicit GC triggering from VisualVM,
> > >>> but the JVM process memory usage was still high, around 20 GB (RSS)
> > >>> and 27 GB (virtual size). After maybe another 10 submissions it
> > >>> killed the TaskManager again. I think this is a bug, and it makes it
> > >>> impossible to reuse TaskManagers without restarting them in
> > >>> standalone mode.
> > >>>
> > >>> [image: Inline images 1]
> > >>>
> > >>> [image: Inline images 2]
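> > >>>
> > >>> The pattern of the test is roughly the sketch below (sizes are
> > >>> smaller here so it runs on a laptop; this is only an illustration of
> > >>> the allocate-then-dereference pattern, not the exact test code). Run
> > >>> it with -Xmx512m -XX:MaxDirectMemorySize=12G as above and watch RSS
> > >>> with ps or jvisualvm:
> > >>>
> > >>>> import java.nio.ByteBuffer
> > >>>>
> > >>>> object DirectMemorySketch {
> > >>>>   // Allocate direct buffers on a fresh thread and let all
> > >>>>   // references die when the thread exits.
> > >>>>   def allocateAndDrop(totalMb: Int): Unit = {
> > >>>>     val t = new Thread(new Runnable {
> > >>>>       override def run(): Unit = {
> > >>>>         // 1 MB buffers; references live only inside this thread.
> > >>>>         var buffers =
> > >>>>           List.fill(totalMb)(ByteBuffer.allocateDirect(1024 * 1024))
> > >>>>         buffers = Nil // drop the references before the thread exits
> > >>>>       }
> > >>>>     })
> > >>>>     t.start()
> > >>>>     t.join()
> > >>>>   }
> > >>>>
> > >>>>   def main(args: Array[String]): Unit = {
> > >>>>     allocateAndDrop(512) // "thread A"
> > >>>>     allocateAndDrop(512) // "thread B"
> > >>>>     // Without an explicit System.gc() here, the native memory of
> > >>>>     // both rounds stays reserved: the small heap sees no pressure,
> > >>>>     // so no GC runs and no Cleaner is ever invoked.
> > >>>>     Thread.sleep(60000)
> > >>>>   }
> > >>>> }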