Hi

Thnaks for the reply. We Will try it out and let everybody know

Med venlig hilsen / Best regards
Lasse Nedergaard


> Den 20. apr. 2020 kl. 08.26 skrev Xintong Song <tonysong...@gmail.com>:
> 
> 
> Hi Lasse,
> 
> From what I understand, your problem is that JVM tries to fork some native 
> process (if you look at the exception stack the root exception is thrown from 
> a native method) but there's no enough memory for doing that. This could 
> happen when either Mesos is using cgroup strict mode for memory control, or 
> there's no more memory on the machine. Flink cannot prevent native processes 
> from using more memory. It can only reserve certain amount of memory for such 
> native usage when requesting worker memory from the deployment environment 
> (in your case Mesos) and allocating Java heap / direct memory.
> 
> My suggestion is to try increasing the JVM overhead configuration. You can 
> leverage the configuration options 
> 'taskmanager.memory.jvm-overhead.[min|max|fraction]'. See more details in the 
> documentation[1].
> 
> Thank you~
> Xintong Song
> 
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/config.html#taskmanager-memory-jvm-overhead-max
> 
>> On Sat, Apr 18, 2020 at 4:02 AM Zahid Rahman <zahidr1...@gmail.com> wrote:
>> https://betsol.com/java-memory-management-for-java-virtual-machine-jvm/
>> 
>> Backbutton.co.uk
>> ¯\_(ツ)_/¯ 
>> ♡۶Java♡۶RMI ♡۶
>> Make Use Method {MUM}
>> makeuse.org
>> 
>> 
>>> On Fri, 17 Apr 2020 at 14:07, Lasse Nedergaard 
>>> <lassenedergaardfl...@gmail.com> wrote:
>>> Hi.
>>> 
>>> We have migrated to Flink 1.10 and face out of memory exception and hopeful 
>>> can someone point us in the right direction. 
>>> 
>>> We have a job that use broadcast state, and we sometimes get out memory 
>>> when it creates a savepoint. See stacktrack below. 
>>> We have assigned 2.2 GB/task manager and configured  
>>> taskmanager.memory.process.size : 2200m
>>> In Flink 1.9 our container was terminated because OOM, so 1.10 do a better 
>>> job, but it still not working and the task manager is leaking mem for each 
>>> OOM and finial kill by Mesos
>>> 
>>> 
>>> Any idea what we can do to figure out what settings we need to change?
>>> 
>>> Thanks in advance
>>> 
>>> Lasse Nedergaard 
>>> 
>>> 
>>> WARN  o.a.flink.runtime.state.filesystem.FsCheckpointStreamFactory  - Could 
>>> not close the state stream for 
>>> s3://flinkstate/dcos-prod/checkpoints/fc9318cc236d09f0bfd994f138896d6c/chk-3509/cf0714dc-ad7c-4946-b44c-96d4a131a4fa.
>>> java.io.IOException: Cannot allocate memory
>>>     at java.io.FileOutputStream.writeBytes(Native Method)
>>>     at java.io.FileOutputStream.write(FileOutputStream.java:326)
>>>     at 
>>> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>>>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>>>     at java.io.FilterOutputStream.flush(FilterOutputStream.java:140)
>>>     at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
>>>     at 
>>> com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3OutputStream.close(PrestoS3FileSystem.java:995)
>>>     at 
>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>>>     at 
>>> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>>>     at 
>>> org.apache.flink.fs.s3presto.common.HadoopDataOutputStream.close(HadoopDataOutputStream.java:52)
>>>     at 
>>> org.apache.flink.core.fs.ClosingFSDataOutputStream.close(ClosingFSDataOutputStream.java:64)
>>>     at 
>>> org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.close(FsCheckpointStreamFactory.java:277)
>>>     at org.apache.flink.util.IOUtils.closeQuietly(IOUtils.java:263)
>>>     at org.apache.flink.util.IOUtils.closeAllQuietly(IOUtils.java:250)
>>>     at 
>>> org.apache.flink.util.AbstractCloseableRegistry.close(AbstractCloseableRegistry.java:122)
>>>     at 
>>> org.apache.flink.runtime.state.AsyncSnapshotCallable.closeSnapshotIO(AsyncSnapshotCallable.java:167)
>>>     at 
>>> org.apache.flink.runtime.state.AsyncSnapshotCallable.call(AsyncSnapshotCallable.java:83)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>     at 
>>> org.apache.flink.runtime.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:458)
>>>     at 
>>> org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:53)
>>>     at 
>>> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:1143)
>>>     at 
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>     at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>     at java.lang.Thread.run(Thread.java:748)
>>> 
>>> INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - 
>>> Discarding checkpoint 3509 of job fc9318cc236d09f0bfd994f138896d6c.
>>> org.apache.flink.util.SerializedThrowable: Could not materialize checkpoint 
>>> 3509 for operator Feature extraction (8/12).
>>>     at 
>>> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:1238)
>>>     at 
>>> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:1180)
>>>     at 
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>     at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>     at java.lang.Thread.run(Thread.java:748)
>>> Caused by: org.apache.flink.util.SerializedThrowable: java.io.IOException: 
>>> Cannot allocate memory
>>>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>>>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>>>     at 
>>> org.apache.flink.runtime.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:461)
>>>     at 
>>> org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:53)
>>>     at 
>>> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:1143)
>>>     ... 3 common frames omitted
>>> Caused by: org.apache.flink.util.SerializedThrowable: Cannot allocate memory
>>>     at java.io.FileOutputStream.writeBytes(Native Method)
>>>     at java.io.FileOutputStream.write(FileOutputStream.java:326)
>>>     at 
>>> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>>>     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
>>>     at java.io.FilterOutputStream.write(FilterOutputStream.java:77)
>>>     at java.io.FilterOutputStream.write(FilterOutputStream.java:125)
>>>     at 
>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
>>>     at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>     at 
>>> org.apache.flink.fs.s3presto.common.HadoopDataOutputStream.write(HadoopDataOutputStream.java:47)
>>>     at 
>>> org.apache.flink.core.fs.FSDataOutputStreamWrapper.write(FSDataOutputStreamWrapper.java:66)
>>>     at 
>>> org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.write(FsCheckpointStreamFactory.java:220)
>>>     at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>     at 
>>> org.apache.flink.formats.avro.utils.DataOutputEncoder.writeBytes(DataOutputEncoder.java:92)
>>>     at 
>>> org.apache.flink.formats.avro.utils.DataOutputEncoder.writeString(DataOutputEncoder.java:113)
>>>     at org.apache.avro.io.Encoder.writeString(Encoder.java:130)
>>>     at 
>>> org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:323)
>>>     at 
>>> org.apache.avro.generic.GenericDatumWriter.writeMap(GenericDatumWriter.java:281)
>>>     at 
>>> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:139)
>>>     at 
>>> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:82)
>>>     at 
>>> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:144)
>>>     at 
>>> org.apache.avro.specific.SpecificDatumWriter.writeField(SpecificDatumWriter.java:98)
>>>     at 
>>> org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:195)
>>>     at 
>>> org.apache.avro.specific.SpecificDatumWriter.writeRecord(SpecificDatumWriter.java:83)
>>>     at 
>>> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:130)
>>>     at 
>>> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:82)
>>>     at 
>>> org.apache.avro.generic.GenericDatumWriter.writeArray(GenericDatumWriter.java:234)
>>>     at 
>>> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:136)
>>>     at 
>>> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:82)
>>>     at 
>>> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:144)
>>>     at 
>>> org.apache.avro.specific.SpecificDatumWriter.writeField(SpecificDatumWriter.java:98)
>>>     at 
>>> org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:195)
>>>     at 
>>> org.apache.avro.specific.SpecificDatumWriter.writeRecord(SpecificDatumWriter.java:83)
>>>     at 
>>> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:130)
>>>     at 
>>> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:82)
>>>     at 
>>> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:72)
>>>     at 
>>> org.apache.flink.formats.avro.typeutils.AvroSerializer.serialize(AvroSerializer.java:185)
>>>     at 
>>> org.apache.flink.runtime.state.HeapBroadcastState.write(HeapBroadcastState.java:109)
>>>     at 
>>> org.apache.flink.runtime.state.DefaultOperatorStateBackendSnapshotStrategy$1.callInternal(DefaultOperatorStateBackendSnapshotStrategy.java:167)
>>>     at 
>>> org.apache.flink.runtime.state.DefaultOperatorStateBackendSnapshotStrategy$1.callInternal(DefaultOperatorStateBackendSnapshotStrategy.java:108)
>>>     at 
>>> org.apache.flink.runtime.state.AsyncSnapshotCallable.call(AsyncSnapshotCallable.java:75)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>     at 
>>> org.apache.flink.runtime.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:458)

Reply via email to