On 12 Aug 2022, at 10:52 PM, Yuriy Kutlunin
<yuriy.kutlu...@glowbyteconsulting.com> wrote:
Hello Yuan,
I don't override any of the default settings; here is my docker-compose.yml:
services:
  jobmanager:
    image: flink:1.15.1-java11
    ports:
      - "8081:8081"
    command: jobmanager
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
  taskmanager:
    image: flink:1.15.1-java11
    depends_on:
      - jobmanager
    command: taskmanager
    ports:
      - "8084:8084"
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        taskmanager.numberOfTaskSlots: 2
        metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
        env.java.opts: -XX:+HeapDumpOnOutOfMemoryError
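(For the JMX cases discussed further down in the thread, the Prometheus reporter line above is swapped for the JMX one; as a sketch, assuming the factory-style configuration documented for 1.15:

        metrics.reporter.jmx.factory.class: org.apache.flink.metrics.jmx.JMXReporterFactory
)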
From the TaskManager log:
INFO [] - Final TaskExecutor Memory configuration:
INFO [] -   Total Process Memory:          1.688gb (1811939328 bytes)
INFO [] -     Total Flink Memory:          1.250gb (1342177280 bytes)
INFO [] -       Total JVM Heap Memory:     512.000mb (536870902 bytes)
INFO [] -         Framework:               128.000mb (134217728 bytes)
INFO [] -         Task:                    384.000mb (402653174 bytes)
INFO [] -       Total Off-heap Memory:     768.000mb (805306378 bytes)
INFO [] -         Managed:                 512.000mb (536870920 bytes)
INFO [] -         Total JVM Direct Memory: 256.000mb (268435458 bytes)
INFO [] -           Framework:             128.000mb (134217728 bytes)
INFO [] -           Task:                  0 bytes
INFO [] -           Network:               128.000mb (134217730 bytes)
INFO [] -     JVM Metaspace:               256.000mb (268435456 bytes)
INFO [] -     JVM Overhead:                192.000mb (201326592 bytes)
I would prefer not to configure memory explicitly (at this point), because
memory consumption depends on the job structure, so it can always exceed
whatever values I configure.
My next guess is that the problem is not the content of the metrics but
their number, which grows with the number of operators.
So the next question is whether there is a way to disable metric
generation at the operator level.
I found the same question, without a correct answer, on Stack Overflow:
https://stackoverflow.com/questions/54215245/apache-flink-limit-the-amount-of-metrics-exposed
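In the meantime, one knob I might try (only a sketch; it shortens the metric identifiers by reporting IDs instead of the full chained task/operator names, but does not reduce the number of metrics) is overriding the scope formats in FLINK_PROPERTIES:

        metrics.scope.task: <host>.taskmanager.<tm_id>.<job_id>.<task_id>.<subtask_index>
        metrics.scope.operator: <host>.taskmanager.<tm_id>.<job_id>.<operator_id>.<subtask_index>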
On Fri, Aug 12, 2022 at 4:05 AM yu'an huang <h.yuan...@gmail.com> wrote:
Hi Yuriy,
How do you set your TaskManager memory? I think 40MB is not
significantly high for Flink, and it's normal to see memory increase if
you have higher parallelism or turn more metrics on. You can try
setting larger memory for Flink as explained in the following documentation.
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/memory/mem_setup/
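For example (just a sketch, not tested against your setup; pick a size that fits your container limits), you could add something like this to the taskmanager's FLINK_PROPERTIES:

        taskmanager.memory.process.size: 4096m

or raise only the task heap with taskmanager.memory.task.heap.size.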
Best
Yuan
On 12 Aug 2022, at 12:51 AM, Yuriy Kutlunin
<yuriy.kutlu...@glowbyteconsulting.com> wrote:
Hi all,
I'm running a Flink cluster in session mode via docker-compose, as
described in the docs:
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/#session-cluster-yml
After submitting a test job with many intermediate SQL operations
(~500 "select * from ..." steps) and metrics turned on (JMX or Prometheus), I
got an OOM (java heap space) during the initialization stage.
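The job is shaped roughly like this (a simplified sketch; source1, mapping1…mapping500 and sink1 stand in for the real tables and intermediate selects):

        CREATE TEMPORARY VIEW mapping1 AS SELECT * FROM source1;
        CREATE TEMPORARY VIEW mapping2 AS SELECT * FROM mapping1;
        -- ... roughly 500 such intermediate selects ...
        INSERT INTO sink1 SELECT * FROM mapping500;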
Turning metrics off allows the job to get to the Running state.
Heap consumption also depends on parallelism: the same job succeeds
when submitted with parallelism 1 instead of 2.
Attached are TaskManager logs for 4 cases:
JMX parallelism 1 (succeeded)
JMX parallelism 2 (failed)
Prometheus parallelism 2 (failed)
No metrics parallelism 2 (succeeded)
A post-OOM heap dump (JMX, parallelism 2) shows 2 main consumption points:
1. A big value (40MB) for some task configuration
2. Many instances (~270k) of a heavy (20KB) value in StreamConfig
It seems all these heavy values are related to the huge task names,
which include all of the operations:
Received task Source: source1 -> SourceConversion[2001] -> mapping1
-> SourceConversion[2003] -> mapping2 -> SourceConversion[2005] ->
... -> mapping500 -> Sink: sink1
(1/1)#0 (1e089cf3b1581ea7c8fb1cd7b159e66b)
I'm looking for some way to overcome this heap issue.
--
Best regards,
Yuriy Kutlunin
Attachments: many_operators_parallelism_1_with_jmx.txt, many_operators_parallelism_2_with_jmx.txt, many_operators_parallelism_2_no_jmx.txt, many_operators_parallelism_2_with_prom.txt, heap_total.png, heap_task2_conf.png, heap_many_string_instances.png, heap_task1_conf.png
--
Best regards,
Yuriy Kutlunin