You can follow the ticket https://issues.apache.org/jira/browse/FLINK-10243, as mentioned in that Stack Overflow question, and set this parameter:

"metrics.latency.granularity": https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#metrics-latency-granularity
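A quick, untested sketch of how that could look in flink-conf.yaml (or in the FLINK_PROPERTIES block of your compose file). "single" is the coarsest of the three documented values (single, operator, subtask), so it produces the fewest latency metrics:

    # track latency without differentiating between sources and subtasks
    # (coarser than the default "operator")
    metrics.latency.granularity: single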
You only have 1.688gb for your TaskManager. I also suggest you increase the memory configuration, otherwise the test may still fail; a minimal sketch of the docker-compose change is at the bottom of this mail.

> On 12 Aug 2022, at 10:52 PM, Yuriy Kutlunin
> <yuriy.kutlu...@glowbyteconsulting.com> wrote:
>
> Hello Yuan,
>
> I don't override any default settings, docker-compose.yml:
>
> services:
>   jobmanager:
>     image: flink:1.15.1-java11
>     ports:
>       - "8081:8081"
>     command: jobmanager
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>
>   taskmanager:
>     image: flink:1.15.1-java11
>     depends_on:
>       - jobmanager
>     command: taskmanager
>     ports:
>       - "8084:8084"
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>         taskmanager.numberOfTaskSlots: 2
>         metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
>         env.java.opts: -XX:+HeapDumpOnOutOfMemoryError
>
> From the TaskManager log:
>
> INFO [] - Final TaskExecutor Memory configuration:
> INFO [] -   Total Process Memory:            1.688gb (1811939328 bytes)
> INFO [] -     Total Flink Memory:            1.250gb (1342177280 bytes)
> INFO [] -       Total JVM Heap Memory:       512.000mb (536870902 bytes)
> INFO [] -         Framework:                 128.000mb (134217728 bytes)
> INFO [] -         Task:                      384.000mb (402653174 bytes)
> INFO [] -       Total Off-heap Memory:       768.000mb (805306378 bytes)
> INFO [] -         Managed:                   512.000mb (536870920 bytes)
> INFO [] -         Total JVM Direct Memory:   256.000mb (268435458 bytes)
> INFO [] -           Framework:               128.000mb (134217728 bytes)
> INFO [] -           Task:                    0 bytes
> INFO [] -           Network:                 128.000mb (134217730 bytes)
> INFO [] -     JVM Metaspace:                 256.000mb (268435456 bytes)
> INFO [] -     JVM Overhead:                  192.000mb (201326592 bytes)
>
> I would prefer not to configure memory (at this point), because memory
> consumption depends on the job structure, so it can always exceed the
> configured values.
>
> My next guess is that the problem is not in the metrics' content, but in
> their number, which increases with the number of operators.
> So the next question is whether there is a way to exclude metric generation
> at the operator level.
> I found the same question, without a correct answer, on Stack Overflow:
> https://stackoverflow.com/questions/54215245/apache-flink-limit-the-amount-of-metrics-exposed
>
> On Fri, Aug 12, 2022 at 4:05 AM yu'an huang <h.yuan...@gmail.com> wrote:
> Hi Yuriy,
>
> How do you set your TaskManager memory? I think 40MB is not significantly
> high for Flink. And it's normal to see memory increase if you have more
> parallelism or turn more metrics on. You can try setting larger memory for
> Flink, as explained in the following documentation:
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/memory/mem_setup/
>
> Best
> Yuan
>
>
>
>> On 12 Aug 2022, at 12:51 AM, Yuriy Kutlunin
>> <yuriy.kutlu...@glowbyteconsulting.com> wrote:
>>
>> Hi all,
>>
>> I'm running a Flink cluster in Session Mode via docker-compose as stated
>> in the docs:
>> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/#session-cluster-yml
>>
>> After submitting a test job with many intermediate SQL operations (~500
>> "select * from ...") and metrics turned on (JMX or Prometheus), I got an
>> OOM: Java heap space at the initialization stage.
>>
>> Turning metrics off allows the job to get to the RUNNING state.
>> Heap consumption also depends on parallelism - the same job succeeds when
>> submitted with parallelism 1 instead of 2.
>>
>> There are Task Manager logs attached for 4 cases:
>> JMX, parallelism 1 (succeeded)
>> JMX, parallelism 2 (failed)
>> Prometheus, parallelism 2 (failed)
>> No metrics, parallelism 2 (succeeded)
>>
>> A post-OOM heap dump (JMX, parallelism 2) shows 2 main consumption points:
>> 1. A big value (40MB) for some task configuration
>> 2. Many instances (~270k) of some heavy (20KB) value in StreamConfig
>>
>> It seems like all these heavy values are related to the weird task names,
>> which include all the operations:
>> Received task Source: source1 -> SourceConversion[2001] -> mapping1 ->
>> SourceConversion[2003] -> mapping2 -> SourceConversion[2005] -> ... ->
>> mapping500 -> Sink: sink1 (1/1)#0 (1e089cf3b1581ea7c8fb1cd7b159e66b)
>>
>> Looking for some way to overcome this heap issue.
>>
>> --
>> Best regards,
>> Yuriy Kutlunin
>>
>> <many_operators_parallelism_1_with_jmx.txt>
>> <many_operators_parallelism_2_with_jmx.txt>
>> <many_operators_parallelism_2_no_jmx.txt>
>> <many_operators_parallelism_2_with_prom.txt>
>> <heap_total.png>
>> <heap_task2_conf.png>
>> <heap_many_string_instances.png>
>> <heap_task1_conf.png>
>
>
>
> --
> Best regards,
> Yuriy Kutlunin
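Regarding the memory suggestion at the top of this mail: here is a minimal, untested sketch of the taskmanager service from the compose file quoted above, with only one line added. The 4g figure is just an example, not a recommendation; pick whatever fits your host and see the memory setup page linked earlier in the thread for how the total is split internally.

    taskmanager:
      image: flink:1.15.1-java11
      depends_on:
        - jobmanager
      command: taskmanager
      ports:
        - "8084:8084"
      environment:
        - |
          FLINK_PROPERTIES=
          jobmanager.rpc.address: jobmanager
          taskmanager.numberOfTaskSlots: 2
          # example value only: raises the total process memory from the
          # default 1.688gb reported in the log above
          taskmanager.memory.process.size: 4g
          metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
          env.java.opts: -XX:+HeapDumpOnOutOfMemoryError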