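The GC log looks quite normal. The longest pause is about 0.29 s. The three Full GCs in the first 15 seconds are triggered by Metaspace growth during startup, and the young GCs after that take 0.1 s or less. So GC pauses are unlikely to be the cause here. Maybe the K8s APIServer is overloaded.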
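A quick way to back up "looks normal" with numbers is to pull the pause times out of the log. Below is a minimal Python sketch. It is a hypothetical helper, not part of Flink or the JDK. It parses JDK 8 `-XX:+PrintGCDetails -XX:+PrintGCDateStamps` output from the Serial collector (the `DefNew`/`Tenured` records in the log below) and assumes one event per line, the way the JVM writes it. The sample lines are copied from the log with the mail wrapping undone. Other collectors and JDK 9+ unified logging print different formats, so the regexes would need adjusting for them.

```python
import re
from dataclasses import dataclass

# A few records copied from the JobManager GC log quoted below, one event per
# line as the JVM writes them (the mail client wrapped them).
SAMPLE_LOG = """\
2021-09-13T09:28:11.569+0200: 3.516: [Full GC (Metadata GC Threshold) 2021-09-13T09:28:11.569+0200: 3.516: [Tenured: 0K->12699K(699072K), 0.0986073 secs] 67116K->12699K(1013632K), [Metaspace: 20705K->20705K(1067008K)], 0.0987201 secs] [Times: user=0.03 sys=0.02, real=0.10 secs]
2021-09-13T09:28:15.560+0200: 7.507: [Full GC (Metadata GC Threshold) 2021-09-13T09:28:15.560+0200: 7.507: [Tenured: 12699K->24229K(699072K), 0.2937536 secs] 105133K->24229K(1013632K), [Metaspace: 33805K->33805K(1079296K)], 0.2938554 secs] [Times: user=0.13 sys=0.00, real=0.29 secs]
2021-09-13T09:28:22.744+0200: 14.691: [Full GC (Metadata GC Threshold) 2021-09-13T09:28:22.744+0200: 14.691: [Tenured: 24229K->50182K(699072K), 0.2362689 secs] 187184K->50182K(1013632K), [Metaspace: 56762K->56762K(1099776K)], 0.2363739 secs] [Times: user=0.11 sys=0.02, real=0.24 secs]
2021-09-13T09:31:50.257+0200: 222.204: [GC (Allocation Failure) 2021-09-13T09:31:50.257+0200: 222.204: [DefNew: 279616K->20089K(314560K), 0.1042210 secs] 329798K->70271K(1013632K), 0.1043736 secs] [Times: user=0.04 sys=0.03, real=0.10 secs]
2021-09-13T11:02:20.253+0200: 5652.200: [GC (Allocation Failure) 2021-09-13T11:02:20.253+0200: 5652.200: [DefNew: 280882K->1323K(314560K), 0.0097592 secs] 336722K->57163K(1013632K), 0.0098574 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
"""

# "<date>: <uptime>: [<kind> (<cause>)" at the start, wall-clock "real=" at the end.
EVENT_RE = re.compile(
    r"^\S+: (?P<uptime>\d+\.\d+): \[(?P<kind>Full GC|GC) \((?P<cause>[^)]*)\)"
    r".*real=(?P<real>\d+\.\d+) secs\]"
)
# First whole-heap transition, e.g. "secs] 329798K->70271K(1013632K)".
HEAP_RE = re.compile(r"secs\] (?P<before>\d+)K->(?P<after>\d+)K\((?P<total>\d+)K\)")


@dataclass
class GcEvent:
    uptime_s: float
    kind: str
    cause: str
    pause_s: float  # "real" wall-clock time reported by the JVM
    heap_after_kb: int


def parse_gc_log(text):
    """Return one GcEvent per recognised line; other lines are skipped."""
    events = []
    for line in text.splitlines():
        m = EVENT_RE.match(line)
        h = HEAP_RE.search(line)
        if m is None or h is None:
            continue  # header lines, or a format these regexes do not cover
        events.append(GcEvent(float(m["uptime"]), m["kind"], m["cause"],
                              float(m["real"]), int(h["after"])))
    return events


def summarize(events):
    """Count events and report the longest and total pause."""
    longest = max(events, key=lambda e: e.pause_s)
    return {
        "events": len(events),
        "full_gcs": sum(e.kind == "Full GC" for e in events),
        "max_pause_s": longest.pause_s,
        "max_pause_at_uptime_s": longest.uptime_s,
        "total_pause_s": round(sum(e.pause_s for e in events), 2),
    }


if __name__ == "__main__":
    for key, value in summarize(parse_gc_log(SAMPLE_LOG)).items():
        print(f"{key}: {value}")
```

On these sample lines the longest pause is the 0.29 s Full GC at about 7.5 s uptime, during startup. Running `parse_gc_log(open("/opt/flink/log/jobmanager-gc.log").read())` would cover the whole file configured via `-Xloggc`.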

Best,
Yang

houssem <mejrihousse...@gmail.com> wrote on Mon, 13 Sep 2021 at 5:11 PM:

> Hello,
>
> Here's part of the full GC log:
>
> OpenJDK 64-Bit Server VM (25.232-b09) for linux-amd64 JRE (1.8.0_232-b09),
> built on Oct 18 2019 15:04:46 by "jenkins" with gcc 4.8.2 20140120 (Red Hat
> 4.8.2-15)
> Memory: 4k page, physical 976560k(946672k free), swap 0k(0k free)
> CommandLine flags: -XX:CompressedClassSpaceSize=260046848
> -XX:InitialHeapSize=1073741824 -XX:MaxHeapSize=1073741824
> -XX:MaxMetaspaceSize=268435456 -XX:+PrintGC -XX:+PrintGCDateStamps
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers
> -XX:+UseCompressedOops
> 2021-09-13T09:28:11.569+0200: 3.516: [Full GC (Metadata GC Threshold)
> 2021-09-13T09:28:11.569+0200: 3.516: [Tenured: 0K->12699K(699072K),
> 0.0986073 secs] 67116K->12699K(1013632K), [Metaspace:
> 20705K->20705K(1067008K)], 0.0987201 secs] [Times: user=0.03 sys=0.02,
> real=0.10 secs]
> 2021-09-13T09:28:15.560+0200: 7.507: [Full GC (Metadata GC Threshold)
> 2021-09-13T09:28:15.560+0200: 7.507: [Tenured: 12699K->24229K(699072K),
> 0.2937536 secs] 105133K->24229K(1013632K), [Metaspace:
> 33805K->33805K(1079296K)], 0.2938554 secs] [Times: user=0.13 sys=0.00,
> real=0.29 secs]
> 2021-09-13T09:28:22.744+0200: 14.691: [Full GC (Metadata GC Threshold)
> 2021-09-13T09:28:22.744+0200: 14.691: [Tenured: 24229K->50182K(699072K),
> 0.2362689 secs] 187184K->50182K(1013632K), [Metaspace:
> 56762K->56762K(1099776K)], 0.2363739 secs] [Times: user=0.11 sys=0.02,
> real=0.24 secs]
> 2021-09-13T09:31:50.257+0200: 222.204: [GC (Allocation Failure)
> 2021-09-13T09:31:50.257+0200: 222.204: [DefNew: 279616K->20089K(314560K),
> 0.1042210 secs] 329798K->70271K(1013632K), 0.1043736 secs] [Times:
> user=0.04 sys=0.03, real=0.10 secs]
> 2021-09-13T09:40:32.456+0200: 744.403: [GC (Allocation Failure)
> 2021-09-13T09:40:32.456+0200: 744.403: [DefNew: 299705K->435K(314560K),
> 0.0255928 secs] 349887K->56275K(1013632K), 0.0257074 secs] [Times:
> user=0.02 sys=0.01, real=0.03 secs]
> 2021-09-13T09:50:41.809+0200: 1353.756: [GC (Allocation Failure)
> 2021-09-13T09:50:41.809+0200: 1353.756: [DefNew: 280051K->551K(314560K),
> 0.0089400 secs] 335891K->56391K(1013632K), 0.0090356 secs] [Times:
> user=0.01 sys=0.00, real=0.01 secs]
> 2021-09-13T10:01:33.109+0200: 2005.056: [GC (Allocation Failure)
> 2021-09-13T10:01:33.109+0200: 2005.056: [DefNew: 280167K->707K(314560K),
> 0.0099544 secs] 336007K->56547K(1013632K), 0.0100724 secs] [Times:
> user=0.00 sys=0.00, real=0.01 secs]
> 2021-09-13T10:11:53.384+0200: 2625.331: [GC (Allocation Failure)
> 2021-09-13T10:11:53.384+0200: 2625.331: [DefNew: 280323K->857K(314560K),
> 0.0095649 secs] 336163K->56697K(1013632K), 0.0096763 secs] [Times:
> user=0.01 sys=0.00, real=0.01 secs]
> 2021-09-13T10:21:31.798+0200: 3203.745: [GC (Allocation Failure)
> 2021-09-13T10:21:31.798+0200: 3203.745: [DefNew: 280473K->945K(314560K),
> 0.0085233 secs] 336313K->56785K(1013632K), 0.0086403 secs] [Times:
> user=0.01 sys=0.00, real=0.01 secs]
> 2021-09-13T10:31:44.561+0200: 3816.508: [GC (Allocation Failure)
> 2021-09-13T10:31:44.561+0200: 3816.508: [DefNew: 280561K->1053K(314560K),
> 0.0103383 secs] 336401K->56893K(1013632K), 0.0104447 secs] [Times:
> user=0.01 sys=0.00, real=0.01 secs]
> 2021-09-13T10:41:51.289+0200: 4423.236: [GC (Allocation Failure)
> 2021-09-13T10:41:51.289+0200: 4423.236: [DefNew: 280669K->1009K(314560K),
> 0.0100803 secs] 336509K->56849K(1013632K), 0.0101961 secs] [Times:
> user=0.01 sys=0.00, real=0.01 secs]
> 2021-09-13T10:52:13.378+0200: 5045.325: [GC (Allocation Failure)
> 2021-09-13T10:52:13.378+0200: 5045.325: [DefNew: 280625K->1266K(314560K),
> 0.0091235 secs] 336465K->57106K(1013632K), 0.0092590 secs] [Times:
> user=0.00 sys=0.01, real=0.01 secs]
> 2021-09-13T11:02:20.253+0200: 5652.200: [GC (Allocation Failure)
> 2021-09-13T11:02:20.253+0200: 5652.200: [DefNew: 280882K->1323K(314560K),
> 0.0097592 secs] 336722K->57163K(1013632K), 0.0098574 secs] [Times:
> user=0.01 sys=0.00, real=0.01 secs]
>
> ************************************************************
>
> And here's my flink-conf.yaml file:
> taskmanager.numberOfTaskSlots: 2
> blob.server.port: 6124
> jobmanager.rpc.port: 6123
> taskmanager.rpc.port: 6122
> queryable-state.proxy.ports: 6125
> jobmanager.memory.process.size: 1600m
> taskmanager.memory.process.size: 1728m
> parallelism.default: 2
>
> #HA K8S
> kubernetes.cluster-id: myJob
> high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
> high-availability.storageDir: s3://flink-data-integ/data/flink-ha/myJob
> kubernetes.namespace: flink-pushavoo-flink-rec
> high-availability.kubernetes.leader-election.lease-duration: 60 s
> high-availability.kubernetes.leader-election.renew-deadline: 60 s
>
> restart-strategy: fixed-delay
> restart-strategy.fixed-delay.attempts: 10
>
> #Checkpoints
> state.backend: filesystem
> state.checkpoints.dir: s3://flink-data/data/checkpoints/myJob
> state.checkpoints.num-retained: 10
>
> #flink-prometheus
> metrics.reporters: prometheus
> metrics.reporter.prometheus.class: org.apache.flink.metrics.prometheus.PrometheusReporter
> metrics.reporter.prometheus.port: 9249
>
> #logback
> classloader.parent-first-patterns.additional: net.logstash.logback
>
> #S3
> s3.endpoint: *******
> s3.access-key: ********
> s3.secret-key: ******
> env.java.opts.jobmanager: -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/opt/flink/log/jobmanager-gc.log