Thanks for your reply, Robert. Please see the attached log from the job manager. The last line is the only difference I see compared with a pod that starts up successfully.
On Tue, Nov 3, 2020 at 10:41 AM Robert Metzger <rmetz...@apache.org> wrote:

> Hi Claude,
>
> I agree that you should be able to restart individual pods with a changed
> memory configuration. Can you share the full Jobmanager log of the failed
> restart attempt?
>
> I don't think that the log statement you've posted explains a start
> failure.
>
> Regards,
> Robert
>
> On Tue, Nov 3, 2020 at 2:33 AM Claude M <claudemur...@gmail.com> wrote:
>
>> Hello,
>>
>> I have Flink 1.10.2 installed in a Kubernetes cluster.
>> Anytime I make a change to the flink.conf, the Flink jobmanager pod fails
>> to restart.
>> For example, I modified the following memory setting in the flink.conf:
>> jobmanager.memory.flink.size.
>> After I deploy the change, the pod fails to restart and the following is
>> seen in the log:
>>
>> WARN
>> org.apache.flink.runtime.webmonitor.retriever.impl.RpcGatewayRetriever -
>> Error while retrieving the leader gateway. Retrying to connect to
>> akka.tcp://flink@flink-jobmanager:50010/user/dispatcher.
>>
>> The pod can be restored by doing one of the following but these are not
>> acceptable solutions:
>>
>> - Revert the changes made to the flink.conf to the previous settings
>> - Remove the Flink Kubernetes deployment before doing a deployment
>> - Delete the flink cluster folder in Zookeeper
>>
>> I don't understand why making any changes in the flink.conf causes this
>> problem.
>> Any help is appreciated.
>>
>>
>> Thank You
>>
>
Processing template /mnt/flink-conf/..2020_11_03_17_59_21.864132437/log4j-console.properties.tmpl to file /opt/flink/conf/log4j-console.properties
Processing template /mnt/flink-conf/..2020_11_03_17_59_21.864132437/flink-conf.yaml.tmpl to file /opt/flink/conf/flink-conf.yaml
Processing template /mnt/flink-conf/log4j-console.properties.tmpl to file /opt/flink/conf/log4j-console.properties
Processing template /mnt/flink-conf/flink-conf.yaml.tmpl to file /opt/flink/conf/flink-conf.yaml
Starting Job Manager
FLINK-11843 zookeeper bug workaround start ---
Processing cluster betacluster looking for orphans
jobregistry will be listed in r.txt, and jobgraph j.txt
FLINK-11843 zookeeper bug workaround end ---
config file:
blob.server.port: 6124
jobmanager.rpc.address: flink-betacluster-jobmanager
jobmanager.rpc.port: 6123
query.server.port: 6125
high-availability: zookeeper
high-availability.zookeeper.quorum: zookeeper-0.zk-quorum.default.svc.cluster.local:2181,zookeeper-1.zk-quorum.default.svc.cluster.local:2181,zookeeper-2.zk-quorum.default.svc.cluster.local:2181
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: /betacluster
high-availability.jobmanager.port: 50010
high-availability.zookeeper.client.connection-timeout:
high-availability.zookeeper.client.session-timeout:
akka.ask.timeout: 180s
metrics.scope.jm: flink.jobmanager
metrics.scope.jm.job: flink.jobmanager.job
metrics.scope.tm: flink.taskmanager
metrics.scope.tm.job: flink.taskmanager.job
metrics.scope.task: flink.task
metrics.scope.operator: flink.operator
metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
metrics.reporter.dghttp.proxyHost: proxy.host
metrics.reporter.dghttp.proxyPort: 3128
jobmanager.memory.flink.size: 1024m
taskmanager.memory.flink.size:
taskmanager.memory.jvm-metaspace.size: 128m
cluster.evenly-spread-out-slots: true
env.java.opts: -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/flink/log
Starting standalonesession as a console application on host flink-betacluster-jobmanager-759cccbdf9-g2mhs.
log4j:ERROR Could not find value for key log4j.appender.file
log4j:ERROR Could not instantiate appender named "file".
2020-11-03 17:59:23,771 WARN org.apache.flink.configuration.GlobalConfiguration - Error while trying to split key and value in configuration file /opt/flink/conf/flink-conf.yaml:20: "high-availability.zookeeper.client.connection-timeout: "
2020-11-03 17:59:23,772 WARN org.apache.flink.configuration.GlobalConfiguration - Error while trying to split key and value in configuration file /opt/flink/conf/flink-conf.yaml:21: "high-availability.zookeeper.client.session-timeout: "
2020-11-03 17:59:23,773 WARN org.apache.flink.configuration.GlobalConfiguration - Error while trying to split key and value in configuration file /opt/flink/conf/flink-conf.yaml:53: "taskmanager.memory.flink.size: "
2020-11-03 17:59:24,223 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-11-03 17:59:24,926 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2020-11-03 17:59:25,332 WARN org.apache.hadoop.metrics2.impl.MetricsConfig - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
2020-11-03 17:59:25,398 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl - Scheduled Metric snapshot period at 10 second(s).
2020-11-03 17:59:25,399 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl - s3a-file-system metrics system started
2020-11-03 17:59:25,451 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-11-03 17:59:26,168 INFO org.apache.hadoop.conf.Configuration.deprecation - fs.s3a.server-side-encryption-key is deprecated. Instead, use fs.s3a.server-side-encryption.key
2020-11-03 17:59:26,401 WARN org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/tmp/jaas-6274224812074063654.conf'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.
2020-11-03 17:59:26,408 ERROR org.apache.flink.shaded.curator.org.apache.curator.ConnectionState - Authentication failed
2020-11-03 17:59:26,924 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2020-11-03 17:59:27,100 WARN org.apache.flink.runtime.webmonitor.WebMonitorUtils - Log file environment variable 'log.file' is not set.
2020-11-03 17:59:27,100 WARN org.apache.flink.runtime.webmonitor.WebMonitorUtils - JobManager log files are unavailable in the web dashboard. Log file location not found in environment variable 'log.file' or configuration key 'Key: 'web.log.path' , default: null (fallback keys: [{key=jobmanager.web.log.path, isDeprecated=true}])'.
2020-11-03 17:59:56,677 WARN org.apache.flink.runtime.webmonitor.retriever.impl.RpcGatewayRetriever - Error while retrieving the leader gateway. Retrying to connect to akka.tcp://flink@flink-betacluster-jobmanager:50010/user/dispatcher.
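
A side note on the three GlobalConfiguration warnings in the log above: each one names a key that the template rendered with an empty value (flink-conf.yaml lines 20, 21 and 53). Whether these have anything to do with the restart failure is not established here. As a rough aid, the following is a minimal Python sketch for flagging such keys in a rendered config before deploying. It assumes a flat `key: value` layout, and `find_empty_values` is a hypothetical helper, not part of Flink; it does not reproduce Flink's own parsing rules.

```python
def find_empty_values(conf_text):
    """Return (line_number, key) pairs for 'key:' lines that carry no value.

    Simplified illustration only: assumes one flat 'key: value' entry per
    line and treats everything after '#' as a comment.
    """
    empty = []
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and outer whitespace
        if not line or ":" not in line:
            continue  # blank line, comment-only line, or not a key/value entry
        key, _, value = line.partition(":")  # split on the first colon only
        if key.strip() and not value.strip():
            empty.append((lineno, key.strip()))
    return empty


if __name__ == "__main__":
    # Sample entries copied from the config dump in the log above.
    sample = (
        "high-availability.zookeeper.client.connection-timeout: \n"
        "akka.ask.timeout: 180s\n"
        "taskmanager.memory.flink.size: \n"
    )
    for lineno, key in find_empty_values(sample):
        print(f"line {lineno}: {key} has no value")
```

Running it against the rendered /opt/flink/conf/flink-conf.yaml (read into a string) would list the keys to either fill in or drop from the template.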