Thanks for your reply, Robert. Please see the attached log from the job
manager. The last line is the only difference I see compared with a pod
that starts up successfully.

On Tue, Nov 3, 2020 at 10:41 AM Robert Metzger <rmetz...@apache.org> wrote:

> Hi Claude,
>
> I agree that you should be able to restart individual pods with a changed
> memory configuration. Can you share the full Jobmanager log of the failed
> restart attempt?
>
> I don't think that the log statement you've posted explains a start
> failure.
>
> Regards,
> Robert
>
> On Tue, Nov 3, 2020 at 2:33 AM Claude M <claudemur...@gmail.com> wrote:
>
>>
>> Hello,
>>
>> I have Flink 1.10.2 installed in a Kubernetes cluster.
>> Anytime I make a change to the flink.conf, the Flink jobmanager pod fails
>> to restart.
>> For example, I modified the following memory setting in the flink.conf:
>> jobmanager.memory.flink.size.
>> After I deploy the change, the pod fails to restart and the following is
>> seen in the log:
>>
>> WARN
>>  org.apache.flink.runtime.webmonitor.retriever.impl.RpcGatewayRetriever  -
>> Error while retrieving the leader gateway. Retrying to connect to
>> akka.tcp://flink@flink-jobmanager:50010/user/dispatcher.
>>
>> The pod can be restored by doing one of the following but these are not
>> acceptable solutions:
>>
>>    - Revert the changes made to the flink.conf to the previous settings
>>    - Remove the Flink Kubernetes deployment before doing a deployment
>>    - Delete the flink cluster folder in Zookeeper
>>
>> I don't understand why making any changes in the flink.conf causes this
>> problem.
>> Any help is appreciated.
>>
>>
>> Thank You
>>
>
Processing template /mnt/flink-conf/..2020_11_03_17_59_21.864132437/log4j-console.properties.tmpl to file /opt/flink/conf/log4j-console.properties
Processing template /mnt/flink-conf/..2020_11_03_17_59_21.864132437/flink-conf.yaml.tmpl to file /opt/flink/conf/flink-conf.yaml
Processing template /mnt/flink-conf/log4j-console.properties.tmpl to file /opt/flink/conf/log4j-console.properties
Processing template /mnt/flink-conf/flink-conf.yaml.tmpl to file /opt/flink/conf/flink-conf.yaml
Starting Job Manager
FLINK-11843 zookeeper bug workaround start ---
Processing cluster betacluster looking for orphans jobregistry will be listed in r.txt, and jobgraph j.txt
FLINK-11843 zookeeper bug workaround end ---
config file: 
blob.server.port: 6124
jobmanager.rpc.address: flink-betacluster-jobmanager
jobmanager.rpc.port: 6123
query.server.port: 6125
high-availability: zookeeper
high-availability.zookeeper.quorum: zookeeper-0.zk-quorum.default.svc.cluster.local:2181,zookeeper-1.zk-quorum.default.svc.cluster.local:2181,zookeeper-2.zk-quorum.default.svc.cluster.local:2181
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: /betacluster
high-availability.jobmanager.port: 50010
high-availability.zookeeper.client.connection-timeout: 
high-availability.zookeeper.client.session-timeout: 
akka.ask.timeout: 180s
metrics.scope.jm: flink.jobmanager
metrics.scope.jm.job: flink.jobmanager.job
metrics.scope.tm: flink.taskmanager
metrics.scope.tm.job: flink.taskmanager.job
metrics.scope.task: flink.task
metrics.scope.operator: flink.operator
metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
metrics.reporter.dghttp.proxyHost: proxy.host
metrics.reporter.dghttp.proxyPort: 3128
jobmanager.memory.flink.size: 1024m
taskmanager.memory.flink.size: 
taskmanager.memory.jvm-metaspace.size: 128m
cluster.evenly-spread-out-slots: true
env.java.opts: -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/flink/log
Starting standalonesession as a console application on host flink-betacluster-jobmanager-759cccbdf9-g2mhs.
log4j:ERROR Could not find value for key log4j.appender.file
log4j:ERROR Could not instantiate appender named "file".
2020-11-03 17:59:23,771 WARN  org.apache.flink.configuration.GlobalConfiguration - Error while trying to split key and value in configuration file /opt/flink/conf/flink-conf.yaml:20: "high-availability.zookeeper.client.connection-timeout: "
2020-11-03 17:59:23,772 WARN  org.apache.flink.configuration.GlobalConfiguration - Error while trying to split key and value in configuration file /opt/flink/conf/flink-conf.yaml:21: "high-availability.zookeeper.client.session-timeout: "
2020-11-03 17:59:23,773 WARN  org.apache.flink.configuration.GlobalConfiguration - Error while trying to split key and value in configuration file /opt/flink/conf/flink-conf.yaml:53: "taskmanager.memory.flink.size: "
2020-11-03 17:59:24,223 WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-11-03 17:59:24,926 INFO  akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2020-11-03 17:59:25,332 WARN  org.apache.hadoop.metrics2.impl.MetricsConfig - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
2020-11-03 17:59:25,398 INFO  org.apache.hadoop.metrics2.impl.MetricsSystemImpl - Scheduled Metric snapshot period at 10 second(s).
2020-11-03 17:59:25,399 INFO  org.apache.hadoop.metrics2.impl.MetricsSystemImpl - s3a-file-system metrics system started
2020-11-03 17:59:25,451 WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-11-03 17:59:26,168 INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.s3a.server-side-encryption-key is deprecated. Instead, use fs.s3a.server-side-encryption.key
2020-11-03 17:59:26,401 WARN  org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/tmp/jaas-6274224812074063654.conf'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.
2020-11-03 17:59:26,408 ERROR org.apache.flink.shaded.curator.org.apache.curator.ConnectionState - Authentication failed
2020-11-03 17:59:26,924 INFO  akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2020-11-03 17:59:27,100 WARN  org.apache.flink.runtime.webmonitor.WebMonitorUtils - Log file environment variable 'log.file' is not set.
2020-11-03 17:59:27,100 WARN  org.apache.flink.runtime.webmonitor.WebMonitorUtils - JobManager log files are unavailable in the web dashboard. Log file location not found in environment variable 'log.file' or configuration key 'Key: 'web.log.path' , default: null (fallback keys: [{key=jobmanager.web.log.path, isDeprecated=true}])'.
2020-11-03 17:59:56,677 WARN  org.apache.flink.runtime.webmonitor.retriever.impl.RpcGatewayRetriever - Error while retrieving the leader gateway. Retrying to connect to akka.tcp://flink@flink-betacluster-jobmanager:50010/user/dispatcher.
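
The three GlobalConfiguration warnings in the log above point at keys that the template rendered with an empty value (the two ZooKeeper client timeouts and taskmanager.memory.flink.size). Nothing in the log shows that these warnings are related to the failed restart, but they are easy to catch before deploying. The following is a minimal sketch, assuming the rendered flink-conf.yaml is a flat list of "key: value" lines as in the dump above. The function name find_empty_keys is hypothetical and not part of Flink.

```python
def find_empty_keys(conf_text):
    """Return (line_number, key) pairs for 'key: value' lines with an empty value.

    Assumption: the file is a flat list of 'key: value' lines, with '#'
    starting a comment, as in the configuration dump in the log.
    """
    empty = []
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue  # blank or comment-only line
        key, sep, value = line.partition(":")
        if sep and not value.strip():
            empty.append((lineno, key.strip()))
    return empty


if __name__ == "__main__":
    # Sample lines copied from the configuration dump in the log.
    sample = (
        "high-availability.jobmanager.port: 50010\n"
        "high-availability.zookeeper.client.session-timeout: \n"
        "akka.ask.timeout: 180s\n"
    )
    for lineno, key in find_empty_keys(sample):
        print(f"line {lineno}: {key} has no value")
```

Running a check like this against the rendered file (for example in the container entrypoint, after the templates are processed) would flag unset template variables early.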
