Matthias Pohl created FLINK-31093: ------------------------------------- Summary: NullpointerException when restoring a FlinkSQL job from a savepoint Key: FLINK-31093 URL: https://issues.apache.org/jira/browse/FLINK-31093 Project: Flink Issue Type: Bug Components: Table SQL / Runtime Affects Versions: 1.17.0 Reporter: Matthias Pohl
I tried to restore a FlinkSQL job from a savepoint and ran into a {{NullPointerException}}: {code} 2023-02-15 16:38:24,835 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Initializing job 'collect' (0263d02536654102f2aa903f843cacd1). 2023-02-15 16:38:24,858 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Job 0263d02536654102f2aa903f843cacd1 reached terminal state FAILED. org.apache.flink.runtime.client.JobInitializationException: Could not start the JobMaster. at org.apache.flink.runtime.jobmaster.DefaultJobMasterServiceProcess.lambda$new$0(DefaultJobMasterServiceProcess.java:97) at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1609) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) Caused by: java.util.concurrent.CompletionException: java.lang.NullPointerException at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606) ... 3 more Caused by: java.lang.NullPointerException at org.apache.flink.api.common.ExecutionConfig.getNumberOfExecutionRetries(ExecutionConfig.java:486) at org.apache.flink.api.common.ExecutionConfig.getRestartStrategy(ExecutionConfig.java:459) at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:99) at org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:119) at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:371) at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:348) at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:123) at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:95) at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ... 3 more {code} The SQL job was submitted through the SQL client: {code} $ -- table created in Flink 1.16.1 $ CREATE TABLE MyTable ( > a bigint, > b int not null, > c varchar, > d timestamp(3) > ) with ('connector' = 'datagen', 'rows-per-second' = '1', 'fields.a.kind' = > 'sequence', 'fields.a.start' = '0', 'fields.a.end' = '1000000'); $ -- SELECT statement ran in Flink 1.16.1 session cluster $ SELECT a FROM MyTable WHERE a = 1 or a = 2 or a IS NOT NULL; {code} The job was stopped with a savepoint from the command line: {code} $ ./bin/flink stop --type native --savepointPath ../1.16.1-savepoint 6029e8e5632a9852c630b1b0e4b62477 {code} A new 1.17-SNAPSHOT (commit: {{21158c06}}) session cluster was started and the following SQL code was executed from within the SQL client: {code} $ SET 'execution.savepoint.path' = '/home/mapohl/research/FLINK-31066/1.16.1-savepoint/savepoint-6029e8-ef1e50f0dd2e'; $ SELECT a FROM MyTable WHERE a = 1 or a = 2 or a IS NOT NULL; [ERROR] Could not execute SQL statement. Reason: java.util.concurrent.CompletionException: java.lang.NullPointerException {code} This caused the {{NullPointerException}} with the aforementioned stacktrace. The error is caused by [ExecutionConfig:486|https://github.com/apache/flink/blob/143464d82814e342aa845f3ac976ae2854fc892f/flink-core/src/main/java/org/apache/flink/api/common/ExecutionConfig.java#L486]. The line can only cause a {{NullPointerException}} if the corresponding configuration is not set. This only happens if the {{ExecutionConfig}} is deserialized but the {{configuration}} field is not deserialized which leaves the field to be {{null}} initialized. This field is not set to {{null}} in any other way. -- This message was sent by Atlassian Jira (v8.20.10#820010)