[ https://issues.apache.org/jira/browse/FLINK-18091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126769#comment-17126769 ]
Congxian Qiu(klion26) edited comment on FLINK-18091 at 6/5/20, 1:40 PM:
------------------------------------------------------------------------

Tested on a real cluster; the results are as expected.
* A relocated savepoint can be restored successfully
* A relocated checkpoint fails to restore with a FileNotFoundException
* commit: 8ca67c962a5575aba3a43b8cfd4a79ffc8069bd4

The full log is attached below:

{{_username/ip/port and other sensitive information has been masked._}}

# For Savepoint

{code:java}
[~/flink-1.11-SNAPSHOT]$ ./bin/flink savepoint 9bcc2546a841b36a39c46fbe13a2b631 hdfs:///user/xxxxxx/congxianqiu/savepoint -yid application_1591259429117_0007
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/work/congxianqiu/flink-1.11-SNAPSHOT/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/xxxxxx/hadoop/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2020-06-05 20:27:43,039 WARN org.apache.flink.yarn.configuration.YarnLogConfigUtil [] - The configuration directory ('/data/work/congxianqiu/flink-1.11-SNAPSHOT/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2020-06-05 20:27:43,422 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2020-06-05 20:27:43,513 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Found Web Interface 10-215-128-84:35572 of application 'application_1591259429117_0007'.
Triggering savepoint for job 9bcc2546a841b36a39c46fbe13a2b631.
Waiting for response...
Savepoint completed.
Path: hdfs://ip:port/user/xxxxxx/congxianqiu/savepoint/savepoint-9bcc25-4ed827357f33
You can resume your program from this savepoint with the run command.

[~/flink-1.11-SNAPSHOT]$ hadoop fs -ls congxianqiu/savepoint
Found 1 items
drwxr-xr-x - xxxxxx supergroup 0 2020-06-05 20:27 congxianqiu/savepoint/savepoint-9bcc25-4ed827357f33

[~/congxianqiu/flink-1.11-SNAPSHOT]$ hadoop fs -ls congxianqiu/savepoint/savepoint-9bcc25-4ed827357f33
Found 2 items
-rw-r--r-- 3 xxxxxx supergroup 74170 2020-06-05 20:27 congxianqiu/savepoint/savepoint-9bcc25-4ed827357f33/6508ac9e-0d2a-4583-96ad-1d67fb5b1c8a
-rw-r--r-- 3 xxxxxx supergroup 1205 2020-06-05 20:27 congxianqiu/savepoint/savepoint-9bcc25-4ed827357f33/_metadata

[~/flink-1.11-SNAPSHOT]$ hadoop fs -mv congxianqiu/savepoint/savepoint-9bcc25-4ed827357f33 congxianqiu/savepoint/newsavepointpath

[ ~/flink-1.11-SNAPSHOT]$ hadoop fs -ls congxianqiu/savepoint
Found 1 items
drwxr-xr-x - xxxxxx supergroup 0 2020-06-05 20:27 congxianqiu/savepoint/newsavepointpath

[~/flink-1.11-SNAPSHOT]$ hadoop fs -ls congxianqiu/savepoint/newsavepointpath
Found 2 items
-rw-r--r-- 3 xxxxxx supergroup 74170 2020-06-05 20:27 congxianqiu/savepoint/newsavepointpath/6508ac9e-0d2a-4583-96ad-1d67fb5b1c8a
-rw-r--r-- 3 xxxxxx supergroup 1205 2020-06-05 20:27 congxianqiu/savepoint/newsavepointpath/_metadata

[~/flink-1.11-SNAPSHOT]$ ./bin/flink run -s hdfs:///user/xxxxxx/congxianqiu/newsavepointpath/_metadata -m yarn-cluster -c com.klion26.data.FlinkDemo /data/work/congxianqiu/flink-1.11-SNAPSHOT/ft_local/Flink-Demo-1.0-SNAPSHOT.jar
SLF4J: Class path contains multiple SLF4J bindings.
>>>>>> jobmanager.log
2020-06-05 21:11:10,053 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Starting job b2fbfa6527391f035e8eebd791c2f64e from savepoint hdfs:///user/xxxxxx/congxianqiu/savepoint/newsavepointpath/_metadata ()
2020-06-05 21:11:10,198 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Reset the checkpoint ID of job b2fbfa6527391f035e8eebd791c2f64e to 3.
2020-06-05 21:11:10,198 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Restoring job b2fbfa6527391f035e8eebd791c2f64e from latest valid checkpoint: Checkpoint 2 @ 0 for b2fbfa6527391f035e8eebd791c2f64e.
2020-06-05 21:11:10,206 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - No master state to restore
.......
2020-06-05 21:11:16,117 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Checkpoint triggering task Source: Custom Source (1/1) of job b2fbfa6527391f035e8eebd791c2f64e is not in state RUNNING but SCHEDULED instead. Aborting checkpoint.
2020-06-05 21:11:19,456 INFO org.apache.flink.yarn.YarnResourceManager [] - Registering TaskManager with ResourceID container_e18_1591259429117_0019_01_000002 (akka.tcp://flink@10-215-128-83:56603/user/rpc/taskmanager_0) at ResourceManager
2020-06-05 21:11:19,566 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Custom Source (1/1) (bc206dbf0e964487e5ba8c4355cb691e) switched from SCHEDULED to DEPLOYING.
2020-06-05 21:11:19,566 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Deploying Source: Custom Source (1/1) (attempt #0) to container_e18_1591259429117_0019_01_000002 @ 10-215-128-83 (dataPort=45167)
2020-06-05 21:11:19,572 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Map -> Sink: Unnamed (1/1) (7ae861cf453455d722d5d4ece0c10d1a) switched from SCHEDULED to DEPLOYING.
2020-06-05 21:11:19,573 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Deploying Map -> Sink: Unnamed (1/1) (attempt #0) to container_e18_1591259429117_0019_01_000002 @ 10-215-128-83 (dataPort=45167)
2020-06-05 21:11:20,467 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Map -> Sink: Unnamed (1/1) (7ae861cf453455d722d5d4ece0c10d1a) switched from DEPLOYING to RUNNING.
2020-06-05 21:11:20,468 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Custom Source (1/1) (bc206dbf0e964487e5ba8c4355cb691e) switched from DEPLOYING to RUNNING.
2020-06-05 21:12:16,199 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering checkpoint 3 @ 1591362736116 for job b2fbfa6527391f035e8eebd791c2f64e.
2020-06-05 21:12:16,854 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed checkpoint 3 for job b2fbfa6527391f035e8eebd791c2f64e (106237 bytes in 736 ms).
2020-06-05 21:13:16,172 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering checkpoint 4 @ 1591362796116 for job b2fbfa6527391f035e8eebd791c2f64e.
2020-06-05 21:13:16,680 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed checkpoint 4 for job b2fbfa6527391f035e8eebd791c2f64e (32823 bytes in 542 ms).
{code}

# For Checkpoint

{code:java}
[~/flink-1.11-SNAPSHOT]$ hadoop fs -ls congxianqiu/checkpoint/b2fbfa6527391f035e8eebd791c2f64e/
Found 3 items
drwxr-xr-x - xxxxxx supergroup 0 2020-06-05 21:15 congxianqiu/checkpoint/b2fbfa6527391f035e8eebd791c2f64e/chk-6
drwxr-xr-x - xxxxxx supergroup 0 2020-06-05 21:15 congxianqiu/checkpoint/b2fbfa6527391f035e8eebd791c2f64e/shared
drwxr-xr-x - xxxxxx supergroup 0 2020-06-05 21:11 congxianqiu/checkpoint/b2fbfa6527391f035e8eebd791c2f64e/taskowned

[ ~/flink-1.11-SNAPSHOT]$ hadoop fs -mv congxianqiu/checkpoint/b2fbfa6527391f035e8eebd791c2f64e congxianqiu/checkpoint/movecheckpoint

[~/flink-1.11-SNAPSHOT]$ hadoop fs -ls congxianqiu/checkpoint/movecheckpoint
Found 3 items
drwxr-xr-x - xxxxxx supergroup 0 2020-06-05 21:23 congxianqiu/checkpoint/movecheckpoint/chk-6
drwxr-xr-x - xxxxxx supergroup 0 2020-06-05 21:23 congxianqiu/checkpoint/movecheckpoint/shared
drwxr-xr-x - xxxxxx supergroup 0 2020-06-05 21:23 congxianqiu/checkpoint/movecheckpoint/taskowned

>>>>>> jobmanager.log
Caused by: org.apache.hadoop.ipc.RemoteException: File does not exist: /user/xxxxxx/congxianqiu/checkpoint/b2fbfa6527391f035e8eebd791c2f64e/shared/56704ae2-d04c-4073-aa6b-843a40e15bbe
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1836)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1808)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1723)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:366)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2213)
	at org.apache.hadoop.ipc.Client.call(Client.java:1476) ~[hadoop-common-2.7.4.jar:?]
	at org.apache.hadoop.ipc.Client.call(Client.java:1413) ~[hadoop-common-2.7.4.jar:?]
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) ~[hadoop-common-2.7.4.jar:?]
	at com.sun.proxy.$Proxy35.getBlockLocations(Unknown Source) ~[?:?]
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:255) ~[hadoop-hdfs-2.7.4.jar:?]
{code}

> Test Relocatable Savepoints
> ---------------------------
>
>                 Key: FLINK-18091
>                 URL: https://issues.apache.org/jira/browse/FLINK-18091
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Tests
>            Reporter: Stephan Ewen
>            Assignee: Congxian Qiu(klion26)
>            Priority: Major
>              Labels: release-testing
>             Fix For: 1.11.0
>
>
> The test should do the following:
> * take a savepoint; make sure the job has enough state that the savepoint contains more than just the "_metadata" file
> * copy it to another directory
> * start the job from that savepoint by addressing the metadata file and by addressing the savepoint directory
> We should also test that an incremental checkpoint that gets moved fails with a reasonable exception.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
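
The test steps listed in the issue description can be sketched as a small dry-run script. This is only an illustration under stated assumptions: the `run` and `relocation_test` helpers and the `DRY_RUN` variable are hypothetical, the savepoint directory name is passed in by hand (in a real run it has to be read from the output of `flink savepoint`), and the `flink` and `hadoop fs` invocations simply mirror the ones used in the comment above.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the savepoint relocation test. By default every command is
# only printed; set DRY_RUN=0 to execute them. DRY_RUN, run and relocation_test
# are hypothetical helpers, not part of Flink or Hadoop.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"   # print the command line instead of running it
  else
    "$@"
  fi
}

# Arguments: job id, YARN application id, savepoint root directory, name of the
# savepoint directory created by Flink (assumed to be known here), new
# directory name, main class, job jar.
relocation_test() {
  local job_id="$1" app_id="$2" root="$3" old_name="$4" new_name="$5"
  local main_class="$6" jar="$7"

  # 1. Take a savepoint of the running job.
  run ./bin/flink savepoint "$job_id" "$root" -yid "$app_id"
  # 2. Move the savepoint directory to a new location.
  run hadoop fs -mv "$root/$old_name" "$root/$new_name"
  # 3a. Restore by addressing the _metadata file.
  run ./bin/flink run -s "$root/$new_name/_metadata" -m yarn-cluster -c "$main_class" "$jar"
  # 3b. Restore by addressing the savepoint directory.
  run ./bin/flink run -s "$root/$new_name" -m yarn-cluster -c "$main_class" "$jar"
}

# Example invocation with the (masked) values from the log above.
relocation_test 9bcc2546a841b36a39c46fbe13a2b631 application_1591259429117_0007 \
  hdfs:///user/xxxxxx/congxianqiu/savepoint savepoint-9bcc25-4ed827357f33 \
  newsavepointpath com.klion26.data.FlinkDemo Flink-Demo-1.0-SNAPSHOT.jar
```

In dry-run mode the script prints the four commands in order, which makes it easy to review the paths before running anything against a real cluster.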