Hi Ufuk, hi Stefan,
thanks a lot for your replies.

Ufuk, we are using the HDFS state backend.

Stefan, I installed 1.1.5 on our machines and built our software against the Flink 1.1.5 dependency, but the problem remains.

Below are the logs for savepoint creation [1] and savepoint disposal [2], as well as the logs from the start of the job [3]. Setting org.apache.flink.client to DEBUG produced hardly any additional log lines, so I set the whole org.apache.flink package to DEBUG in the hope of finding something. I couldn't find anything suspicious, though.

Again, thanks a lot for your help!

Best regards
Konstantin

[1]
2017-03-28 12:21:32,033 INFO org.apache.flink.client.CliFrontend - --------------------------------------------------------------------------------
2017-03-28 12:21:32,034 INFO org.apache.flink.client.CliFrontend - Starting Command Line Client (Version: 1.1.3, Rev:a56d810, Date:10.11.2016 @ 13:25:34 CET)
2017-03-28 12:21:32,035 INFO org.apache.flink.client.CliFrontend - Current user: our_user
2017-03-28 12:21:32,035 INFO org.apache.flink.client.CliFrontend - JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.7/24.51-b03
2017-03-28 12:21:32,035 INFO org.apache.flink.client.CliFrontend - Maximum heap size: 1749 MiBytes
2017-03-28 12:21:32,035 INFO org.apache.flink.client.CliFrontend - JAVA_HOME: /usr/java/default
2017-03-28 12:21:32,037 INFO org.apache.flink.client.CliFrontend - Hadoop version: 2.3.0
2017-03-28 12:21:32,038 INFO org.apache.flink.client.CliFrontend - JVM Options:
2017-03-28 12:21:32,038 INFO org.apache.flink.client.CliFrontend - -Dlog.file=/path/to/our/lib/flink-1.1.3/log/flink-our_user-client-ourserver.log
2017-03-28 12:21:32,038 INFO org.apache.flink.client.CliFrontend - -Dlog4j.configuration=file:/path/to/our/lib/flink-1.1.3/conf/log4j-cli.properties
2017-03-28 12:21:32,038 INFO org.apache.flink.client.CliFrontend - -Dlogback.configurationFile=file:/path/to/our/lib/flink-1.1.3/conf/logback.xml
2017-03-28 12:21:32,038 INFO org.apache.flink.client.CliFrontend - Program Arguments:
2017-03-28 12:21:32,038 INFO org.apache.flink.client.CliFrontend - savepoint
2017-03-28 12:21:32,038 INFO org.apache.flink.client.CliFrontend - 7e865198e220bea8a2203ebdb0827b6f
2017-03-28 12:21:32,039 INFO org.apache.flink.client.CliFrontend - -j
2017-03-28 12:21:32,039 INFO org.apache.flink.client.CliFrontend - /path/to/our/lib/our_program/lib/our_program-6.2.6-SNAPSHOT-all.jar
2017-03-28 12:21:32,039 INFO org.apache.flink.client.CliFrontend - Classpath: /path/to/our/lib/flink-1.1.3/lib/flink-dist_2.10-1.1.3.1.jar:/path/to/our/lib/flink-1.1.3/lib/flink-python_2.10-1.1.3.jar:/path/to/our/lib/flink-1.1.3/lib/flink-reporter-1.0.2-20161206.140111-118.jar:/path/to/our/lib/flink-1.1.3/lib/flink-table_2.10-1.1.3.jar:/path/to/our/lib/flink-1.1.3/lib/log4j-1.2.17.jar:/path/to/our/lib/flink-1.1.3/lib/ojdbc6-11.2.0.3.jar:/path/to/our/lib/flink-1.1.3/lib/slf4j-log4j12-1.7.7.jar::/etc/hadoop/conf:
2017-03-28 12:21:32,039 INFO org.apache.flink.client.CliFrontend - --------------------------------------------------------------------------------
2017-03-28 12:21:32,039 INFO org.apache.flink.client.CliFrontend - Using configuration directory /path/to/our/lib/flink-1.1.3/conf
2017-03-28 12:21:32,039 INFO org.apache.flink.client.CliFrontend - Trying to load configuration file
2017-03-28 12:21:32,050 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: env.java.opts, -Djavax.net.ssl.trustStore=/path/to/our/cacerts -XX:HeapDumpPath=/path/to/our/hadoop/yarn/log -XX:+HeapDumpOnOutOfMemoryError -XX:MaxPermSize=192m
2017-03-28 12:21:32,050 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2017-03-28 12:21:32,050 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-03-28 12:21:32,051 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 256
2017-03-28 12:21:32,051 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 512
2017-03-28 12:21:32,051 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 4
2017-03-28 12:21:32,051 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-03-28 12:21:32,051 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-03-28 12:21:32,051 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-03-28 12:21:32,051 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.submit.enable, false
2017-03-28 12:21:32,052 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend, filesystem
2017-03-28 12:21:32,052 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend.fs.checkpointdir, hdfs://ourserver:8020/our_user/flink/state
2017-03-28 12:21:32,052 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.network.numberOfBuffers, 4096
2017-03-28 12:21:32,052 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: fs.hdfs.hadoopconf, /etc/hadoop/conf/
2017-03-28 12:21:32,052 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.mode, zookeeper
2017-03-28 12:21:32,052 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.quorum, ourserver:2181,ourserver2:2181,ourserver3:2181
2017-03-28 12:21:32,053 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.storageDir, hdfs:///our_user/flink/recovery
2017-03-28 12:21:32,053 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.path.root, flink
2017-03-28 12:21:32,053 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.path.namespace, yarn_session
2017-03-28 12:21:32,053 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: reocvery.zookeeper.client.connection-timeout, 30000
2017-03-28 12:21:32,053 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.client.session-timeout, 120000
2017-03-28 12:21:32,053 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.client.retry-wait, 5000
2017-03-28 12:21:32,053 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.client.max-retry-attempts, 5
2017-03-28 12:21:32,054 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: yarn.application-attempts, 10
2017-03-28 12:21:32,054 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: yarn.maximum-failed-containers, 80
2017-03-28 12:21:32,054 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: akka.watch.heartbeat.interval, 50s
2017-03-28 12:21:32,055 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: akka.log.lifecycle.events, true
2017-03-28 12:21:32,055 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: akka.ask.timeout, 20s
2017-03-28 12:21:32,055 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: savepoints.state.backend, filesystem
2017-03-28 12:21:32,055 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: savepoints.state.backend.fs.dir, hdfs:///our_user/flink/savepoints
2017-03-28 12:21:32,281 INFO org.apache.flink.client.CliFrontend - Running 'savepoint' command.
2017-03-28 12:21:32,287 INFO org.apache.flink.client.CliFrontend - Retrieving JobManager.
2017-03-28 12:21:32,288 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Found YARN properties file /tmp/.yarn-properties-our_user
2017-03-28 12:21:32,372 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Using Yarn application id from YARN properties application_1488884688139_2648
2017-03-28 12:21:32,372 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - YARN properties set default parallelism to 12
2017-03-28 12:21:32,372 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Found YARN properties file /tmp/.yarn-properties-our_user
2017-03-28 12:21:32,373 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Using Yarn application id from YARN properties application_1488884688139_2648
2017-03-28 12:21:32,373 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - YARN properties set default parallelism to 12
2017-03-28 12:21:32,440 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: env.java.opts, -Djavax.net.ssl.trustStore=/path/to/our/cacerts -XX:HeapDumpPath=/path/to/our/hadoop/yarn/log -XX:+HeapDumpOnOutOfMemoryError -XX:MaxPermSize=192m
2017-03-28 12:21:32,440 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2017-03-28 12:21:32,440 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-03-28 12:21:32,441 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 256
2017-03-28 12:21:32,441 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 512
2017-03-28 12:21:32,441 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 4
2017-03-28 12:21:32,441 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-03-28 12:21:32,441 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-03-28 12:21:32,441 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-03-28 12:21:32,441 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.submit.enable, false
2017-03-28 12:21:32,441 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend, filesystem
2017-03-28 12:21:32,441 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend.fs.checkpointdir, hdfs://ourserver:8020/our_user/flink/state
2017-03-28 12:21:32,442 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.network.numberOfBuffers, 4096
2017-03-28 12:21:32,442 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: fs.hdfs.hadoopconf, /etc/hadoop/conf/
2017-03-28 12:21:32,442 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.mode, zookeeper
2017-03-28 12:21:32,442 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.quorum, ourserver:2181,ourserver:2181,ourserver:2181
2017-03-28 12:21:32,442 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.storageDir, hdfs:///our_user/flink/recovery
2017-03-28 12:21:32,442 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.path.root, flink
2017-03-28 12:21:32,442 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.path.namespace, yarn_session
2017-03-28 12:21:32,442 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: reocvery.zookeeper.client.connection-timeout, 30000
2017-03-28 12:21:32,442 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.client.session-timeout, 120000
2017-03-28 12:21:32,443 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.client.retry-wait, 5000
2017-03-28 12:21:32,443 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.client.max-retry-attempts, 5
2017-03-28 12:21:32,443 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: yarn.application-attempts, 10
2017-03-28 12:21:32,443 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: yarn.maximum-failed-containers, 80
2017-03-28 12:21:32,443 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: akka.watch.heartbeat.interval, 50s
2017-03-28 12:21:32,444 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: akka.log.lifecycle.events, true
2017-03-28 12:21:32,444 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: akka.ask.timeout, 20s
2017-03-28 12:21:32,444 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: savepoints.state.backend, filesystem
2017-03-28 12:21:32,444 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: savepoints.state.backend.fs.dir, hdfs:///our_user/flink/savepoints
2017-03-28 12:21:32,541 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at ourserver/ourserver_ip:8050
2017-03-28 12:21:32,718 INFO org.apache.flink.yarn.YarnClusterDescriptor - Found application JobManager host name 'ourserver' and port '36901' from supplied application id 'application_1488884688139_2648'
2017-03-28 12:21:32,732 INFO org.apache.flink.runtime.util.ZooKeeperUtils - Using 'flink/yarn_session' as zookeeper namespace.
2017-03-28 12:21:32,831 INFO org.apache.flink.shaded.org.apache.curator.framework.imps.CuratorFrameworkImpl - Starting
2017-03-28 12:21:32,832 DEBUG org.apache.flink.shaded.org.apache.curator.CuratorZookeeperClient - Starting
2017-03-28 12:21:32,832 DEBUG org.apache.flink.shaded.org.apache.curator.ConnectionState - Starting
2017-03-28 12:21:32,833 DEBUG org.apache.flink.shaded.org.apache.curator.ConnectionState - reset
2017-03-28 12:21:32,874 INFO org.apache.flink.shaded.org.apache.curator.framework.state.ConnectionStateManager - State change: CONNECTED
2017-03-28 12:21:33,891 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService.
2017-03-28 12:21:33,906 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
2017-03-28 12:21:33,912 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka.tcp://flink@ourserver_ip:36901/user/jobmanager, session ID=a3c337e5-1749-4c42-9949-0203bbae58d5.
2017-03-28 12:21:33,914 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Stopping ZooKeeperLeaderRetrievalService.
2017-03-28 12:21:33,914 DEBUG org.apache.flink.shaded.org.apache.curator.framework.imps.CuratorFrameworkImpl - Closing
2017-03-28 12:21:33,915 DEBUG org.apache.flink.shaded.org.apache.curator.CuratorZookeeperClient - Closing
2017-03-28 12:21:33,915 DEBUG org.apache.flink.shaded.org.apache.curator.ConnectionState - Closing
2017-03-28 12:21:33,920 INFO org.apache.flink.client.CliFrontend - Using address /ourserver_ip:36901 to connect to JobManager.
2017-03-28 12:21:33,926 INFO org.apache.flink.yarn.YarnClusterClient - Starting client actor system.
2017-03-28 12:21:33,928 DEBUG org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to (ourserver/ourserver_ip:36901) from local address ourserver/ourserver_ip with timeout 200
2017-03-28 12:21:33,931 DEBUG org.apache.flink.runtime.net.ConnectionUtils - Using InetAddress.getLocalHost() immediately for the connecting address
2017-03-28 12:21:34,673 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService.
2017-03-28 12:21:34,677 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
2017-03-28 12:21:34,677 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka.tcp://flink@ourserver_ip:36901/user/jobmanager, session ID=a3c337e5-1749-4c42-9949-0203bbae58d5.
2017-03-28 12:21:34,823 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Stopping ZooKeeperLeaderRetrievalService.
2017-03-28 12:21:34,826 INFO org.apache.flink.client.CliFrontend - Triggering savepoint for job 7e865198e220bea8a2203ebdb0827b6f.
2017-03-28 12:21:34,828 INFO org.apache.flink.client.CliFrontend - Waiting for response...
2017-03-28 12:21:34,993 INFO org.apache.flink.client.CliFrontend - Savepoint completed. Path: hdfs:/our_user/flink/savepoints/savepoint-77214a0f9902
2017-03-28 12:21:34,994 INFO org.apache.flink.client.CliFrontend - You can resume your program from this savepoint with the run command.
2017-03-28 12:21:34,994 INFO org.apache.flink.yarn.YarnClusterClient - Shutting down YarnClusterClient from the client shutdown hook
2017-03-28 12:21:34,994 INFO org.apache.flink.yarn.YarnClusterClient - Disconnecting YarnClusterClient from ApplicationMaster

[2]
2017-03-28 12:19:58,063 INFO org.apache.flink.client.CliFrontend - --------------------------------------------------------------------------------
2017-03-28 12:19:58,064 INFO org.apache.flink.client.CliFrontend - Starting Command Line Client (Version: 1.1.3, Rev:a56d810, Date:10.11.2016 @ 13:25:34 CET)
2017-03-28 12:19:58,064 INFO org.apache.flink.client.CliFrontend - Current user: our_user
2017-03-28 12:19:58,064 INFO org.apache.flink.client.CliFrontend - JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.7/24.51-b03
2017-03-28 12:19:58,065 INFO org.apache.flink.client.CliFrontend - Maximum heap size: 1749 MiBytes
2017-03-28 12:19:58,065 INFO org.apache.flink.client.CliFrontend - JAVA_HOME: /usr/java/default
2017-03-28 12:19:58,067 INFO org.apache.flink.client.CliFrontend - Hadoop version: 2.3.0
2017-03-28 12:19:58,067 INFO org.apache.flink.client.CliFrontend - JVM Options:
2017-03-28 12:19:58,068 INFO org.apache.flink.client.CliFrontend - -Dlog.file=/path/to/our/lib/flink-1.1.3/log/flink-our_user-client-ourserver.log
2017-03-28 12:19:58,068 INFO org.apache.flink.client.CliFrontend - -Dlog4j.configuration=file:/path/to/our/lib/flink-1.1.3/conf/log4j-cli.properties
2017-03-28 12:19:58,068 INFO org.apache.flink.client.CliFrontend - -Dlogback.configurationFile=file:/path/to/our/lib/flink-1.1.3/conf/logback.xml
2017-03-28 12:19:58,068 INFO org.apache.flink.client.CliFrontend - Program Arguments:
2017-03-28 12:19:58,069 INFO org.apache.flink.client.CliFrontend - savepoint
2017-03-28 12:19:58,069 INFO org.apache.flink.client.CliFrontend - -d
2017-03-28 12:19:58,069 INFO org.apache.flink.client.CliFrontend - hdfs:/our_user/flink/savepoints/savepoint-d16441420a87
2017-03-28 12:19:58,069 INFO org.apache.flink.client.CliFrontend - -j
2017-03-28 12:19:58,069 INFO org.apache.flink.client.CliFrontend - /path/to/our/lib/our_program/lib/our_program-6.2.6-SNAPSHOT-all.jar
2017-03-28 12:19:58,069 INFO org.apache.flink.client.CliFrontend - Classpath: /path/to/our/lib/flink-1.1.3/lib/flink-dist_2.10-1.1.3.1.jar:/path/to/our/lib/flink-1.1.3/lib/flink-python_2.10-1.1.3.jar:/path/to/our/lib/flink-1.1.3/lib/flink-reporter-1.0.2-20161206.140111-118.jar:/path/to/our/lib/flink-1.1.3/lib/flink-table_2.10-1.1.3.jar:/path/to/our/lib/flink-1.1.3/lib/log4j-1.2.17.jar:/path/to/our/lib/flink-1.1.3/lib/ojdbc6-11.2.0.3.jar:/path/to/our/lib/flink-1.1.3/lib/slf4j-log4j12-1.7.7.jar::/etc/hadoop/conf:
2017-03-28 12:19:58,070 INFO org.apache.flink.client.CliFrontend - --------------------------------------------------------------------------------
2017-03-28 12:19:58,070 INFO org.apache.flink.client.CliFrontend - Using configuration directory /path/to/our/lib/flink-1.1.3/conf
2017-03-28 12:19:58,070 INFO org.apache.flink.client.CliFrontend - Trying to load configuration file
2017-03-28 12:19:58,085 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: env.java.opts, -Djavax.net.ssl.trustStore=/path/to/our/cacerts -XX:HeapDumpPath=/path/to/our/hadoop/yarn/log -XX:+HeapDumpOnOutOfMemoryError -XX:MaxPermSize=192m
2017-03-28 12:19:58,085 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2017-03-28 12:19:58,086 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-03-28 12:19:58,086 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 256
2017-03-28 12:19:58,086 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 512
2017-03-28 12:19:58,086 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 4
2017-03-28 12:19:58,086 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-03-28 12:19:58,086 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-03-28 12:19:58,087 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-03-28 12:19:58,087 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.submit.enable, false
2017-03-28 12:19:58,087 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend, filesystem
2017-03-28 12:19:58,087 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend.fs.checkpointdir, hdfs://ourserver:8020/our_user/flink/state
2017-03-28 12:19:58,087 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.network.numberOfBuffers, 4096
2017-03-28 12:19:58,087 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: fs.hdfs.hadoopconf, /etc/hadoop/conf/
2017-03-28 12:19:58,088 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.mode, zookeeper
2017-03-28 12:19:58,088 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.quorum, ourserver:2181,ourserver2:2181,ourserver3:2181
2017-03-28 12:19:58,088 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.storageDir, hdfs:///our_user/flink/recovery
2017-03-28 12:19:58,088 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.path.root, flink
2017-03-28 12:19:58,088 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.path.namespace, yarn_session
2017-03-28 12:19:58,088 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: reocvery.zookeeper.client.connection-timeout, 30000
2017-03-28 12:19:58,089 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.client.session-timeout, 120000
2017-03-28 12:19:58,089 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.client.retry-wait, 5000
2017-03-28 12:19:58,089 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.client.max-retry-attempts, 5
2017-03-28 12:19:58,089 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: yarn.application-attempts, 10
2017-03-28 12:19:58,089 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: yarn.maximum-failed-containers, 80
2017-03-28 12:19:58,090 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: akka.watch.heartbeat.interval, 50s
2017-03-28 12:19:58,090 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: akka.log.lifecycle.events, true
2017-03-28 12:19:58,090 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: akka.ask.timeout, 20s
2017-03-28 12:19:58,090 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: savepoints.state.backend, filesystem
2017-03-28 12:19:58,090 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: savepoints.state.backend.fs.dir, hdfs:///our_user/flink/savepoints
2017-03-28 12:19:58,367 INFO org.apache.flink.client.CliFrontend - Running 'savepoint' command.
2017-03-28 12:19:58,372 INFO org.apache.flink.client.CliFrontend - Retrieving JobManager.
2017-03-28 12:19:58,373 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Found YARN properties file /tmp/.yarn-properties-our_user
2017-03-28 12:19:58,484 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Using Yarn application id from YARN properties application_1488884688139_2648
2017-03-28 12:19:58,485 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - YARN properties set default parallelism to 12
2017-03-28 12:19:58,485 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Found YARN properties file /tmp/.yarn-properties-our_user
2017-03-28 12:19:58,485 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Using Yarn application id from YARN properties application_1488884688139_2648
2017-03-28 12:19:58,485 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - YARN properties set default parallelism to 12
2017-03-28 12:19:58,604 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: env.java.opts, -Djavax.net.ssl.trustStore=/path/to/our/cacerts -XX:HeapDumpPath=/path/to/our/hadoop/yarn/log -XX:+HeapDumpOnOutOfMemoryError -XX:MaxPermSize=192m
2017-03-28 12:19:58,604 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2017-03-28 12:19:58,604 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-03-28 12:19:58,604 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 256
2017-03-28 12:19:58,605 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 512
2017-03-28 12:19:58,605 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 4
2017-03-28 12:19:58,605 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-03-28 12:19:58,605 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-03-28 12:19:58,605 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-03-28 12:19:58,605 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.submit.enable, false
2017-03-28 12:19:58,605 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend, filesystem
2017-03-28 12:19:58,605 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend.fs.checkpointdir, hdfs://ourserver:8020/our_user/flink/state
2017-03-28 12:19:58,605 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.network.numberOfBuffers, 4096
2017-03-28 12:19:58,606 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: fs.hdfs.hadoopconf, /etc/hadoop/conf/
2017-03-28 12:19:58,606 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.mode, zookeeper
2017-03-28 12:19:58,606 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.quorum, ourserver:2181,ourserver:2181,ourserver:2181
2017-03-28 12:19:58,606 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.storageDir, hdfs:///our_user/flink/recovery
2017-03-28 12:19:58,606 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.path.root, flink
2017-03-28 12:19:58,606 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.path.namespace, yarn_session
2017-03-28 12:19:58,606 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: reocvery.zookeeper.client.connection-timeout, 30000
2017-03-28 12:19:58,606 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.client.session-timeout, 120000
2017-03-28 12:19:58,606 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.client.retry-wait, 5000
2017-03-28 12:19:58,606 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: recovery.zookeeper.client.max-retry-attempts, 5
2017-03-28 12:19:58,607 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: yarn.application-attempts, 10
2017-03-28 12:19:58,607 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: yarn.maximum-failed-containers, 80
2017-03-28 12:19:58,607 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: akka.watch.heartbeat.interval, 50s
2017-03-28 12:19:58,607 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: akka.log.lifecycle.events, true
2017-03-28 12:19:58,607 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: akka.ask.timeout, 20s
2017-03-28 12:19:58,608 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: savepoints.state.backend, filesystem
2017-03-28 12:19:58,608 DEBUG org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: savepoints.state.backend.fs.dir, hdfs:///our_user/flink/savepoints
2017-03-28 12:19:58,685 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at ourserver/ourserver_ip:8050
2017-03-28 12:19:58,969 INFO org.apache.flink.yarn.YarnClusterDescriptor - Found application JobManager host name 'ourserver' and port '36901' from supplied application id 'application_1488884688139_2648'
2017-03-28 12:19:58,989 INFO org.apache.flink.runtime.util.ZooKeeperUtils - Using 'flink/yarn_session' as zookeeper namespace.
2017-03-28 12:19:59,114 INFO org.apache.flink.shaded.org.apache.curator.framework.imps.CuratorFrameworkImpl - Starting
2017-03-28 12:19:59,115 DEBUG org.apache.flink.shaded.org.apache.curator.CuratorZookeeperClient - Starting
2017-03-28 12:19:59,115 DEBUG org.apache.flink.shaded.org.apache.curator.ConnectionState - Starting
2017-03-28 12:19:59,115 DEBUG org.apache.flink.shaded.org.apache.curator.ConnectionState - reset
2017-03-28 12:19:59,172 INFO org.apache.flink.shaded.org.apache.curator.framework.state.ConnectionStateManager - State change: CONNECTED
2017-03-28 12:20:00,212 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService.
2017-03-28 12:20:00,229 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed.
2017-03-28 12:20:00,235 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka.tcp://flink@ourserver_ip:36901/user/jobmanager, session ID=a3c337e5-1749-4c42-9949-0203bbae58d5.
2017-03-28 12:20:00,238 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Stopping ZooKeeperLeaderRetrievalService.
2017-03-28 12:20:00,238 DEBUG org.apache.flink.shaded.org.apache.curator.framework.imps.CuratorFrameworkImpl - Closing
2017-03-28 12:20:00,238 DEBUG org.apache.flink.shaded.org.apache.curator.CuratorZookeeperClient - Closing
2017-03-28 12:20:00,239 DEBUG org.apache.flink.shaded.org.apache.curator.ConnectionState - Closing
2017-03-28 12:20:00,245 INFO org.apache.flink.client.CliFrontend - Using address /ourserver_ip:36901 to connect to JobManager.
2017-03-28 12:20:00,245 INFO org.apache.flink.runtime.util.ZooKeeperUtils - Using 'flink/yarn_session' as zookeeper namespace. 2017-03-28 12:20:00,252 INFO org.apache.flink.yarn.YarnClusterClient - Starting client actor system. 2017-03-28 12:20:00,254 DEBUG org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to (ourserver/ourserver_ip:36901) from local address ourserver/ourserver_ip with timeout 200 2017-03-28 12:20:00,259 DEBUG org.apache.flink.runtime.net.ConnectionUtils - Using InetAddress.getLocalHost() immediately for the connecting address 2017-03-28 12:20:01,209 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService. 2017-03-28 12:20:01,213 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 2017-03-28 12:20:01,213 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka.tcp://flink@ourserver_ip:36901/user/jobmanager, session ID=a3c337e5-1749-4c42-9949-0203bbae58d5. 2017-03-28 12:20:01,442 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Stopping ZooKeeperLeaderRetrievalService. 2017-03-28 12:20:01,446 INFO org.apache.flink.client.CliFrontend - Disposing savepoint 'hdfs:/our_user/flink/savepoints/savepoint-d16441420a87' with JAR /path/to/our/lib/our_program/lib/our_program-6.2.6-SNAPSHOT-all.jar. 2017-03-28 12:20:01,590 INFO org.apache.flink.client.CliFrontend - Waiting for response... 2017-03-28 12:20:01,636 ERROR org.apache.flink.client.CliFrontend - Error while running the command. java.io.IOException: Failed to dispose savepoint hdfs:/our_user/flink/savepoints/savepoint-d16441420a87. 
at org.apache.flink.runtime.checkpoint.savepoint.FsSavepointStore.disposeSavepoint(FsSavepointStore.java:163) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:745) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:727) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:727) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: java.lang.ClassNotFoundException: our.company.eventdata.EventDataRecord at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:270) at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:65) at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1483) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1333) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:500) at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.readObject(PojoSerializer.java:131) at sun.reflect.GeneratedMethodAccessor51.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:500) at org.apache.flink.api.common.state.StateDescriptor.readObject(StateDescriptor.java:268) at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) 
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at java.util.HashMap.readObject(HashMap.java:1184) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:291) at org.apache.flink.util.SerializedValue.deserializeValue(SerializedValue.java:58) at org.apache.flink.runtime.checkpoint.SubtaskState.discard(SubtaskState.java:85) at org.apache.flink.runtime.checkpoint.TaskState.discard(TaskState.java:147) at org.apache.flink.runtime.checkpoint.savepoint.SavepointV0.dispose(SavepointV0.java:66) at 
org.apache.flink.runtime.checkpoint.savepoint.FsSavepointStore.disposeSavepoint(FsSavepointStore.java:151) ... 12 more 2017-03-28 12:20:01,652 INFO org.apache.flink.yarn.YarnClusterClient - Shutting down YarnClusterClient from the client shutdown hook 2017-03-28 12:20:01,653 INFO org.apache.flink.yarn.YarnClusterClient - Disconnecting YarnClusterClient from ApplicationMaster [3] 2017-03-28 10:43:57,361 INFO org.apache.flink.client.CliFrontend - -------------------------------------------------------------------------------- 2017-03-28 10:43:57,362 INFO org.apache.flink.client.CliFrontend - Starting Command Line Client (Version: 1.1.3, Rev:a56d810, Date:10.11.2016 @ 13:25:34 CET) 2017-03-28 10:43:57,362 INFO org.apache.flink.client.CliFrontend - Current user: our_user 2017-03-28 10:43:57,362 INFO org.apache.flink.client.CliFrontend - JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.7/24.51-b03 2017-03-28 10:43:57,362 INFO org.apache.flink.client.CliFrontend - Maximum heap size: 1749 MiBytes 2017-03-28 10:43:57,363 INFO org.apache.flink.client.CliFrontend - JAVA_HOME: /usr/java/default 2017-03-28 10:43:57,365 INFO org.apache.flink.client.CliFrontend - Hadoop version: 2.3.0 2017-03-28 10:43:57,365 INFO org.apache.flink.client.CliFrontend - JVM Options: 2017-03-28 10:43:57,365 INFO org.apache.flink.client.CliFrontend - -Dlog.file=/path/to/our/lib/flink-1.1.3/log/flink-our_user-client-ourserver.log 2017-03-28 10:43:57,365 INFO org.apache.flink.client.CliFrontend - -Dlog4j.configuration=file:/path/to/our/lib/flink-1.1.3/conf/log4j-cli.properties 2017-03-28 10:43:57,365 INFO org.apache.flink.client.CliFrontend - -Dlogback.configurationFile=file:/path/to/our/lib/flink-1.1.3/conf/logback.xml 2017-03-28 10:43:57,365 INFO org.apache.flink.client.CliFrontend - Program Arguments: 2017-03-28 10:43:57,366 INFO org.apache.flink.client.CliFrontend - run 2017-03-28 10:43:57,366 INFO org.apache.flink.client.CliFrontend - -p 2017-03-28 10:43:57,366 INFO 
org.apache.flink.client.CliFrontend - 5 2017-03-28 10:43:57,366 INFO org.apache.flink.client.CliFrontend - -c 2017-03-28 10:43:57,366 INFO org.apache.flink.client.CliFrontend - our.company.package.OurProgramClass 2017-03-28 10:43:57,366 INFO org.apache.flink.client.CliFrontend - /path/to/our/lib/our_program/lib/our_program.jar 2017-03-28 10:43:57,366 INFO org.apache.flink.client.CliFrontend - /path/to/our/lib/our_program/conf/our_program.properties 2017-03-28 10:43:57,367 INFO org.apache.flink.client.CliFrontend - Classpath: /path/to/our/lib/flink-1.1.3/lib/flink-dist_2.10-1.1.3.1.jar:/path/to/our/lib/flink-1.1.3/lib/flink-python_2.10-1.1.3.jar:/path/to/our/lib/flink-1.1.3/lib/flink-reporter-1.0.2-20161206.140111-118.jar:/path/to/our/lib/flink-1.1.3/lib/flink-table_2.10-1.1.3.jar:/path/to/our/lib/flink-1.1.3/lib/log4j-1.2.17.jar:/path/to/our/lib/flink-1.1.3/lib/ojdbc6-11.2.0.3.jar:/path/to/our/lib/flink-1.1.3/lib/slf4j-log4j12-1.7.7.jar::/etc/hadoop/conf: 2017-03-28 10:43:57,367 INFO org.apache.flink.client.CliFrontend - -------------------------------------------------------------------------------- 2017-03-28 10:43:57,367 INFO org.apache.flink.client.CliFrontend - Using configuration directory /path/to/our/lib/flink-1.1.3/conf 2017-03-28 10:43:57,367 INFO org.apache.flink.client.CliFrontend - Trying to load configuration file 2017-03-28 10:43:57,664 INFO org.apache.flink.client.CliFrontend - Running 'run' command. 
2017-03-28 10:43:57,671 INFO org.apache.flink.client.CliFrontend - Building program from JAR file 2017-03-28 10:43:57,827 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Found YARN properties file /tmp/.yarn-properties-our_user 2017-03-28 10:43:57,921 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Using Yarn application id from YARN properties application_1488884688139_2648 2017-03-28 10:43:57,921 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - YARN properties set default parallelism to 12 2017-03-28 10:43:57,921 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Found YARN properties file /tmp/.yarn-properties-our_user 2017-03-28 10:43:57,922 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Using Yarn application id from YARN properties application_1488884688139_2648 2017-03-28 10:43:57,922 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - YARN properties set default parallelism to 12 2017-03-28 10:43:58,046 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at ourserver/ourserver_ip:8050 2017-03-28 10:43:58,237 INFO org.apache.flink.yarn.YarnClusterDescriptor - Found application JobManager host name 'ourserver' and port '36901' from supplied application id 'application_1488884688139_2648' 2017-03-28 10:43:58,246 INFO org.apache.flink.client.CliFrontend - Cluster configuration: Yarn cluster with application id application_1488884688139_2648 2017-03-28 10:43:59,439 INFO org.apache.flink.client.CliFrontend - Using address ourserver_ip:36901 to connect to JobManager. 
2017-03-28 10:43:59,439 INFO org.apache.flink.client.CliFrontend - JobManager web interface address http://ourserver:8088/proxy/application_1488884688139_2648/ 2017-03-28 10:43:59,439 DEBUG org.apache.flink.client.CliFrontend - Client slots is set to -1 2017-03-28 10:43:59,440 DEBUG org.apache.flink.client.CliFrontend - Savepoint path is set to null 2017-03-28 10:43:59,440 DEBUG org.apache.flink.client.CliFrontend - User parallelism is set to 5 2017-03-28 10:43:59,440 INFO org.apache.flink.client.CliFrontend - Starting execution of program 2017-03-28 10:43:59,440 INFO org.apache.flink.yarn.YarnClusterClient - Starting program in interactive mode 2017-03-28 10:44:00,593 WARN org.apache.hadoop.hdfs.BlockReaderLocal - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded. 2017-03-28 10:44:01,672 INFO org.apache.flink.yarn.YarnClusterClient - Waiting until all TaskManagers have connected 2017-03-28 10:44:02,702 INFO org.apache.flink.yarn.YarnClusterClient - Starting client actor system. 2017-03-28 10:44:03,717 INFO org.apache.flink.yarn.YarnClusterClient - TaskManager status (3/1) 2017-03-28 10:44:03,720 INFO org.apache.flink.yarn.YarnClusterClient - All TaskManagers are connected 2017-03-28 10:44:04,736 INFO org.apache.flink.yarn.YarnClusterClient - Submitting job with JobID: d33a2835c9c25881a0765c250bbceb7e. Waiting for job completion. Connected to JobManager at Actor[akka.tcp://flink@ourserver_ip:36901/user/jobmanager#-429328340] 03/28/2017 10:44:06 Job execution switched to status RUNNING. On 27.03.2017 15:24, Ufuk Celebi wrote: > What kind of state backend were you using for the checkpoints? > > If there is a bug that prevents us from deleting the savepoint files > automatically, we can do a manual workaround and delete the > checkpoint files manually. With Flink 1.3 this becomes very > straightforward as all savepoint data goes to a self-contained directory that > can be deleted manually.
> > On Mon, Mar 27, 2017 at 12:46 PM, Stefan Richter > <s.rich...@data-artisans.com> wrote: >> Hi, >> >> could you provide us with the log from the job client, with logging on debug >> level for package org.apache.flink.client? Also, did you check if this >> problem also exists in the latest bugfix release for your version (1.1.5) ? >> >> Best, >> Stefan >> >> >> Am 27.03.2017 um 11:41 schrieb Konstantin Gregor >> <konstantin.gre...@tngtech.com>: >> >> Hey everyone, >> >> we are experiencing an issue in the disposal of savepoints in >> Flink-1.1.3. We have a streaming job that has custom state (user objects >> are part of the state). We create a savepoint: >> >> $ flink savepoint <JOBID> >> [...] >> Savepoint completed. Path: >> hdfs:/bigdata/flink/savepoints/savepoint-20f064fb9f50 >> [...] >> >> Then we want to simply dispose of that savepoint where we also provide >> the jar to the job from which the savepoint was made: >> $ flink savepoint -d >> hdfs:/bigdata/flink/savepoints/savepoint-20f064fb9f50 -j >> /path/to/jar/application.jar >> >> This gives us a ClassNotFoundException of our custom objects [1]. >> >> Adding our jar to the flink/lib directory is not an option for us, >> things will break because of this. >> Does anyone have an idea on how to proceed here? >> >> Thanks and best regards, >> >> Konstantin >> >> [1] >> java.io.IOException: Failed to dispose savepoint >> hdfs:///bigdata/flink/savepoints/savepoint-20f064fb9f50. 
>> at >> org.apache.flink.runtime.checkpoint.savepoint.FsSavepointStore.disposeSavepoint(FsSavepointStore.java:163) >> at >> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:745) >> at >> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:727) >> at >> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:727) >> at >> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) >> at >> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) >> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) >> at >> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) >> at >> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) >> at >> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) >> at >> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) >> at >> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) >> at >> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) >> Caused by: java.lang.ClassNotFoundException: >> our.company.application.eventdata.EventDataRecord >> at java.net.URLClassLoader$1.run(URLClassLoader.java:366) >> at java.net.URLClassLoader$1.run(URLClassLoader.java:355) >> at java.security.AccessController.doPrivileged(Native Method) >> at java.net.URLClassLoader.findClass(URLClassLoader.java:354) >> at java.lang.ClassLoader.loadClass(ClassLoader.java:425) >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) >> at java.lang.ClassLoader.loadClass(ClassLoader.java:358) >> at java.lang.Class.forName0(Native Method) >> at java.lang.Class.forName(Class.java:270) >> at >> 
org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:65) >> at >> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612) >> at >> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) >> at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1483) >> at >> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1333) >> at >> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) >> at >> java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:500) >> at >> org.apache.flink.api.java.typeutils.runtime.PojoSerializer.readObject(PojoSerializer.java:131) >> at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) >> at >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.lang.reflect.Method.invoke(Method.java:606) >> at >> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) >> at >> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) >> at >> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) >> at >> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) >> at >> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) >> at >> java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:500) >> at >> org.apache.flink.api.common.state.StateDescriptor.readObject(StateDescriptor.java:268) >> at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source) >> at >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.lang.reflect.Method.invoke(Method.java:606) >> at >> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) >> at >> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) >> at >> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) >> at >> 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) >> at >> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) >> at >> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) >> at >> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) >> at >> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) >> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) >> at java.util.HashMap.readObject(HashMap.java:1184) >> at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) >> at >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.lang.reflect.Method.invoke(Method.java:606) >> at >> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) >> at >> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) >> at >> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) >> at >> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) >> at >> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) >> at >> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) >> at >> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) >> at >> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) >> at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706) >> at >> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344) >> at >> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) >> at >> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) >> at >> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) >> at >> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) >> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) >> at >> org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:291) 
>> at >> org.apache.flink.util.SerializedValue.deserializeValue(SerializedValue.java:58) >> at >> org.apache.flink.runtime.checkpoint.SubtaskState.discard(SubtaskState.java:85) >> at >> org.apache.flink.runtime.checkpoint.TaskState.discard(TaskState.java:147) >> at >> org.apache.flink.runtime.checkpoint.savepoint.SavepointV0.dispose(SavepointV0.java:66) >> at >> org.apache.flink.runtime.checkpoint.savepoint.FsSavepointStore.disposeSavepoint(FsSavepointStore.java:151) >> >> >> -- >> Konstantin Gregor * konstantin.gre...@tngtech.com >> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring >> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke >> Sitz: Unterföhring * Amtsgericht München * HRB 135082 >> >> -- Konstantin Gregor * konstantin.gre...@tngtech.com TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke Sitz: Unterföhring * Amtsgericht München * HRB 135082
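For reference, the trace above shows the lookup of our class going through sun.misc.Launcher$AppClassLoader, called from InstantiationUtil$ClassLoaderObjectInputStream.resolveClass during SubtaskState.discard. This seems to suggest that the state is deserialized with a class loader that does not know the classes from the job jar. The following is only a minimal, self-contained sketch of that mechanism using plain Java serialization. It is not Flink code: SavepointClassLoadingSketch, LoaderBoundInputStream and canDeserialize are hypothetical names, and the nested EventDataRecord is just a stand-in for our real class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical sketch, not Flink code: shows how the class loader used during
// deserialization decides whether a user class inside serialized state is found.
class SavepointClassLoadingSketch {

    /** Stand-in for a user class that is part of the serialized state. */
    static class EventDataRecord implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;

        EventDataRecord(String id) {
            this.id = id;
        }
    }

    /** ObjectInputStream that resolves classes against one explicit class loader only. */
    static class LoaderBoundInputStream extends ObjectInputStream {
        private final ClassLoader loader;

        LoaderBoundInputStream(InputStream in, ClassLoader loader) throws IOException {
            super(in);
            this.loader = loader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc) throws ClassNotFoundException {
            // Throws ClassNotFoundException if 'loader' cannot see the class.
            return Class.forName(desc.getName(), false, loader);
        }
    }

    /** Serializes a value with standard Java serialization. */
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Returns true if every class in the stream can be resolved through 'loader'. */
    static boolean canDeserialize(byte[] bytes, ClassLoader loader) {
        try (ObjectInputStream in =
                 new LoaderBoundInputStream(new ByteArrayInputStream(bytes), loader)) {
            in.readObject();
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] state = serialize(new EventDataRecord("event-1"));

        // Loader that knows the "user" classes (here: the loader of this sketch itself).
        ClassLoader withUserCode = SavepointClassLoadingSketch.class.getClassLoader();
        // Loader with no parent: it only sees JDK bootstrap classes, not EventDataRecord.
        ClassLoader withoutUserCode = new ClassLoader(null) { };

        System.out.println("with user classes:    " + canDeserialize(state, withUserCode));
        System.out.println("without user classes: " + canDeserialize(state, withoutUserCode));
    }
}
```

In the sketch, reading the same bytes succeeds only when the stream is bound to a loader that can see EventDataRecord. If the disposal path behaves similarly, the jar passed with -j would have to reach the class loader that is used when the state is discarded, but that is an assumption on our side and not something we could confirm from the logs.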