Hello,
I am running a job on Apache Flink 1.19, on an AWS EMR (EC2) cluster, as a
YARN application.

I have enabled generic log-based incremental checkpointing for faster
checkpoints.

It is described in more detail here:

https://flink.apache.org/2022/05/30/improving-speed-and-stability-of-checkpointing-with-generic-log-based-incremental-checkpoints/

My settings are as follows:
state.backend.type: rocksdb
state.checkpoints.dir: hdfs://%{hiera('bigtop::hadoop_head_node')}:8020/flink-checkpoints
state.backend.incremental: 'true'
state.backend.local-recovery: 'true'
state.backend.changelog.enabled: 'true'
state.backend.changelog.storage: filesystem
dstl.dfs.base-path: hdfs://%{hiera('bigtop::hadoop_head_node')}:8020/changelog

I have 2 core nodes, which run HDFS.

The error I get is:

Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /changelog/3206cd7fee5f9897156111c0e8ee6e10/dstl/252f346d-fead-4ab4-af46-f0603fe6dfed could only be written to 0 of the 1 minReplication nodes. There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2473)
        at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:293)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3093)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:932)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:605)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
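
The message seems to say that both datanodes were excluded from block
placement even though both are running. I am not sure how to tell why they
were excluded; would checking the datanode report help? For example (a
basic HDFS health check, assuming the standard Hadoop CLI on the cluster):

# Show live/dead datanodes with their capacity, remaining space,
# and block counts, as seen by the NameNode
hdfs dfsadmin -report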

Can anyone help me understand why I am getting this error and how I can
fix it?

Also note that when I run:
hdfs dfs -ls /changelog
or
hdfs dfs -ls /flink-checkpoints

I do see directories inside them, so the settings appear to be working.
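
If it is useful, I can also try a direct write from the same nodes to rule
out a general HDFS write problem, e.g. (a simple smoke test with the
standard HDFS CLI; the paths here are just placeholders):

# Write a small file to HDFS, then remove it again
hdfs dfs -put /etc/hosts /tmp/hdfs-write-test
hdfs dfs -rm /tmp/hdfs-write-test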

Thanks
Sachin
