Nilotpal Nandi created HDDS-988:
-----------------------------------

             Summary: containers remain in CLOSING state in one of the datanodes when three datanodes are isolated in docker cluster
                 Key: HDDS-988
                 URL: https://issues.apache.org/jira/browse/HDDS-988
             Project: Hadoop Distributed Data Store
          Issue Type: Bug
          Components: Ozone Datanode, SCM
            Reporter: Nilotpal Nandi
         Attachments: datanode_1, datanode_2, datanode_3, om, scm

steps taken :
-------------------
# Created a 3-datanode docker cluster.
# Wrote some data to create a pipeline.
# Isolated all datanodes, i.e., the datanodes could not communicate with each other (they could still communicate with scm and om).
# Tried to write some data again; the write failed as expected.
# After waiting for 'ozone.scm.stale.node.interval' and 'ozone.scm.dead.node.interval', the container replicas were still in CLOSING state. The containers failed to get CLOSED.

{noformat}
hadoop@8876c7214ee5:~$ cat /data/hdds/hdds/40bb080a-1a9f-42c8-9e20-8257ed567e46/current/containerDir0/*/metadata/*.container
!<KeyValueContainerData>
checksum: 7ee8f706cf215a5fa4b7e9a195529c15147823ceea302ab4998c7476ee64ebf4
chunksPath: /data/hdds/hdds/40bb080a-1a9f-42c8-9e20-8257ed567e46/current/containerDir0/2/chunks
containerDBType: RocksDB
containerID: 2
containerType: KeyValueContainer
layOutVersion: 1
maxSize: 5368709120
metadata: {}
metadataPath: /data/hdds/hdds/40bb080a-1a9f-42c8-9e20-8257ed567e46/current/containerDir0/2/metadata
originNodeId: 6e077f73-9fd9-4f4e-930f-578c9857912c
originPipelineId: ee5f9e7a-0d63-412a-839a-77af2cf7ca93
state: CLOSING
{noformat}

Expectation :
---------------------
The container should have at least two closed replicas.

The scm, om, and datanode logs are attached.
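
The expectation above can be checked mechanically once the `.container` descriptor has been collected from each datanode. The following is only a minimal sketch, not part of Ozone: the helper names (`parse_container_descriptor`, `closed_replica_count`, `meets_expectation`) are hypothetical, the sample text is a shortened copy of the descriptor shown in the report, and the parser assumes the flat `key: value` layout seen there rather than full YAML.

```python
from collections import Counter

# Shortened copy of the descriptor from the report (assumed flat "key: value" layout).
SAMPLE = """\
!<KeyValueContainerData>
containerDBType: RocksDB
containerID: 2
containerType: KeyValueContainer
metadata: {}
state: CLOSING
"""


def parse_container_descriptor(text):
    """Extract top-level 'key: value' pairs from a .container descriptor (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the leading YAML type tag (!<KeyValueContainerData>).
        if not line or line.startswith("!"):
            continue
        # Split on the first colon only, so values containing ':' stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def closed_replica_count(descriptors):
    """Count how many replica descriptors report state CLOSED."""
    states = Counter(parse_container_descriptor(d).get("state") for d in descriptors)
    return states["CLOSED"]


def meets_expectation(descriptors, minimum_closed=2):
    """True if at least `minimum_closed` replicas are CLOSED, as the report expects."""
    return closed_replica_count(descriptors) >= minimum_closed


if __name__ == "__main__":
    # Three replicas all stuck in CLOSING, as observed in the report.
    replicas = [SAMPLE, SAMPLE, SAMPLE]
    print(closed_replica_count(replicas), meets_expectation(replicas))
```

In practice the three descriptor strings would be read from the `*.container` file on each datanode (for example, the output of the `cat` command above); how those files are gathered is left out here because it depends on the docker setup.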