[ https://issues.apache.org/jira/browse/HDFS-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Jasani resolved HDFS-16918.
---------------------------------
    Resolution: Won't Fix

> Optionally shut down datanode if it does not stay connected to active namenode
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-16918
>                 URL: https://issues.apache.org/jira/browse/HDFS-16918
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>
> When HDFS is deployed behind an Envoy proxy setup, network connection issues
> or packet loss may be observed, depending on the socket timeout configured
> at Envoy. Together, the Envoy proxies form a transparent communication mesh
> in which each app sends and receives packets to and from localhost and is
> unaware of the network topology.
> The primary purpose of Envoy is to make the network transparent to
> applications, so that network issues can be identified reliably. However,
> such a proxy-based setup can sometimes result in socket connection issues
> between datanode and namenode.
> Many deployment frameworks provide auto-start functionality when any of the
> Hadoop daemons are stopped. If a given datanode does not stay connected to
> the active namenode in the cluster, i.e. it does not receive a heartbeat
> response in time from the active namenode (even though the active namenode
> has not terminated), the datanode is of little use. We should provide
> configurable behavior such that if a given datanode cannot receive a
> heartbeat response from the active namenode within a configurable time
> duration, it terminates itself to avoid impacting the availability SLA.
> This is specifically helpful when the underlying deployment or
> observability framework (e.g. K8s) can start up the datanode automatically
> upon its shutdown (unless it is being restarted as part of a rolling
> upgrade) and help the newly brought up datanode (in the case of K8s, a new
> pod with dynamically changing nodes) establish new socket connections to
> the active and standby namenodes. This should be an opt-in behavior and not
> the default.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org