I want to know when it is safe to remove a machine (a YARN node) from a cluster.
My assumption is that it is only safe to remove a machine if it is not running any containers and does not store any data that is still needed. Using the ResourceManager REST API (https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html), we can call

    GET http://<rm http address:port>/ws/v1/cluster/nodes

to get information about each node, like:

    <node>
      <rack>/default-rack</rack>
      <state>RUNNING</state>
      <id>host1.domain.com:54158</id>
      <nodeHostName>host1.domain.com</nodeHostName>
      <nodeHTTPAddress>host1.domain.com:8042</nodeHTTPAddress>
      <lastHealthUpdate>1476995346399</lastHealthUpdate>
      <version>3.0.0-SNAPSHOT</version>
      <healthReport></healthReport>
      <numContainers>0</numContainers>
      <usedMemoryMB>0</usedMemoryMB>
      <availMemoryMB>8192</availMemoryMB>
      <usedVirtualCores>0</usedVirtualCores>
      <availableVirtualCores>8</availableVirtualCores>
      <resourceUtilization>
        <nodePhysicalMemoryMB>1027</nodePhysicalMemoryMB>
        <nodeVirtualMemoryMB>1027</nodeVirtualMemoryMB>
        <nodeCPUUsage>0.006664445623755455</nodeCPUUsage>
        <aggregatedContainersPhysicalMemoryMB>0</aggregatedContainersPhysicalMemoryMB>
        <aggregatedContainersVirtualMemoryMB>0</aggregatedContainersVirtualMemoryMB>
        <containersCPUUsage>0.0</containersCPUUsage>
      </resourceUtilization>
    </node>

If numContainers is 0, I assume the node is not running any containers (the first sketch below shows how this check could be scripted). However, can the node still store data on disk that downstream tasks need to read, such as shuffle files or cached RDD blocks? I could not find whether Spark exposes this (the second sketch below shows one way I imagine checking it). I assume that if a machine still holds data useful to a running job, it would maintain a heartbeat with the Spark driver or some central controller. Could we detect that by scanning TCP or UDP connections (third sketch below)? Is there any other way to check whether a machine in a Spark cluster is participating in a job?
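To make the first step concrete, here is a minimal sketch of the numContainers check, assuming Python 3 with the requests library; the ResourceManager address is a placeholder, not something from my cluster. The endpoint can return JSON as well as the XML shown above.

    # Sketch: list RUNNING nodes that report zero containers.
    # RM_ADDRESS is a hypothetical placeholder.
    import requests

    RM_ADDRESS = "http://rm.domain.com:8088"

    def idle_nodes():
        """Return hostnames of RUNNING nodes with numContainers == 0."""
        resp = requests.get(RM_ADDRESS + "/ws/v1/cluster/nodes",
                            headers={"Accept": "application/json"})
        resp.raise_for_status()
        nodes = resp.json()["nodes"]["node"]
        return [n["nodeHostName"] for n in nodes
                if n["state"] == "RUNNING" and n["numContainers"] == 0]

    if __name__ == "__main__":
        print(idle_nodes())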
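For the second question, Spark's monitoring REST API on the driver UI exposes per-executor storage counters (rddBlocks, diskUsed, totalShuffleWrite). Here is a sketch of how I imagine using it to test whether a host still holds data for a running application; the driver UI address is my placeholder, and I am assuming the application is still running.

    # Sketch: does any executor on `hostname` still hold cached blocks,
    # on-disk storage, or shuffle output? DRIVER_UI is hypothetical.
    import requests

    DRIVER_UI = "http://driver.domain.com:4040"

    def host_holds_data(hostname):
        apps = requests.get(DRIVER_UI + "/api/v1/applications").json()
        for app in apps:
            execs = requests.get(
                DRIVER_UI + "/api/v1/applications/%s/executors" % app["id"]).json()
            for e in execs:
                if e["id"] == "driver":
                    continue
                if not e.get("isActive", True):  # field absent in older Spark
                    continue
                host = e["hostPort"].split(":")[0]
                if host == hostname and (e["rddBlocks"] > 0
                                         or e["diskUsed"] > 0
                                         or e["totalShuffleWrite"] > 0):
                    return True
        return False

One caveat I am aware of: this only sees live executors. If the external shuffle service is enabled on YARN, shuffle files can outlive the executor that wrote them and are served by the NodeManager on that host, so a node could still matter to the job even with no executor running.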
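On the TCP-scan idea, here is a sketch that would run on the candidate machine itself, assuming the psutil library and that spark.driver.host / spark.driver.port are pinned to known values (by default the driver port is random, so pinning it is an assumption on my part).

    # Sketch: is any local process holding an ESTABLISHED TCP connection
    # back to the driver's RPC endpoint? DRIVER_HOST/DRIVER_PORT are
    # hypothetical pinned values.
    import psutil

    DRIVER_HOST = "10.0.0.5"
    DRIVER_PORT = 40000

    def connected_to_driver():
        for conn in psutil.net_connections(kind="tcp"):
            if (conn.status == psutil.CONN_ESTABLISHED and conn.raddr
                    and conn.raddr.ip == DRIVER_HOST
                    and conn.raddr.port == DRIVER_PORT):
                return True
        return False

As far as I know, executor heartbeats travel over the executor's existing TCP RPC connection to the driver rather than over UDP, so TCP would be the right layer to look at, but I would appreciate confirmation.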