Bence Kosztolnik created YARN-11703:
---------------------------------------
Summary: Validate accessibility of Node Manager working directories
Key: YARN-11703
URL: https://issues.apache.org/jira/browse/YARN-11703
Project: Hadoop YARN
Issue Type: Improvement
Components: yarn
Affects Versions: 3.5.0
Reporter: Bence Kosztolnik
Assignee: Bence Kosztolnik
h3. Problem:
If a subdirectory or file under *yarn.nodemanager.local-dirs* or
{*}yarn.nodemanager.log-dirs{*} changes permission and becomes inaccessible to
the node manager, the node manager does not report an unhealthy state, but
containers launched on that node fail.
h3. Testing:
- run an example PI job in a cluster
- make the user cache directory of the submitting user unreadable by the node
manager. For example:
{noformat}
chmod 222 ./usercache/{user}
{noformat}
- cluster state will stay healthy
- re-run the PI job
- containers will fail on the affected node, with
{noformat}
... Not able to initialize app-cache directories in any of the configured local
directories for user ...{noformat}
h3. Solution:
Add an extra validation to DirectoryCollection#testDirs to ensure that the
content of the local-dirs and log-dirs is accessible by the node manager, and
mark the node unhealthy if it is not.
A new flag will be introduced to enable this validation:
*yarn.nodemanager.working-dir-content-accessibility-validation.enabled*
(default: true)
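
A minimal sketch of what such a content accessibility check could look like, using only java.nio.file. This is an illustration under assumptions, not the actual patch: the class DirContentAccessCheck and its methods findInaccessible and isHealthy are hypothetical names and are not part of DirectoryCollection.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not the Hadoop implementation. */
public class DirContentAccessCheck {

  /** Returns every entry under root that the current process cannot access. */
  public static List<Path> findInaccessible(Path root) {
    List<Path> inaccessible = new ArrayList<>();
    collect(root, inaccessible);
    return inaccessible;
  }

  /** A directory tree counts as healthy when no inaccessible entry is found. */
  public static boolean isHealthy(Path root) {
    return findInaccessible(root).isEmpty();
  }

  private static void collect(Path path, List<Path> inaccessible) {
    if (Files.isSymbolicLink(path)) {
      return; // assumption: links are skipped so the walk stays inside the tree
    }
    if (Files.isDirectory(path)) {
      // A directory must be readable (to list it) and executable (to enter it).
      if (!Files.isReadable(path) || !Files.isExecutable(path)) {
        inaccessible.add(path);
        return;
      }
      try (DirectoryStream<Path> children = Files.newDirectoryStream(path)) {
        for (Path child : children) {
          collect(child, inaccessible);
        }
      } catch (IOException e) {
        inaccessible.add(path); // listing failed, treat the directory as bad
      }
    } else if (!Files.isReadable(path)) {
      // Covers unreadable files as well as paths that do not exist.
      inaccessible.add(path);
    }
  }
}
```

With the chmod 222 example from the Testing section, the affected usercache directory would be reported by findInaccessible, and a health check built on isHealthy could then mark the node unhealthy. Note that a process running as root typically passes these checks regardless of the mode bits.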