SeongHoon Ku created HDFS-17855:
-----------------------------------

             Summary: ViewFS with linkMergeSlash generates invalid paths during 
listStatus/listLocatedStatus operations, causing InvalidPathException or 
incorrect path resolution
                 Key: HDFS-17855
                 URL: https://issues.apache.org/jira/browse/HDFS-17855
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: viewfs
    Affects Versions: 3.4.1, 2.10.2
         Environment: * Hadoop version: 2.10.2
* Configuration: ViewFS with linkMergeSlash enabled
* Affected applications: JobHistoryServer, Hive, any application using ViewFS 
with linkMergeSlash

            Reporter: SeongHoon Ku


h1. Issue Type

*Bug*

----

h1. Components

* fs
* viewfs

----

h1. Affects Versions

* 2.10.2 (verified)
* Likely affects 3.x versions as well

----

h1. Description

When ViewFS is configured with {{linkMergeSlash}}, directory listing operations 
that return a *RemoteIterator* produce invalid paths. Applications that use the 
FileContext API then fail with {{InvalidPathException}} or resolve paths 
incorrectly.

Affected callers:
* Applications using the *FileContext API (ViewFs)* with {{listLocatedStatus()}} 
or {{listStatusIterator()}}
* Examples: JobHistoryServer, Hive/Tez applications
* The failure surfaces in {{ViewFs$WrappingRemoteIterator.next()}}

h2. Configuration Example

{code:xml}
<property>
  <name>fs.defaultFS</name>
  <value>viewfs://hadoop-cluster</value>
</property>
<property>
  <name>fs.viewfs.mounttable.hadoop-cluster.linkMergeSlash</name>
  <value>hdfs://hadoop-cluster</value>
</property>
{code}

h2. Error Stack Trace

*JobHistoryServer:*
{noformat}
org.apache.hadoop.fs.InvalidPathException: Invalid path name relative paths not allowed:
hadoop-cluster/user/history/done/2021
    at org.apache.hadoop.fs.AbstractFileSystem.checkPath(AbstractFileSystem.java:370)
    at org.apache.hadoop.fs.AbstractFileSystem.makeQualified(AbstractFileSystem.java:428)
    at org.apache.hadoop.fs.viewfs.ViewFs$WrappingRemoteIterator.next(ViewFs.java:848)
    at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:238)
{noformat}

*Hive (Tez):*
{noformat}
org.apache.tez.dag.app.dag.impl.AMUserCodeException: java.io.IOException:
cannot find dir = viewfs://hadoop-cluster/user/hive/hadoop-cluster/tmp/hive/...
{noformat}

*Observed pattern:*
* Invalid path: {{viewfs://hadoop-cluster/user/hive/hadoop-cluster/tmp/hive/...}}
* Correct path: {{viewfs://hadoop-cluster/tmp/hive/...}}
* The working directory ({{/user/hive}}) and the mount table name 
({{hadoop-cluster}}) are wrongly inserted into the path

----

h1. Root Cause

h2. Technical Analysis

When {{linkMergeSlash}} is configured, the ViewFS root node is created with its 
path name ({{fullPath}}) incorrectly set to {{mountTableName}} instead of 
{{"/"}}.

*Bug location in {{InodeTree.java}}:*

{code:java}
// Current (buggy) code
if (isMergeSlashConfigured) {
  root = new INodeLink<T>(mountTableName, ugi,  // "hadoop-cluster" - BUG!
      initAndGetTargetFs(), mergeSlashTarget);
  mountPoints.add(new MountPoint<T>("/", (INodeLink<T>) root));
  rootFallbackLink = null;
}
{code}

This causes {{root.fullPath}} to be set to the cluster name (e.g., 
{{"hadoop-cluster"}}) instead of {{"/"}}.

h2. Impact Chain

# During path resolution ({{InodeTree.java}}), {{root.fullPath}} is used as 
{{ResolveResult.resolvedPath}}:
{code:java}
if (root.isLink()) {
  ResolveResult<T> res = new ResolveResult<T>(ResultKind.EXTERNAL_DIR,
      getRootLink().getTargetFileSystem(), root.fullPath, remainingPath);
      //                                   ^^^^^^^^^^^^^ Uses mountTableName!
  return res;
}
{code}

# During path conversion in {{ViewFileSystem.getChrootedPath()}} (line 563):
{code:java}
return this.makeQualified(
    suffix.length() == 0 ? f : new Path(res.resolvedPath, suffix));
// Creates: new Path("hadoop-cluster", "user/history/done")
// Result: "hadoop-cluster/user/history/done" (RELATIVE PATH!)
{code}

# {{makeQualified()}} then prepends the working directory to this relative path:
{noformat}
Expected: viewfs://hadoop-cluster/user/history/done
Actual:   viewfs://hadoop-cluster/user/mapred/hadoop-cluster/user/history/done
{noformat}
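
The chain above can be reproduced without Hadoop on the classpath. The following is a minimal sketch, not Hadoop code: {{join}} and {{qualify}} are hypothetical stand-ins for the {{new Path(parent, child)}} join and for {{makeQualified()}}, and {{java.net.URI}} resolution is assumed to approximate how a relative path is resolved against the working directory. It only illustrates why a root {{fullPath}} of {{"hadoop-cluster"}} yields a relative path while {{"/"}} does not.

```java
import java.net.URI;

// Simplified model of the impact chain; names are illustrative, not Hadoop APIs.
class MergeSlashPathSketch {

    // Stand-in for joining ResolveResult.resolvedPath with the listed suffix.
    public static String join(String resolvedPath, String suffix) {
        if (suffix.isEmpty()) {
            return resolvedPath;
        }
        return resolvedPath.endsWith("/")
                ? resolvedPath + suffix
                : resolvedPath + "/" + suffix;
    }

    // Stand-in for qualification: a relative path is resolved against the
    // working directory, an absolute path replaces the base path entirely.
    public static URI qualify(URI workingDir, String path) {
        return workingDir.resolve(path);
    }

    public static void main(String[] args) {
        URI workingDir = URI.create("viewfs://hadoop-cluster/user/mapred/");
        String suffix = "user/history/done";

        // Root fullPath set to the mount table name (buggy behaviour).
        System.out.println(qualify(workingDir, join("hadoop-cluster", suffix)));
        // prints viewfs://hadoop-cluster/user/mapred/hadoop-cluster/user/history/done

        // Root fullPath set to "/" (behaviour argued for in this report).
        System.out.println(qualify(workingDir, join("/", suffix)));
        // prints viewfs://hadoop-cluster/user/history/done
    }
}
```

The two printed URIs match the "Actual" and "Expected" values shown in step 3.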

h2. Why linkMergeSlash Should Use "/"

{{linkMergeSlash}} is designed to merge the entire ViewFS root with a single 
target directory. Therefore:
* ViewFS root ({{/}}) = Target directory specified by linkMergeSlash
* The root node's {{fullPath}} should naturally be {{/}}
* This maintains consistency with the {{MountPoint}} API which already returns 
{{/}}

----

h1. Testing

h2. Test Cases

Added comprehensive test cases in {{TestViewFileSystemLinkMergeSlash.java}}:

# *{{testListStatusReturnsCorrectPaths()}}*
** Verifies {{listStatus()}} returns proper ViewFS paths
** Checks scheme, authority, and path correctness

# *{{testListLocatedStatusReturnsCorrectPaths()}}*
** Verifies {{listLocatedStatus()}} with RemoteIterator
** Ensures lazy evaluation works correctly

# *{{testResolvedPathIsAbsolute()}}*
** Reproduces exact bug scenario (JobHistoryServer use case)
** Validates path resolution for {{/user/history/done/2021}}
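
The checks these tests perform (scheme, authority, absolute path) can be sketched in isolation. This is a hedged illustration only: {{isExpectedViewFsPath}} is a hypothetical helper, not part of {{TestViewFileSystemLinkMergeSlash.java}}, and it operates on plain {{java.net.URI}} values rather than Hadoop {{Path}} objects.

```java
import java.net.URI;

// Hypothetical helper mirroring the assertions described in the test list.
class ViewFsPathCheckSketch {

    // True only if the URI has the viewfs scheme, the expected authority,
    // and exactly the expected absolute path.
    public static boolean isExpectedViewFsPath(URI actual, String authority,
                                               String expectedPath) {
        return "viewfs".equals(actual.getScheme())
                && authority.equals(actual.getAuthority())
                && actual.getPath() != null
                && actual.getPath().startsWith("/")
                && expectedPath.equals(actual.getPath());
    }

    public static void main(String[] args) {
        URI good = URI.create("viewfs://hadoop-cluster/user/history/done");
        URI bad = URI.create(
                "viewfs://hadoop-cluster/user/mapred/hadoop-cluster/user/history/done");

        System.out.println(
                isExpectedViewFsPath(good, "hadoop-cluster", "/user/history/done"));
        // prints true
        System.out.println(
                isExpectedViewFsPath(bad, "hadoop-cluster", "/user/history/done"));
        // prints false
    }
}
```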




