Pieter Reuse created HADOOP-12169:
-------------------------------------

             Summary: ListStatus on empty dir in S3A lists itself instead of 
returning an empty list
                 Key: HADOOP-12169
                 URL: https://issues.apache.org/jira/browse/HADOOP-12169
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
            Reporter: Pieter Reuse
            Assignee: Pieter Reuse


While testing the patch for HADOOP-11918, I stumbled upon odd behaviour that it 
introduces in the S3AFileSystem class. Calling listStatus() on an empty bucket 
returns an empty list, while doing the same on an empty directory returns an 
array of length 1 containing only the directory itself.

The bugfix is quite simple. In the line of code "{code}...if 
(keyPath.equals(f)...{code}" (S3AFileSystem:758), keyPath is qualified wrt. the 
fs and f is not. Therefore, the comparison returns false when it should return 
true. The bugfix is to make f qualified in this line of code.
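
To illustrate why the comparison fails, here is a minimal Python model of the 
mismatch. It is not the S3AFileSystem code: the helper qualify(), the URI 
s3a://bucket and the working directory /user/test are hypothetical stand-ins 
for qualifying a path against the filesystem, a role that Path.makeQualified 
plays in Hadoop.

```python
from posixpath import join, normpath

FS_URI = "s3a://bucket"      # hypothetical filesystem URI
WORKING_DIR = "/user/test"   # hypothetical working directory


def qualify(path: str) -> str:
    """Return `path` as an absolute path prefixed with the filesystem URI.

    Simplified stand-in for qualifying a Hadoop Path; paths that already
    carry the filesystem URI are returned unchanged.
    """
    if path.startswith(FS_URI):
        return path
    # Relative paths are resolved against the working directory;
    # absolute paths override it.
    absolute = normpath(join(WORKING_DIR, path))
    return FS_URI + absolute


key_path = qualify("/emptydir")  # built by the listing code from the object key
f = "/emptydir"                  # passed in by the caller, unqualified

# Unqualified comparison: the directory is not recognised as "itself",
# so it would end up in its own listing.
is_self_buggy = key_path == f

# Qualified comparison: both sides have the same form, so the entry
# can be recognised and skipped.
is_self_fixed = key_path == qualify(f)
```

In this model is_self_buggy is False and is_self_fixed is True, which mirrors 
the behaviour described above: the entry for the directory itself is only 
filtered out once both sides of the comparison are qualified.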

More formally: according to [The Hadoop FileSystem API 
Definition|https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/filesystem/],
 more specifically the definition of FileSystem.listStatus, only the child 
elements of a directory should be returned by a listStatus() call.

In detail: 
{code}
elif isDir(FS, p): result [getFileStatus(c) for c in children(FS, p) where f(c) == True]
{code}
and
{code}
def children(FS, p) = {q for q in paths(FS) where parent(q) == p}
{code}

This means that the result of listStatus on an empty directory is an empty 
list. This is the same behaviour as ls has in Unix, which is what one would 
expect from a FileSystem.
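
The two definitions quoted from the specification can be sketched as runnable 
Python. This is only a model under stated assumptions: the filesystem is a 
plain dict mapping each path to a flag saying whether it is a directory, 
list_status() returns paths rather than FileStatus objects, and the sample 
tree is invented for illustration.

```python
from posixpath import dirname


def children(fs, p):
    """Paths whose parent is exactly p (the spec's children(FS, p)).

    posixpath treats "/" as its own parent, so p itself is excluded
    explicitly to keep the root out of its own listing.
    """
    return {q for q in fs if q != p and dirname(q) == p}


def list_status(fs, p, f=lambda c: True):
    """Model of listStatus: only children of a directory, filtered by f."""
    if p not in fs:
        raise FileNotFoundError(p)
    if not fs[p]:
        # p is a file: the result is the file itself if the filter accepts it.
        return [p] if f(p) else []
    # p is a directory: its children only, never p itself.
    return sorted(c for c in children(fs, p) if f(c))


# Hypothetical tree: path -> is_directory
fs = {
    "/": True,
    "/empty": True,
    "/data": True,
    "/data/a.txt": False,
}

empty_listing = list_status(fs, "/empty")
data_listing = list_status(fs, "/data")
root_listing = list_status(fs, "/")
```

Here empty_listing is an empty list, data_listing contains only /data/a.txt, 
and root_listing contains /data and /empty. In no case does a directory appear 
in its own listing.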

Note: it seemed appropriate to add the test of this patch to the same file as 
the test for HADOOP-11918, but as a result, one of the two will have to be 
rebased wrt. the other before being applied to trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
