On 2 May 2013 09:28, Todd Lipcon <t...@cloudera.com> wrote:
> Hi Brad,
>
> The reasoning is that the NameNode locking is somewhat coarse-grained. In
> older versions of Hadoop, before it worked this way, we found that listing
> large directories (e.g. with 100k+ files) could end up holding the
> NameNode's lock for quite a long period of time and starving other clients.
>
> Additionally, I believe there is a second API that does the "on-demand"
> fetching of the next set of files from the listing as well, no?
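The "on-demand" API referred to here is presumably the RemoteIterator
returned by FileSystem#listLocatedStatus() (or listFiles()). A minimal
sketch of how a client consumes it, assuming a Hadoop 2.x FileSystem:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocatedFileStatus;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.RemoteIterator;

    public class PagedListing {
      public static void main(String[] args) throws Exception {
        Path dir = new Path(args[0]);
        FileSystem fs = dir.getFileSystem(new Configuration());

        // Against HDFS the iterator fetches the directory entries from
        // the NameNode in batches as the client consumes them, so no
        // single RPC holds the namespace lock for an entire 100k+ entry
        // listing.
        RemoteIterator<LocatedFileStatus> it = fs.listLocatedStatus(dir);
        while (it.hasNext()) {
          LocatedFileStatus status = it.next();
          System.out.println(status.getPath() + "\t" + status.getLen());
        }
      }
    }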
HDFS v2: this is the only incompatible change between the v1 and v2
FileSystem classes. It's chatty over the long haul and hangs against Amazon
S3 (s3://), an issue for which there's a patch to replicate, but not yet
fix, the problem: https://issues.apache.org/jira/browse/HADOOP-9410

Good locally, but I think it needs test coverage for all the other
filesystem clients that ship with Hadoop.

FWIW, blobstores do tend to only support paged lists of their blobs, so the
same build-up-as-you-go-along process works there too.

We should spell out in the documentation: "changes that occur to the
filesystem during the generation of this list MAY not be reflected in the
result, and so MAY result in a partially incomplete or inconsistent view".

-Steve
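To illustrate what that caveat means for callers, here is a hypothetical
helper (countEntries is not an existing API) that treats the directory
vanishing mid-listing as a partial result rather than an error, again
assuming the Hadoop 2.x FileSystem API:

    import java.io.FileNotFoundException;
    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocatedFileStatus;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.RemoteIterator;

    public class TolerantListing {
      // Counts directory entries while tolerating concurrent changes:
      // entries added or removed during iteration MAY or MAY not be
      // reflected in the count.
      public static long countEntries(FileSystem fs, Path dir)
          throws IOException {
        long count = 0;
        try {
          RemoteIterator<LocatedFileStatus> it = fs.listLocatedStatus(dir);
          while (it.hasNext()) {
            it.next();
            count++;
          }
        } catch (FileNotFoundException e) {
          // The directory was deleted part-way through the paged
          // listing; return the partial count rather than failing,
          // per the proposed documentation wording above.
        }
        return count;
      }
    }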