Hi Zhe,

Per my understanding, the runner in WebHDFS goes to NamenodeWebHdfsMethods <https://github.com/apache/hadoop/blob/e9c4616b5e47e9c616799abc532269572ab24e6e/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/web/resources/NamenodeWebHdfsMethods.java#L972>, which eventually calls FSNamesystem#getListing. So the listing is still fetched in bounded batches, i.e. throttled, on the NN side. The DDoS part is open for discussion...
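To make that concrete, here is a simplified sketch of how I read the NN-side path (this is not the actual NamenodeWebHdfsMethods code; the class and method names in the sketch are mine, only the ClientProtocol/DirectoryListing calls are real):

{code}
import java.io.IOException;
import java.io.PrintWriter;

import org.apache.hadoop.hdfs.protocol.ClientProtocol;
import org.apache.hadoop.hdfs.protocol.DirectoryListing;
import org.apache.hadoop.hdfs.protocol.HdfsFileStatus;

// Illustration only: the directory is read in bounded batches (dfs.ls.limit
// entries per getListing call), and each batch is written out before the
// next one is fetched, so the whole directory is never materialized at once.
public class ListingStreamSketch {
  static void writeListing(ClientProtocol np, String src, PrintWriter out)
      throws IOException {
    // First batch, starting from the beginning of the directory.
    DirectoryListing listing =
        np.getListing(src, HdfsFileStatus.EMPTY_NAME, false);
    while (true) {
      for (HdfsFileStatus status : listing.getPartialListing()) {
        out.println(status.getLocalName()); // the real handler emits JSON
      }
      if (!listing.hasMore()) {
        break;
      }
      // Next batch, resuming after the last entry returned so far.
      listing = np.getListing(src, listing.getLastName(), false);
    }
  }
}
{code}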
Also, Andrew added pagination support for WebHDFS/HttpFS via
https://issues.apache.org/jira/browse/HDFS-10784 and
https://issues.apache.org/jira/browse/HDFS-10823, to provide better control
over large listings (a rough usage sketch is below, after the quoted mail).

Best,
-Xiao

On Wed, Oct 19, 2016 at 2:08 PM, Zhe Zhang <z...@apache.org> wrote:

> Hi,
>
> The regular HDFS client (DistributedFileSystem) throttles the workload of
> listing large directories by dividing the work into batches, something
> like below:
> {code}
> // fetch the first batch of entries in the directory
> DirectoryListing thisListing = dfs.listPaths(
>     src, HdfsFileStatus.EMPTY_NAME);
> ......
> if (!thisListing.hasMore()) { // got all entries of the directory
>   FileStatus[] stats = new FileStatus[partialListing.length];
> {code}
>
> However, WebHDFS doesn't seem to have this batching logic:
> {code}
> @Override
> public FileStatus[] listStatus(final Path f) throws IOException {
>   final HttpOpParam.Op op = GetOpParam.Op.LISTSTATUS;
>   return new FsPathResponseRunner<FileStatus[]>(op, f) {
>     @Override
>     FileStatus[] decodeResponse(Map<?, ?> json) {
>       ....
>     }
>   }.run();
> }
> {code}
>
> Am I missing anything? So a user can DDoS by {{hadoop fs -ls -R /}} via
> WebHDFS?
>
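For reference, this is roughly what I meant by the pagination support above: the iterator-based listing API pulls results in batches instead of one giant response. The sketch below is just my understanding of how a client would use it (host/port are placeholders), not verified code:

{code}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class BatchedWebHdfsListing {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder NameNode address; point this at a real WebHDFS endpoint.
    FileSystem fs = FileSystem.get(
        URI.create("webhdfs://nn.example.com:50070"), conf);

    // listStatusIterator fetches the directory contents incrementally,
    // which is the kind of control HDFS-10784/HDFS-10823 add on the
    // WebHDFS/HttpFS side.
    RemoteIterator<FileStatus> it = fs.listStatusIterator(new Path("/user"));
    while (it.hasNext()) {
      System.out.println(it.next().getPath());
    }
  }
}
{code}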