Thanks Xiao! It seems that server-side throttling is still vulnerable to abuse by users issuing large listing requests. Once such a request is scheduled, it keeps listing potentially millions of files without having to go through the IPC/RPC queue again. It does still have to compete for the FSN lock, thanks to this server-side throttling logic.
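To make the concern concrete, here is a toy model of that pattern. It is not the actual NamenodeWebHdfsMethods code; every class, field, and method name below is hypothetical. The request is admitted through a (pretend) call queue once, then loops over the directory batch by batch, taking and releasing a fair read lock (standing in for the FSN lock) for each batch. The counters show the asymmetry: one queue admission, but one lock acquisition per batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model (hypothetical names): a listing request admitted once, then
// streamed batch by batch under a shared lock.
final class StreamingListingModel {
  // Stand-in for the namesystem ("fsn") lock; fair mode so waiting writers
  // are not starved indefinitely by a stream of readers.
  private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock(true);
  private final List<String> directory;
  private final int batchSize;
  final AtomicInteger lockAcquisitions = new AtomicInteger();
  final AtomicInteger queueAdmissions = new AtomicInteger();

  StreamingListingModel(List<String> directory, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    this.directory = new ArrayList<>(directory);
    this.batchSize = batchSize;
  }

  // One batch under the read lock, roughly like a single getListing call.
  private List<String> getListingBatch(int from) {
    fsnLock.readLock().lock();
    try {
      lockAcquisitions.incrementAndGet();
      int to = Math.min(from + batchSize, directory.size());
      return new ArrayList<>(directory.subList(from, to));
    } finally {
      fsnLock.readLock().unlock(); // other lock users can get in between batches
    }
  }

  // Admitted through the pretend call queue exactly once; every later batch
  // competes only for the lock, never for a queue slot.
  List<String> handleListStatus() {
    queueAdmissions.incrementAndGet();
    List<String> out = new ArrayList<>();
    int from = 0;
    do {
      List<String> batch = getListingBatch(from);
      out.addAll(batch);
      from += batch.size();
    } while (from < directory.size());
    return out;
  }

  public static void main(String[] args) {
    List<String> dir = new ArrayList<>();
    for (int i = 0; i < 2500; i++) {
      dir.add("f" + i);
    }
    StreamingListingModel model = new StreamingListingModel(dir, 1000);
    int listed = model.handleListStatus().size();
    System.out.println(listed + " entries, " + model.queueAdmissions.get()
        + " queue admission(s), " + model.lockAcquisitions.get() + " lock acquisitions");
  }
}
```

Releasing the lock between batches keeps writers from being blocked for the whole listing, but nothing here makes the request yield its place to other callers the way a fresh trip through the RPC queue would.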
On Wed, Oct 19, 2016 at 2:33 PM Xiao Chen <x...@cloudera.com> wrote:
> Hi Zhe,
>
> Per my understanding, the runner in webhdfs goes to NamenodeWebHdfsMethods
> <https://github.com/apache/hadoop/blob/e9c4616b5e47e9c616799abc532269572ab24e6e/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/web/resources/NamenodeWebHdfsMethods.java#L972>,
> which eventually calls FSNameSystem#getListing. So it's still throttled on
> the NN side. Up for discussions for ddos part...
>
> Also, Andrew did some pagination features for webhdfs/httpfs via
> https://issues.apache.org/jira/browse/HDFS-10784 and
> https://issues.apache.org/jira/browse/HDFS-10823, to provide better
> control.
>
> Best,
>
> -Xiao
>
> On Wed, Oct 19, 2016 at 2:08 PM, Zhe Zhang <z...@apache.org> wrote:
>
> Hi,
>
> The regular HDFS client (DistributedFileSystem) throttles the workload of
> listing large directories by dividing the work into batches, something like
> below:
> {code}
> // fetch the first batch of entries in the directory
> DirectoryListing thisListing = dfs.listPaths(
>     src, HdfsFileStatus.EMPTY_NAME);
> ......
> if (!thisListing.hasMore()) { // got all entries of the directory
>   FileStatus[] stats = new FileStatus[partialListing.length];
> {code}
>
> However, WebHDFS doesn't seem to have this batching logic.
> {code}
> @Override
> public FileStatus[] listStatus(final Path f) throws IOException {
>   final HttpOpParam.Op op = GetOpParam.Op.LISTSTATUS;
>   return new FsPathResponseRunner<FileStatus[]>(op, f) {
>     @Override
>     FileStatus[] decodeResponse(Map<?,?> json) {
>       ....
>     }
>   }.run();
> }
> {code}
>
> Am I missing anything? So a user can DDoS by {{hadoop fs -ls -R /}} via
> WebHDFS?
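For reference, the client-side batching in the quoted DistributedFileSystem snippet is essentially cookie-based pagination: request a page starting after the empty name, then keep requesting pages starting after the last name returned until the server reports no more entries. The sketch below models that loop in isolation. `Batch`, `DirectoryLister`, and `InMemoryLister` are hypothetical stand-ins for DirectoryListing, DFSClient#listPaths, and the NameNode; the per-call cap of 1000 mirrors the default of dfs.ls.limit but is just a constructor argument here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Client-side batched listing, modeled on the DistributedFileSystem loop.
final class BatchedListing {
  // Plays the role of HdfsFileStatus.EMPTY_NAME: "start from the beginning".
  static final String EMPTY_NAME = "";

  static List<String> listAll(DirectoryLister lister, String src) {
    List<String> result = new ArrayList<>();
    Batch batch = lister.listPaths(src, EMPTY_NAME); // first batch
    result.addAll(batch.entries);
    while (batch.hasMore) {
      // Each follow-up request is a separate call, so on a real cluster it
      // would go back through the RPC queue.
      batch = lister.listPaths(src, batch.lastName());
      result.addAll(batch.entries);
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> names = new ArrayList<>();
    for (int i = 0; i < 2500; i++) {
      names.add(String.format("file-%05d", i)); // zero-padded, so already sorted
    }
    InMemoryLister lister = new InMemoryLister(names, 1000);
    List<String> all = listAll(lister, "/big-dir");
    System.out.println(all.size() + " entries in " + lister.calls + " calls");
  }
}

// Hypothetical stand-in for DirectoryListing: one page plus a "more" flag.
final class Batch {
  final List<String> entries;
  final boolean hasMore;

  Batch(List<String> entries, boolean hasMore) {
    this.entries = entries;
    this.hasMore = hasMore;
  }

  // Like DirectoryListing#getLastName: the cookie for the next request.
  // Only called when hasMore is true, which implies a non-empty page.
  String lastName() {
    return entries.get(entries.size() - 1);
  }
}

// Hypothetical stand-in for DFSClient#listPaths(src, startAfter).
interface DirectoryLister {
  Batch listPaths(String src, String startAfter);
}

// Toy server returning at most `limit` sorted names per call.
final class InMemoryLister implements DirectoryLister {
  private final List<String> names;
  private final int limit;
  int calls = 0;

  InMemoryLister(List<String> sortedNames, int limit) {
    this.names = new ArrayList<>(sortedNames);
    this.limit = limit;
  }

  @Override
  public Batch listPaths(String src, String startAfter) {
    calls++;
    int from = 0;
    if (!startAfter.isEmpty()) {
      // First index strictly after the cookie, whether or not it still exists.
      int idx = Collections.binarySearch(names, startAfter);
      from = idx >= 0 ? idx + 1 : -idx - 1;
    }
    int to = Math.min(from + limit, names.size());
    return new Batch(new ArrayList<>(names.subList(from, to)), to < names.size());
  }
}
```

Using the last returned name as the cookie, rather than a numeric offset, means the next page still starts in the right place if entries are added or deleted between calls.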