[jira] [Created] (HADOOP-17654) abfs incremental listing to support many active listings

Steve Loughran (Jira) Thu, 22 Apr 2021 07:01:03 -0700

Steve Loughran created HADOOP-17654:
---------------------------------------


             Summary: abfs incremental listing to support many active listings
                 Key: HADOOP-17654
                 URL: https://issues.apache.org/jira/browse/HADOOP-17654
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs/azure
    Affects Versions: 3.3.1
            Reporter: Steve Loughran


Each incremental iterator submits an async fetcher operation into the JVM's 
common ForkJoin thread pool, whose parallelism defaults to the number of cores 
minus one unless it is set in 
"java.util.concurrent.ForkJoinPool.common.parallelism".

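As a rough illustration of the pool size in question, the sketch below prints 
the common pool's parallelism next to the core count. The class and method 
names are made up for this example; the "at least 1" floor is my reading of 
the JDK default and is worth checking on the JVM actually in use.

```java
import java.util.concurrent.ForkJoinPool;

final class CommonPoolParallelism {
    /** Parallelism of the JVM-wide common pool as currently configured. */
    static int commonPoolParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    /** Assumed JDK default when the system property is unset: cores - 1, floor of 1. */
    static int defaultParallelism(int cores) {
        return Math.max(1, cores - 1);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores                   = " + cores);
        System.out.println("assumed default         = " + defaultParallelism(cores));
        System.out.println("common pool parallelism = " + commonPoolParallelism());
        // null here means the default is in effect
        System.out.println("override property       = " + System.getProperty(
            "java.util.concurrent.ForkJoinPool.common.parallelism"));
    }
}
```

The property has to be set before the common pool is first used, e.g. with 
-Djava.util.concurrent.ForkJoinPool.common.parallelism=16 on the java command 
line.
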
Given that the LIST calls are going to be blocking, this may put a limit on 
the performance of listing if you have many threads executing list requests, 
e.g. Spark workers.

Reviewing the code, the maximum number of list operations which can collect 
results at the same time will be limited to the number of cores; the others 
are going to block until the lists have been processed.
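
The limit can be reproduced outside abfs. This is a minimal sketch, not the 
abfs code: it uses a private ForkJoinPool of known parallelism rather than the 
common pool so the result does not depend on the host, and Thread.sleep stands 
in for a blocking LIST call. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.atomic.AtomicInteger;

final class BlockingFetchDemo {
    /**
     * Submits {@code tasks} blocking "fetches" to a ForkJoinPool with the given
     * parallelism and returns the highest number observed running at once.
     */
    static int maxConcurrentFetches(int parallelism, int tasks, long blockMillis) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                futures.add(CompletableFuture.runAsync(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(blockMillis); // stand-in for a blocking LIST call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                    }
                }, pool));
            }
            // Wait from the calling thread, which is not a pool worker.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // 8 blocking fetches on a pool of 2 workers: the rest queue up behind them.
        System.out.println("peak concurrent fetches: " + maxConcurrentFetches(2, 8, 100));
    }
}
```

The peak should not exceed the pool's parallelism, since a plain sleep gives 
the pool no cue to add compensating threads; the remaining fetches simply wait 
their turn.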

This may also mean that if you have multiple incremental iterators in the 
same thread (e.g. treewalking), there's a risk that you could actually 
deadlock.

I'm not convinced this will happen, as once each listing has reached the end of 
its directory or there are 10 pages in the result queue, the submitted 
operation will complete.
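
The completion rule described above can be sketched as follows. This is an 
illustration of the idea only, not the abfs implementation: the class, its 
methods and the in-memory iterator standing in for paged LIST responses are 
all hypothetical, and only the bound of 10 pages comes from the text above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class BoundedPageFetcher {
    static final int MAX_QUEUED_PAGES = 10; // queue bound quoted in the issue text

    // Hypothetical stand-in for the paged results of remote LIST calls.
    private final Iterator<List<String>> remotePages;
    private final BlockingQueue<List<String>> queue =
        new ArrayBlockingQueue<>(MAX_QUEUED_PAGES);

    BoundedPageFetcher(Iterator<List<String>> remotePages) {
        this.remotePages = remotePages;
    }

    /**
     * One submitted operation: fetch pages until the queue is full or the
     * listing ends, then return instead of blocking on the full queue.
     * Assumes a single producer. Returns true once the listing is exhausted.
     */
    boolean fetchBatch() {
        while (queue.remainingCapacity() > 0 && remotePages.hasNext()) {
            queue.add(remotePages.next());
        }
        return !remotePages.hasNext();
    }

    int queuedPages() {
        return queue.size();
    }

    /** Consumer side: next queued page, or null if none is queued. */
    List<String> takePage() {
        return queue.poll();
    }

    public static void main(String[] args) {
        List<List<String>> pages = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            pages.add(Collections.singletonList("file-" + i));
        }
        BoundedPageFetcher fetcher = new BoundedPageFetcher(pages.iterator());
        boolean done = fetcher.fetchBatch();
        System.out.println("queued=" + fetcher.queuedPages() + " exhausted=" + done);
    }
}
```

Because the operation returns when the queue fills rather than waiting for 
space, it releases its pool thread, which is the property the argument above 
relies on.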

But: we need a test for this. Is there any public abfs store with many, many 
objects we could use as a source for listings, similar to the AWS landsat repo 
we (ab)use for such purposes in the s3a ITests?




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

