I benchmarked it against real cloud storage, AWS S3 (1000 files, 14.6 KB
each):

   - Sync time = 219.694 s
   - Async time = 51.853 s

Improvement = 76.4% less total time (roughly a 4.2x speedup).

Cloud storage has high per-request IO overhead, so the async flow can be
beneficial when reading many small files.
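
To illustrate the effect, here is a minimal Python sketch. It is not the code from the PR: `read_file` is a hypothetical stand-in that sleeps to mimic per-request latency, and a thread pool stands in for the async reader. The helper `improvement_pct` reproduces the percentage above from the two measured times.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def read_file(latency_s: float) -> int:
    """Hypothetical stand-in for one small-file read: sleep to mimic request latency."""
    time.sleep(latency_s)
    return 1  # pretend one file was read


def read_sequential(n_files: int, latency_s: float) -> float:
    """Read files one after another; return elapsed wall-clock seconds."""
    start = time.perf_counter()
    for _ in range(n_files):
        read_file(latency_s)
    return time.perf_counter() - start


def read_concurrent(n_files: int, latency_s: float, workers: int) -> float:
    """Overlap the reads on a thread pool; return elapsed wall-clock seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces all reads to finish before the timer stops
        list(pool.map(read_file, [latency_s] * n_files))
    return time.perf_counter() - start


def improvement_pct(sync_s: float, async_s: float) -> float:
    """Percentage reduction in total time relative to the sync run."""
    return (sync_s - async_s) / sync_s * 100.0


if __name__ == "__main__":
    seq = read_sequential(20, 0.01)
    conc = read_concurrent(20, 0.01, workers=5)
    print(f"simulated: sequential {seq:.3f} s, concurrent {conc:.3f} s")
    print(f"measured on S3: {improvement_pct(219.694, 51.853):.1f}% improvement")
```

When latency dominates, as it does for small objects, the sequential total grows with the number of files, while the concurrent total grows with the number of files divided by the worker count. The file count, latency, and worker count used here are arbitrary illustration values.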

I would really appreciate any feedback on this.

On Wed, Mar 18, 2026 at 12:19 AM Varun Lakhyani <[email protected]>
wrote:

> Hey All,
>
> I previously started a discussion on making Spark readers work in parallel
> (asynchronously), which is beneficial in cases with large numbers of small
> files such as compaction, and I have worked on a POC, high-level design,
> implementation, and benchmarking for various scenarios. I presented my
> approach and benchmarking results in the Iceberg Spark sync; the recording
> may be available in the Iceberg Spark Community Sync Notes [0].
>
> I am planning to submit this work as a GSoC 2026 proposal based on this
> idea and was advised to seek formal community vetting on the dev mailing
> list.
>
> Previous DISCUSS thread:
> https://lists.apache.org/thread/b5jrlyv61lmw867kksw05sot2tro5ybn
>
> Issue:
> https://github.com/apache/iceberg/issues/15287
>
> Prototype implementation:
> https://github.com/apache/iceberg/pull/15341
>
> Design document and benchmarking details:
>
> https://docs.google.com/document/d/17vBz5t-gSDdmB0S40MYRceyvmcBSzw9Gii-FcU97Lds/edit?usp=sharing
>
> Initial benchmarking shows noticeable improvements for workloads involving
> many small files, particularly when IO latency is present (details in the
> design document).
>
> Any feedback (+1 / concerns / suggestions) would be appreciated.
> I am specifically looking for community consensus on whether this is a
> viable direction for Iceberg before formalizing the GSoC proposal. The GSoC
> 2026 proposal deadline is March 31 - early feedback would be especially
> appreciated.
>
> [0] Iceberg Spark Community Sync Notes:
> https://docs.google.com/document/d/19nno1RoPznbbxKOZZddZNHHafa7XULjbN6RPExdr2n4/edit?usp=sharing
> --
> Lakhyani Varun
> Indian Institute of Technology Roorkee
>
>
