Re: [PR] docs: [RFC-98] Design doc of DSv2 read support [hudi]

via GitHub Thu, 19 Mar 2026 16:12:17 -0700


geserdugarov commented on PR #18276:
URL: https://github.com/apache/hudi/pull/18276#issuecomment-4093989774


   @vinothchandar, @yihua, the first benchmarks for DSv2 reads on a COW table 
are ready: https://github.com/apache/hudi/pull/18351.
   ```text
   Data: 800 parquet files with 30 million rows, 300 columns, 100 GB in total.
   
   ============================================================
   DSv2 vs DSv1 PERFORMANCE COMPARISON
   ============================================================
   
   Full scan (COW)                    : DSv1 avg 273.3s, DSv2 avg 278.0s, speedup 0.98x (DSv1 FASTER)
   Projected (COW)                    : DSv1 avg 7.3s, DSv2 avg 5.9s, speedup 1.24x (DSv2 FASTER)
   Filter (COW)                       : DSv1 avg 7.2s, DSv2 avg 6.0s, speedup 1.20x (DSv2 FASTER)
   Limit (COW)                        : DSv1 avg 56.6s, DSv2 avg 59.5s, speedup 0.95x (DSv1 FASTER)
   Aggregate COUNT(*)                 : DSv1 avg 3.6s, DSv2 avg 0.2s, speedup 18.43x (DSv2 FASTER)
   Aggregate MIN/MAX                  : DSv1 avg 3.8s, DSv2 avg 0.2s, speedup 20.95x (DSv2 FASTER)
   ```
   
   The implementation of DSv2 read for COW is ready for review: 
https://github.com/apache/hudi/pull/18277.
   I will investigate the causes of the roughly 2% slowdown in the full scan 
and the roughly 5% slowdown in limit queries.
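
For readers checking the numbers, here is a minimal sketch of how the speedup column appears to be derived, assuming it is simply the DSv1 average divided by the DSv2 average. The `speedup` helper and the `results` dictionary are hypothetical names for illustration and are not taken from the benchmark code in the PR.

```python
def speedup(dsv1_avg_s: float, dsv2_avg_s: float) -> float:
    """Ratio of DSv1 to DSv2 average runtime; values above 1 mean DSv2 is faster."""
    return dsv1_avg_s / dsv2_avg_s


# Average runtimes in seconds, copied from the comparison table (DSv1, DSv2).
results = {
    "Full scan (COW)": (273.3, 278.0),
    "Projected (COW)": (7.3, 5.9),
    "Filter (COW)": (7.2, 6.0),
    "Limit (COW)": (56.6, 59.5),
}

for name, (v1, v2) in results.items():
    s = speedup(v1, v2)
    faster = "DSv2" if s > 1 else "DSv1"
    print(f"{name:<20}: speedup {s:.2f}x ({faster} FASTER)")
```

Under that assumption the four scan rows reproduce the reported 0.98x, 1.24x, 1.20x and 0.95x, which correspond to the 2% and 5% slowdowns mentioned above. The two aggregate rows do not reproduce exactly from the printed averages (3.6 / 0.2 gives 18.0, not 18.43), presumably because the averages are shown rounded to one decimal place.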


