[GitHub] [hudi] zhangyue19921010 commented on pull request #5416: [HUDI-3963] Use Lock-Free Message Queue Disruptor Improving Hoodie Writing Efficiency

GitBox Fri, 22 Jul 2022 03:57:54 -0700


zhangyue19921010 commented on PR #5416:
URL: https://github.com/apache/hudi/pull/5416#issuecomment-1192451053


   Hi @vinothchandar and @alexeykudinkin, sorry for the late response. I ran 
several performance tests on a cluster using bulk_insert, covering two cases:
   **simple schema (several columns) + small amount of data** and **complex 
schema (dozens of columns) + large amount of data**.
   
   
   **Simple schema + small amount of data**: as the benchmark showed, we 
may get a performance improvement ranging from 60% to 2X.
   
   As for **complex schema (dozens of columns) and large amount of data**:
   **1000 partitions, 100,000,000 records and dozens of columns**. We get 
about a 20% performance improvement.
   
   I should add that the faster data is produced relative to how fast it is 
consumed, the more benefit we get from this PR. In that scenario, locking is 
the performance bottleneck.
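   
   To illustrate the idea, here is a minimal sketch of a lock-free 
single-producer/single-consumer ring buffer. It is not the Disruptor API and 
not the code in this PR; the class `SpscRingBuffer` and the helper `transfer` 
are hypothetical names used only for illustration. The producer and the 
consumer each advance their own sequence counter, so neither ever takes a lock.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: a bounded lock-free queue for exactly one producer
// thread and one consumer thread. Hypothetical class, not part of Hudi.
final class SpscRingBuffer<T> {
    private final Object[] slots;
    private final int mask;
    // Next sequence to read; written only by the consumer.
    private final AtomicLong head = new AtomicLong();
    // Next sequence to write; written only by the producer.
    private final AtomicLong tail = new AtomicLong();

    SpscRingBuffer(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        this.slots = new Object[capacity];
        this.mask = capacity - 1;
    }

    // Producer side: returns false instead of blocking when the buffer is full.
    boolean offer(T value) {
        long t = tail.get();
        if (t - head.get() == slots.length) {
            return false;
        }
        slots[(int) (t & mask)] = value;
        tail.set(t + 1); // volatile write publishes the slot to the consumer
        return true;
    }

    // Consumer side: returns null instead of blocking when the buffer is empty.
    @SuppressWarnings("unchecked")
    T poll() {
        long h = head.get();
        if (h == tail.get()) {
            return null;
        }
        int index = (int) (h & mask);
        T value = (T) slots[index];
        slots[index] = null;
        head.set(h + 1); // volatile write frees the slot for the producer
        return value;
    }

    // Demo helper (hypothetical): one thread produces 0..count-1, the calling
    // thread consumes them and returns their sum.
    static long transfer(int count, int capacity) {
        SpscRingBuffer<Integer> buffer = new SpscRingBuffer<>(capacity);
        Thread producer = new Thread(() -> {
            for (int i = 0; i < count; i++) {
                while (!buffer.offer(i)) {
                    Thread.yield(); // buffer full: retry without taking a lock
                }
            }
        });
        producer.start();
        long sum = 0;
        int received = 0;
        while (received < count) {
            Integer value = buffer.poll();
            if (value == null) {
                Thread.yield(); // buffer empty: retry without taking a lock
                continue;
            }
            sum += value;
            received++;
        }
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }
}
```

   When the producer outpaces the consumer, `offer` simply returns `false` and 
the producer retries, so no thread parks on a lock. This sketch only shows the 
mechanism; it says nothing about the actual speedup, which depends on the 
workload.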
   
   For typical scenarios, we can get about a 20% performance improvement.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
