zhangyue19921010 commented on PR #5416:
URL: https://github.com/apache/hudi/pull/5416#issuecomment-1192451053

   Hi @vinothchandar and @alexeykudinkin, sorry for the late response. I ran 
several performance tests on a cluster using bulk_insert, covering two cases:
   **simple schema (several columns) + small amount of data** and **complex 
schema (dozens of columns) + large amount of data**.
   
   
   **Simple schema + small amount of data**: as the benchmark showed, we may 
get a performance improvement ranging from 60% to 2X.
   
   **Complex schema (dozens of columns) + large amount of data**: with 
**1000 partitions, 100,000,000 records and dozens of columns**, we get about 
a 20% performance improvement.
   
   It is worth noting that the faster data is produced relative to how fast it 
is consumed, the more benefit we get from this PR, because in that scenario 
locking becomes the performance bottleneck.
   
   For typical scenarios, we can expect about a 20% performance improvement.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

