zhangyue19921010 commented on PR #5416: URL: https://github.com/apache/hudi/pull/5416#issuecomment-1192451053
Hi @vinothchandar and @alexeykudinkin, sorry for the late response.

I ran several performance tests on a cluster using bulk_insert, covering two cases: **simple schema (several columns) + small amount of data** and **complex schema (dozens of columns) + large amount of data**.

**Simple schema + small amount of data**: as the benchmark showed, performance improves by 60% up to 2x.

**Complex schema (dozens of columns) + large amount of data**: with **1000 partitions, 100,000,000 records and dozens of columns**, we get about a 20% performance improvement.

It is worth noting that the more data production outpaces data consumption, the more benefit we get from this PR, because in that scenario locking is the performance bottleneck. For typical scenarios, we get about a 20% performance improvement.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
