dmenin commented on issue #3975: URL: https://github.com/apache/hudi/issues/3975#issuecomment-971612278
Hi @xushiyan, I'd like to add some more information here. I have hard evidence that the delete time is not necessarily correlated with the amount of data being deleted. See the red line on the first graph? That's the time it took just to delete the data; the blue line is the time it took to upsert the data. The bars on the graph below show the amount of data being inserted/deleted. In a normal load (the smallest bars), I delete 500 rows and upsert 300k rows. In the highest bar, I forced one larger load where I delete 9k rows and upsert 3.5M rows, which is 18x the normal deletes and roughly 11.7x the normal upserts. See how the red line barely moved, while the blue line moved significantly?
