dmenin commented on issue #3975: URL: https://github.com/apache/hudi/issues/3975#issuecomment-967301078
Hi xushiyan,

Thanks for your reply. The numbers using GLOBAL_SIMPLE were: around 6 minutes to insert the incoming dataset when I had only one month of data (30 partitions). When I backfilled 2021 (adding 10 more months of data, 300 partitions), the same load job jumped to 26 minutes.

Regarding your "replicating the logic" comment, yes, you are absolutely right: that is the behaviour I need, but global indices don't perform well. If there were an option saying "use global indices only on partitions A, B and C", that would be perfect.

Regarding HBASE indexing, I am aware of that option, but for reasons I can't discuss, I can't pursue it.

Follow-up questions: why do you think this is an index lookup problem? And why does it happen only on the delete and not on the upsert operation?

Thanks,
Diego
