xushiyan commented on issue #3975: URL: https://github.com/apache/hudi/issues/3975#issuecomment-968090220
@dmenin What I described is the worst-case scenario, in which each delete op routes to a different file. Deletes on the same file are consolidated into a single rewrite. I highlighted the worst case to show that this can be slow compared to your inserts, which involve no rewriting at all. So this is expected for a COW table. If you have perf concerns about this, try converting to MOR, where updates/deletes are appended to log files. And if you configure async compaction to run, there is no write amplification on ingestion. Also, I think you may consider partitioning on immutable fields to avoid records jumping across partitions. Or near-immutable ones, as occasional partition updates are totally fine to cope with.
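To make the two points above concrete, here is a rough Python sketch, not taken from the issue itself. `cow_files_rewritten` is a hypothetical helper that only models the reasoning (one rewrite per distinct base file touched by a batch of deletes); it does not call Hudi. `mor_write_options` is also a hypothetical helper that assembles Spark datasource write options for a MOR table; the option keys are standard Hudi config names as far as I know, but the table and field names are placeholders, and whether `hoodie.datasource.compaction.async.enable` takes effect depends on the writer (it targets the streaming sink; batch jobs generally need compaction scheduled and run separately), so please check against the docs for your Hudi version.

```python
def cow_files_rewritten(delete_keys, key_to_file):
    """Model of COW delete cost: one rewrite per distinct base file touched.

    key_to_file is a hypothetical mapping from record key to the base file
    that currently holds the record.
    """
    return len({key_to_file[k] for k in delete_keys})


def mor_write_options(table_name, record_key, partition_field, precombine_field):
    """Assemble Hudi write options for a MERGE_ON_READ table (illustrative)."""
    return {
        "hoodie.table.name": table_name,
        # MOR appends updates/deletes to log files instead of rewriting base files.
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        # Prefer an immutable (or near-immutable) field as the partition path.
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Keep compaction off the ingestion path; assumes an async compactor runs.
        "hoodie.compact.inline": "false",
        "hoodie.datasource.compaction.async.enable": "true",
    }


if __name__ == "__main__":
    layout = {"a": "file-1", "b": "file-1", "c": "file-2"}
    print(cow_files_rewritten(["a", "b"], layout))  # same file, consolidated
    print(cow_files_rewritten(["a", "c"], layout))  # worst case, one file each
    print(mor_write_options("my_table", "id", "created_date", "updated_at"))
```

With a Spark DataFrame `df`, the options would typically be passed as `df.write.format("hudi").options(**opts).mode("append").save(base_path)`, where `base_path` is your table location.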
