Abhishek Chennaka created KUDU-3592:
---------------------------------------
             Summary: Memory spike in tablets with huge number of updates
                 Key: KUDU-3592
                 URL: https://issues.apache.org/jira/browse/KUDU-3592
             Project: Kudu
          Issue Type: Improvement
            Reporter: Abhishek Chennaka
         Attachments: Screen Shot 2024-07-18 at 12.19.25 PM.png, metrics.txt, metrics_144.txt, metrics_147.txt

Came across a scenario where a tablet with 868 rows received about 10-20 million upserts (applied as updates) per minute, which caused strange behavior (with the ancient history mark set to 15 minutes):

1. Any minor/major delta compaction on any of the tablet's replicas leads to a memory spike that uses up all the memory on the server, until the process is eventually killed by the OS with an OOM message.
{code:java}
I20240715 12:35:40.534596 851004 maintenance_manager.cc:392] P d247fdcc1e4a45f8a01b8155960280a6: Scheduling MinorDeltaCompactionOp(f99cddc1e2444bacbcdd117b0b377a02): perf score=0.023000{code}
As soon as the delta compaction started, we saw a spike in the memory usage of the tablet server process. The usage kept climbing until the process was killed:
{code:java}
W20240715 12:41:10.909035 850831 tablet_service.cc:1608] Rejecting Write request: Soft memory limit exceeded (at 310.52% of capacity) [suppressed 9 similar messages]
W20240715 12:41:11.816135 850916 raft_consensus.cc:1537] Rejecting consensus request: Soft memory limit exceeded (at 311.39% of capacity) [suppressed 5 similar messages]
W20240715 12:41:11.920109 850831 tablet_service.cc:1608] Rejecting Write request: Soft memory limit exceeded (at 311.51% of capacity) [suppressed 17 similar messages]
W20240715 12:41:12.945374 850845 tablet_service.cc:1608] Rejecting Write request: Soft memory limit exceeded (at 312.54% of capacity) [suppressed 26 similar messages]
W20240715 12:41:12.923309 850906 raft_consensus.cc:1537] Rejecting consensus request: Soft memory limit exceeded (at 312.49% of capacity) [suppressed 3 similar messages]{code}
Eventually the process was killed by the OS:
{code:java}
kernel: Out of memory: Killed process 850414 (kudu-tserver) total-vm:422911368kB, anon-rss:387222264kB, file-rss:0kB, shmem-rss:0kB, UID:39977 pgtables:822744kB oom_score_adj:0{code}
2. Scanning this tablet also caused a memory spike, with the tablet server taking up to almost 90% of the memory on the server (the Kudu memory hard limit was set to roughly 30% or less) [screenshot of the scans dashboard webpage attached]. Interestingly, the on_disk_size of this tablet was only about 6 GB and its on_disk_data_size about 37 KB [metrics related to this tablet attached in metrics.txt], yet the memory consumed was on the order of hundreds of GB.

3. We also noticed an issue with bootstrapping one such update-heavy tablet (a different tablet from the above), where the recovery WAL directory of the tablet grew to about 1.8 TB, causing the server to crash. The original WAL directory had about 250 segments, while the recovery WAL directory had roughly 200k segments. We could not collect much information on this as the tablet was deleted to avoid downtime, but if the issue is seen again it would be good to collect the tablet metadata and examine the WAL segments for the config index values present. [Metrics of this tablet are attached as metrics_144.txt and metrics_147.txt.]
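For reference, the kind of divergence described in item 2 (a tiny on_disk_data_size versus hundreds of GB of resident memory) can be watched from outside the process via the tablet server's /metrics web endpoint. Below is a minimal monitoring sketch, not part of this report: it assumes the default tablet server web UI port 8050, a placeholder host name, and that the metric names shown (on_disk_size, on_disk_data_size, generic_current_allocated_bytes) are present in the running build; verify them against the actual /metrics output.
{code:python}
#!/usr/bin/env python3
# Sketch: poll a Kudu tablet server's /metrics endpoint and report a tablet's
# on-disk metrics alongside the server's allocated heap, to spot the kind of
# divergence described in this issue. Host name and metric names are
# assumptions; check them against your deployment.
import json
import time
import urllib.request

TSERVER = "http://tserver-host.example.com:8050"   # hypothetical host
TABLET_ID = "f99cddc1e2444bacbcdd117b0b377a02"     # tablet from the logs above
POLL_SECS = 60

def fetch_metrics():
    # /metrics returns a JSON array of entities (server, tablets, ...)
    with urllib.request.urlopen(TSERVER + "/metrics", timeout=10) as resp:
        return json.load(resp)

def metric_value(entity, name):
    for m in entity.get("metrics", []):
        if m.get("name") == name:
            return m.get("value")
    return None

while True:
    for e in fetch_metrics():
        if e.get("type") == "tablet" and e.get("id") == TABLET_ID:
            print("on_disk_size:", metric_value(e, "on_disk_size"),
                  "on_disk_data_size:", metric_value(e, "on_disk_data_size"))
        if e.get("type") == "server":
            # tcmalloc-reported heap; the metric name may differ between builds
            print("allocated bytes:",
                  metric_value(e, "generic_current_allocated_bytes"))
    time.sleep(POLL_SECS)
{code}
Running something like this alongside the workload would make it easier to correlate the maintenance manager scheduling a delta compaction with the subsequent jump in allocated bytes.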
While we investigate the root cause of this behavior, it might be a good idea to:

A. Impose some guardrails on the number of deltas/updates that can accumulate, throttling writes until compaction has reduced the number of deltas.
B. Have stricter checks on memory usage during delta compactions and scans of the tablet.

It should be noted that the workload of tens of millions of updates was not expected, and reverting the application changes calmed things down. This could be an application error, but Kudu should have some guardrails so that it does not use up the entire memory of the server.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)