Abhishek Chennaka created KUDU-3592:
---------------------------------------

             Summary: Memory spike in tablets with huge number of updates
                 Key: KUDU-3592
                 URL: https://issues.apache.org/jira/browse/KUDU-3592
             Project: Kudu
          Issue Type: Improvement
            Reporter: Abhishek Chennaka
         Attachments: Screen Shot 2024-07-18 at 12.19.25 PM.png, metrics.txt, 
metrics_144.txt, metrics_147.txt

Came across a scenario where a tablet with 868 rows received about 10-20 
million upserts (as updates) per minute, which caused strange behavior (with 
the ancient history mark set to 15 minutes):

1. Any minor/major delta compaction on any of those tablet replicas leads to a 
memory spike that uses up all the memory on the server, and eventually the 
process is killed by the OS with an OOM message.
{code:java}
I20240715 12:35:40.534596 851004 maintenance_manager.cc:392] P 
d247fdcc1e4a45f8a01b8155960280a6: Scheduling 
MinorDeltaCompactionOp(f99cddc1e2444bacbcdd117b0b377a02): perf 
score=0.023000{code}
As soon as the delta compactions started, we saw a spike in the memory usage of 
the tablet server process. The usage kept growing until the process was killed:
{code:java}
W20240715 12:41:10.909035 850831 tablet_service.cc:1608] Rejecting Write 
request: Soft memory limit exceeded (at 310.52% of capacity) [suppressed 9 
similar messages]
W20240715 12:41:11.816135 850916 raft_consensus.cc:1537] Rejecting consensus 
request: Soft memory limit exceeded (at 311.39% of capacity) [suppressed 5 
similar messages]
W20240715 12:41:11.920109 850831 tablet_service.cc:1608] Rejecting Write 
request: Soft memory limit exceeded (at 311.51% of capacity) [suppressed 17 
similar messages]
W20240715 12:41:12.945374 850845 tablet_service.cc:1608] Rejecting Write 
request: Soft memory limit exceeded (at 312.54% of capacity) [suppressed 26 
similar messages]
W20240715 12:41:12.923309 850906 raft_consensus.cc:1537] Rejecting consensus 
request: Soft memory limit exceeded (at 312.49% of capacity) [suppressed 3 
similar messages]{code}
Eventually the process was killed by the OS:
{code:java}
kernel: Out of memory: Killed process 850414 (kudu-tserver) 
total-vm:422911368kB, anon-rss:387222264kB, file-rss:0kB, shmem-rss:0kB, 
UID:39977 pgtables:822744kB oom_score_adj:0{code}
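
For reference, a rough way to line up the memory growth with the maintenance 
manager activity (this is only a triage sketch, not anything in Kudu itself) 
would be to poll the tablet server's /metrics endpoint while the compaction 
runs. The sketch assumes the default web UI port 8050, the tcmalloc-derived 
server-level metric generic_current_allocated_bytes, and a placeholder 
hostname:
{code:python}
# Triage sketch, not part of Kudu: sample the tablet server heap usage every 10s so
# the growth can be lined up with maintenance manager log lines such as the
# MinorDeltaCompactionOp scheduling message above.
import json
import time
import urllib.request

TSERVER = "http://tserver-host:8050"  # placeholder host; default tserver web UI port
METRIC = "generic_current_allocated_bytes"  # assumed tcmalloc-derived server metric

def heap_bytes():
    with urllib.request.urlopen(f"{TSERVER}/metrics") as resp:
        entities = json.load(resp)
    for entity in entities:
        if entity.get("type") == "server":
            for m in entity.get("metrics", []):
                if m["name"] == METRIC:
                    return m["value"]
    return None

while True:
    print(time.strftime("%H:%M:%S"), heap_bytes())
    time.sleep(10)
{code}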

2. Scanning this tablet also caused a memory spike, with the tablet server 
taking up to almost 90% of the memory on the server (the memory hard limit in 
Kudu was set to about 30% or less) [see the attached screenshot of the scans 
dashboard web page]. Interestingly, the on_disk_size of this tablet was only 
about 6GB and the on_disk_data_size about 37KB [see the metrics for this tablet 
attached in metrics.txt], but the memory consumed was on the order of hundreds 
of GB.
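
Not part of the original report, but for triage it may help to scan the same 
/metrics output for replicas whose total footprint dwarfs their base data, 
since that pattern (6GB on_disk_size vs 37KB on_disk_data_size here) points at 
delta/WAL-heavy tablets. A sketch under the same assumptions as above:
{code:python}
# Sketch: list tablet replicas whose on_disk_size is far larger than on_disk_data_size,
# i.e. whose footprint is dominated by deltas/WAL rather than base data.
import json
import urllib.request

TSERVER = "http://tserver-host:8050"  # placeholder host

with urllib.request.urlopen(f"{TSERVER}/metrics") as resp:
    entities = json.load(resp)

for entity in entities:
    if entity.get("type") != "tablet":
        continue
    metrics = {m["name"]: m["value"] for m in entity.get("metrics", [])}
    total = metrics.get("on_disk_size", 0)
    data = metrics.get("on_disk_data_size", 0)
    # Arbitrary illustration threshold: total footprint more than 100x the base data.
    if data and total > 100 * data:
        print(entity["id"], f"on_disk_size={total}", f"on_disk_data_size={data}")
{code}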

3. We also noticed an issue with bootstrapping one such update-heavy tablet (a 
different tablet from the above), where the recovery WAL dir of the tablet grew 
to 1.8TB, causing the server to crash. The original directory had about 250 WAL 
segments, but the recovery WAL dir had roughly 200k segments. We could not 
collect much information on this as the tablet was deleted to avoid downtime, 
but if the issue is seen again it would be good to collect the tablet metadata 
and examine the WAL segments for the config index values present. [The metrics 
of the tablet are attached as metrics_144.txt and metrics_147.txt.]
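
If the issue is hit again, the data could probably be captured with the stock 
kudu CLI before deleting the tablet. A rough sketch, assuming the local_replica 
dump meta / dump wals subcommands, a stopped tablet server (the local_replica 
tools read the on-disk layout directly), and placeholder paths/tablet id:
{code:python}
# Sketch: collect the tablet metadata and WAL contents for offline analysis.
# Tablet id and filesystem paths are placeholders; run with the tserver stopped.
import subprocess

TABLET_ID = "tablet-id-goes-here"  # placeholder
FS_FLAGS = ["--fs_wal_dir=/path/to/wal", "--fs_data_dirs=/path/to/data"]

# Tablet metadata (schema, rowsets, tablet data state).
subprocess.run(["kudu", "local_replica", "dump", "meta", TABLET_ID, *FS_FLAGS],
               check=True)

# WAL segments; the entries carry the OpIds and Raft config changes, which is where
# the config index values of interest would be visible.
subprocess.run(["kudu", "local_replica", "dump", "wals", TABLET_ID, *FS_FLAGS],
               check=True)
{code}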

While we investigate the root cause of this behavior, it might be a good idea to:
A. Impose some guard rails on the number of deltas/updates that can be 
accumulated, and throttle writes until a delta compaction reduces the number of 
deltas (see the rough sketch below).
B. Have stricter checks on memory usage during delta compactions and scans of 
the tablet.
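
To make (A) concrete, the intent is something like the following guard, written 
here as purely illustrative Python pseudocode rather than Kudu code, with a 
hypothetical threshold: once a rowset accumulates more than some number of 
delta files/updates, new updates get throttled until a delta compaction brings 
the count back down.
{code:python}
# Purely illustrative, not Kudu code: the guard rail proposed in (A).
MAX_ACCUMULATED_DELTA_FILES = 1000  # hypothetical threshold

def should_throttle_updates(redo_delta_files: int) -> bool:
    """Throttle/reject new updates for a rowset once its delta count crosses the cap."""
    return redo_delta_files >= MAX_ACCUMULATED_DELTA_FILES

# Example: a rowset that has accumulated 2500 redo delta files would have incoming
# updates throttled until delta compaction reduces the backlog.
print(should_throttle_updates(2500))  # True
{code}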

It should be noted that the workload of tens of millions of updates per minute 
was not expected, and the application changes were reverted, which calmed 
things down. This may well be an application error, but Kudu should have some 
guard rails so that the server's entire memory is not used up.



