The time to kill a process and free its memory can be critical when the killing was done to prevent memory shortages affecting system responsiveness.
In the case of Android, where processes can be restarted easily, killing a less important background process is preferred to delaying or throttling an interactive foreground process. At the same time, unnecessary kills should be avoided because they cause delays when the killed process is needed again. This requires a balanced decision from the system software about how long a kill can be postponed in the hope that memory usage will decrease without such drastic measures.

As killing a process and reclaiming its memory is not an instant operation, a margin of free memory has to be maintained to prevent system performance deterioration while the memory of the killed process is being reclaimed. The size of this margin depends on the minimum reclaim rate, so that it covers the worst-case scenario, and this minimum rate should be deterministic.

Note that on asymmetric architectures like ARM big.LITTLE the reclaim rate can vary dramatically depending on which core it is performed on (see test results). It is common for a non-essential victim process to be restricted to a less performant or throttled CPU for power saving purposes, which makes the worst-case reclaim rate scenario very probable. Cases where the victim’s memory reclaim is delayed further, because the process is blocked in an uninterruptible sleep or is performing a time-consuming operation, make the reclaim time even more unpredictable.

Increasing the memory reclaim rate and making it more deterministic would allow for a smaller free memory margin and would lead to more opportunities to avoid killing a process. Note that while other strategies like throttling memory allocations are viable and can be employed for other non-essential processes, they would affect user experience if applied to an interactive process.

The proposed solution uses the existing oom-reaper thread to increase the memory reclaim rate of a killed process and to make this rate more deterministic.
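To illustrate how the free memory margin scales with the minimum reclaim rate, here is a minimal sketch. It is not part of the patches. The helper name free_margin_mb(), the victim size and the allocation rate are all hypothetical; only the two reclaim rates (856 MB/sec and 3236 MB/sec) come from the minimum reclaim speeds measured in the test results. The model assumes the rest of the system keeps allocating at a constant rate while the victim's memory is being freed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of this patch series): estimate the free
 * memory margin, in MB, needed to absorb allocations made while the memory
 * of a killed process is being reclaimed.
 *
 *   victim_mb          - memory held by the victim (assumed value)
 *   alloc_mb_per_sec   - allocation rate of the rest of the system (assumed)
 *   reclaim_mb_per_sec - worst-case (minimum) reclaim rate, must be non-zero
 *
 * reclaim time = victim_mb / reclaim_mb_per_sec
 * margin       = alloc_mb_per_sec * reclaim time, rounded up to a whole MB
 */
uint64_t free_margin_mb(uint64_t victim_mb, uint64_t alloc_mb_per_sec,
			uint64_t reclaim_mb_per_sec)
{
	assert(reclaim_mb_per_sec > 0);

	return (victim_mb * alloc_mb_per_sec + reclaim_mb_per_sec - 1) /
	       reclaim_mb_per_sec;
}
```

Under these assumptions, a 500 MB victim with the rest of the system allocating 200 MB/sec would need a margin of about 117 MB at 856 MB/sec, and about 31 MB at 3236 MB/sec. The absolute numbers are illustrative only; the point is that the margin is inversely proportional to the worst-case reclaim rate.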
The proposed solution is by no means considered the best; it was chosen because it was simple to implement and allowed for test data collection. The downside of this solution is that it requires an additional “expedite” hint for something which has to be fast in all cases. It would be great to find a way that does not require additional hints.

Other possible approaches include:
- Implementing a dedicated syscall to perform opportunistic reclaim in the context of the process waiting for the victim’s death. A natural boost bonus occurs if the waiting process has high or RT priority and is not limited by cpuset cgroup in its CPU choices.
- Implementing a mechanism that would perform opportunistic reclaim unconditionally whenever it is possible (similar to checks in task_will_free_mem()).
- Implementing opportunistic reclaim that uses shrinker interface, PSI or other memory pressure indications as a hint to engage.

Test details:
Tests are performed on a Qualcomm® Snapdragon™ 845 8-core ARM big.LITTLE system with 4 little cores (0.3-1.6GHz) and 4 big cores (0.8-2.5GHz) running Android. Memory reclaim speed was measured using signal/signal_generate, kmem/rss_stat and sched/sched_process_exit traces.
Test results:
powersave governor, min freq
                        normal kills      expedited kills
        little          856 MB/sec        3236 MB/sec
        big             5084 MB/sec       6144 MB/sec

performance governor, max freq
                        normal kills      expedited kills
        little          5602 MB/sec       8144 MB/sec
        big             14656 MB/sec      12398 MB/sec

schedutil governor (default)
                        normal kills      expedited kills
        little          2386 MB/sec       3908 MB/sec
        big             7282 MB/sec       6820-16386 MB/sec
=================================================================
min reclaim speed:      856 MB/sec        3236 MB/sec

The patches are based on 5.1-rc1

Suren Baghdasaryan (2):
  mm: oom: expose expedite_reclaim to use oom_reaper outside of oom_kill.c
  signal: extend pidfd_send_signal() to allow expedited process killing

 include/linux/oom.h          |  1 +
 include/linux/sched/signal.h |  3 ++-
 include/linux/signal.h       | 11 ++++++++++-
 ipc/mqueue.c                 |  2 +-
 kernel/signal.c              | 37 ++++++++++++++++++++++++++++--------
 kernel/time/itimer.c         |  2 +-
 mm/oom_kill.c                | 15 +++++++++++++++
 7 files changed, 59 insertions(+), 12 deletions(-)

-- 
2.21.0.392.gf8f6787159e-goog