On Fri, Jan 28, 2022, 4:33 AM Steven J. West <stevenjonw...@gmail.com> wrote:
> Dear all,
>
> TL;DR/summary:
>
>    - Tuning vm.watermark_boost_factor to 0 (disable) on Debian
>    significantly improves performance on memory-intensive tasks that utilise
>    SWAP space, by stopping preemptive kswapd freeing of memory, and
>    subsequent page thrashing.
>    - I suggest that Debian should tune vm.watermark_boost_factor=0 by
>    default to prevent this problem
>

I'm not a Debian maintainer, but this has got to be the best problem
report I ever saw :-)

But for years I have adopted the philosophy at home which is demanded in
every data center I've worked in: If your Linux system is swapping, you
have configured it wrong. In the server farms there is no swapping. You
make sure you have enough RAM to prevent swapping. EOS.

> I have recently installed Debian 11 on a HP Z8 G4 Workstation (Z3Z16AV) -
> 32GB RAM, installed with ~120GB SWAP on a 2TB solid state drive (specs at
> end of this message).
>
> I have been running some compute-intensive image processing tasks (CPU-
> and memory-intensive), which has on occasion had to dip into SWAP space,
> depending on image sizes (the processing I am running is image registration
> using elastix/transformix).
>
> I had benchmarked the code on my Ubuntu laptop (similar spec) without any
> problems, but when running on Debian, whenever SWAP was needed, the system
> processing significantly slowed down/essentially froze.
>
> After much debugging, I have traced this to the vm.watermark_boost_factor
> kernel parameter:
>
> Comparing the Ubuntu and Debian kernel parameters using sudo sysctl -a
> showed two key differences in virtual memory (vm) management parameters.
>
>    - Ubuntu:
>       - vm.swappiness=60
>       - vm.watermark_boost_factor=0
>    - Debian:
>       - vm.swappiness=10
>       - vm.watermark_boost_factor=150
>
> I identified what these two parameters control:
>
>    - vm.swappiness : a parameter used to calculate the swap tendency
>    (https://access.redhat.com/solutions/103833)
>    - vm.watermark_boost_factor : controls the level of reclaim when
>    memory is being fragmented. A boost factor of 0 will disable the feature.
>    (https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/8.4_release_notes/kernel_parameters_changes)
>
> I changed swappiness and then watermark_boost_factor sequentially, to see
> whether tuning these parameters to match my Ubuntu system prevented the
> system from freezing under my memory-intensive task.
>
>    - sudo sysctl vm.swappiness=60 on my Debian system did not prevent the
>    freezing behaviour.
>    - sudo sysctl vm.watermark_boost_factor=0 (disabling it) on my Debian
>    system prevented the freezing behaviour.
>
> I then set these permanently by adding the following to /etc/sysctl.conf
>
> vm.swappiness=60
> vm.watermark_boost_factor=0
>
> Further searching revealed this Ubuntu bug report:
>
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1861359
>
> swap storms kills interactive use
>
> With this key entry:
>
> Sultan Alsawaf (kerneltoast) wrote on 2020-03-27: #56
>
> This problem is caused by an upstream memory management feature called
> watermark boosting. Normally, when a memory allocation fails and falls back
> to the page allocator, the page allocator will wake up kswapd to free up
> pages in order to make the memory allocation succeed. kswapd tries to free
> memory until it reaches a minimum amount of memory for each memory zone
> called the high watermark.
>
> What watermark boosting does is try to preemptively fire up kswapd to free
> memory when there hasn't been an allocation failure.
> It does this by increasing kswapd's high watermark goal and then firing up
> kswapd. The reason why this causes freezes is because, with the increased
> high watermark goal, kswapd will steal memory from processes that need it in
> order to make forward progress. These processes will, in turn, try to
> allocate memory again, which will cause kswapd to steal necessary pages
> from those processes again, in a positive feedback loop known as page
> thrashing. When page thrashing occurs, your system is essentially
> livelocked until the necessary forward progress can be made to stop
> processes from trying to continuously allocate memory and trigger kswapd to
> steal it back.
>
> This problem already occurs with kswapd *without* watermark boosting, but
> it's usually only encountered on machines with a small amount of memory
> and/or a slow CPU. Watermark boosting just makes the existing problem worse
> enough to notice on higher spec'd machines.
>
> To fix the issue in this bug, watermark boosting can be disabled with the
> following:
>
> # echo 0 > /proc/sys/vm/watermark_boost_factor
>
> There's really no harm in doing so, because watermark boosting is an
> inherently broken feature...
>
> So essentially, disabling watermark_boost_factor ensures effective
> swapping and reduces page thrashing.
>
> *I therefore suggest that Debian should tune vm.watermark_boost_factor=0
> by default.*
>
> Cheers,
>
> Steve.
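
For anyone who wants to repeat the comparison described in the message, here is a minimal sketch of the "diff two sysctl dumps" step. It assumes each system's `sysctl -a` output has been saved as text in the usual `key = value` form; the function names `parse_sysctl` and `diff_params` are made up for this illustration and are not part of any tool mentioned in the thread. The sample values are the four settings reported in the message.

```python
def parse_sysctl(text):
    """Parse `sysctl -a` style text ("key = value" per line) into a dict."""
    params = {}
    for line in text.splitlines():
        # Split on the first "=" only, so values containing "=" stay intact.
        key, sep, value = line.partition("=")
        if sep:
            params[key.strip()] = value.strip()
    return params


def diff_params(a, b, prefix="vm."):
    """Return {key: (value_in_a, value_in_b)} for differing keys under prefix."""
    keys = sorted(k for k in set(a) | set(b) if k.startswith(prefix))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Sample dumps holding only the values reported in the message.
    ubuntu = parse_sysctl("vm.swappiness = 60\nvm.watermark_boost_factor = 0\n")
    debian = parse_sysctl("vm.swappiness = 10\nvm.watermark_boost_factor = 150\n")
    for key, (u, d) in diff_params(ubuntu, debian).items():
        print(f"{key}: ubuntu={u} debian={d}")
```

In practice the two strings would come from files written with something like `sudo sysctl -a > dump.txt` on each machine. The `vm.` prefix filter is only a convenience to keep the diff short.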
>
> Below are some more detailed specs of my Debian machine for reference:
>
> $ uname -a
> Linux panseer 5.10.0-11-amd64 #1 SMP Debian 5.10.92-1 (2022-01-18) x86_64
> GNU/Linux
>
> $ lscpu
> Architecture:                    x86_64
> CPU op-mode(s):                  32-bit, 64-bit
> Byte Order:                      Little Endian
> Address sizes:                   46 bits physical, 48 bits virtual
> CPU(s):                          20
> On-line CPU(s) list:             0-19
> Thread(s) per core:              2
> Core(s) per socket:              10
> Socket(s):                       1
> NUMA node(s):                    1
> Vendor ID:                       GenuineIntel
> CPU family:                      6
> Model:                           85
> Model name:                      Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz
> Stepping:                        7
> CPU MHz:                         2511.149
> CPU max MHz:                     3200.0000
> CPU min MHz:                     1000.0000
> BogoMIPS:                        4800.00
> Virtualization:                  VT-x
> L1d cache:                       320 KiB
> L1i cache:                       320 KiB
> L2 cache:                        10 MiB
> L3 cache:                        13.8 MiB
> NUMA node0 CPU(s):               0-19
> Vulnerability Itlb multihit:     KVM: Mitigation: VMX disabled
> Vulnerability L1tf:              Not affected
> Vulnerability Mds:               Not affected
> Vulnerability Meltdown:          Not affected
> Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass
>                                  disabled via prctl and seccomp
> Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and
>                                  __user pointer sanitization
> Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB
>                                  conditional, RSB filling
> Vulnerability Srbds:             Not affected
> Vulnerability Tsx async abort:   Mitigation; TSX disabled
> Flags:                           fpu vme de pse tsc msr pae mce cx8 apic
>   sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht
>   tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs
>   bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
>   dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid
>   dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx
>   f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3
>   invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced
>   tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1
>   avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx
>   smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec
>   xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm
>   ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke
>   avx512_vnni md_clear flush_l1d arch_capabilities
>
> $ free -h
>                total        used        free      shared  buff/cache   available
> Mem:            31Gi       3.6Gi        24Gi       160Mi       3.2Gi        26Gi
> Swap:          119Gi       242Mi       118Gi
>
>
> Steven J. West
> BSc DPhil FRMS
> _________________________________
> International Brain Lab Histology Research Fellow
> Sainsbury Wellcome Centre for Neural Circuits and Behaviour
> University College London
> 25 Howland St, Fitzrovia, London W1T 4JG
> +44 (0) 203 108 8197
> steven.w...@internationalbrainlab.org
> https://www.internationalbrainlab.com/
> https://www.sainsburywellcome.org/
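
As a follow-up to the /etc/sysctl.conf change described in the message, a small sketch like the one below could be used to confirm that the setting is actually in effect after a reboot. It reads the same /proc/sys/vm/watermark_boost_factor file that the quoted bug comment writes to. The helper names `read_boost_factor` and `boosting_enabled` are hypothetical, and the only rule it encodes is the one stated in the message: a value of 0 disables the feature.

```python
import os

# Path taken from the quoted bug comment; present only on kernels that
# expose this tunable.
BOOST_PATH = "/proc/sys/vm/watermark_boost_factor"


def read_boost_factor(path=BOOST_PATH):
    """Return the current watermark_boost_factor as an integer."""
    with open(path) as f:
        return int(f.read().strip())


def boosting_enabled(value):
    """A boost factor of 0 disables watermark boosting; anything else leaves it on."""
    return value != 0


if __name__ == "__main__":
    if os.path.exists(BOOST_PATH):
        value = read_boost_factor()
        state = "enabled" if boosting_enabled(value) else "disabled"
        print(f"vm.watermark_boost_factor={value} (watermark boosting {state})")
    else:
        print(f"{BOOST_PATH} not found; this kernel may not expose the tunable")
```

Reading the file does not need root; only changing the value does.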