On 2025-01-15 15:56:27, Michael Tremer wrote: [...]
>>> I would be happy to hear if running mailman3 in Gunicorn resolves the
>>> problem, but maybe it is just a coincidence that the problem doesn’t appear
>>> there?
>>
>> It could be! If you could show us OOM dmesg logs, they should show which
>> process was actually using memory when the OOM happens, this should
>> inform next steps pretty well.
>
> [ 9426.955921] apache2 invoked oom-killer:
> gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
> [ 9426.955960] CPU: 3 PID: 1037 Comm: apache2 Not tainted
> 6.1.0-30-cloud-amd64 #1 Debian 6.1.124-1
> [ 9426.955967] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> [ 9426.955970] Call Trace:
> [ 9426.955988] <TASK>
> [ 9426.956000] dump_stack_lvl+0x44/0x5c
> [ 9426.956042] dump_header+0x4a/0x211
> [ 9426.956053] oom_kill_process.cold+0xb/0x10
> [ 9426.956056] out_of_memory+0x1fd/0x4c0
> [ 9426.956076] __alloc_pages_slowpath.constprop.0+0xc83/0xe40
> [ 9426.956087] __alloc_pages+0x305/0x330
> [ 9426.956089] folio_alloc+0x17/0x50
> [ 9426.956097] __filemap_get_folio+0x155/0x340
> [ 9426.956100] filemap_fault+0x139/0x910
> [ 9426.956104] __do_fault+0x31/0x80
> [ 9426.956122] do_fault+0x1b9/0x410
> [ 9426.956125] __handle_mm_fault+0x660/0xfa0
> [ 9426.956130] handle_mm_fault+0xdb/0x2d0
> [ 9426.956133] do_user_addr_fault+0x191/0x550
> [ 9426.956148] exc_page_fault+0x70/0x170
> [ 9426.956157] asm_exc_page_fault+0x22/0x30
> [ 9426.956179] RIP: 0033:0x7fdf79dcb0be
> [ 9426.956196] Code: Unable to access opcode bytes at 0x7fdf79dcb094.
> [ 9426.956198] RSP: 002b:00007fdf63ffe140 EFLAGS: 00010207
> [ 9426.956203] RAX: 00007fded3700000 RBX: 0000000000000006 RCX:
> 00007fdf7afe1923
> [ 9426.956208] RDX: 0000000000000003 RSI: 0000000000100000 RDI:
> 0000000000000000
> [ 9426.956210] RBP: 0000000000000000 R08: 00000000ffffffff R09:
> 0000000000000000
> [ 9426.956211] R10: 0000000000000022 R11: 0000000000000246 R12:
> 000055aad8cd6670
> [ 9426.956212] R13: 00007fded38fb190 R14: 0000000002e4693b R15:
> 00007fdd58083010
> [ 9426.956221] </TASK>
> [ 9426.956222] Mem-Info:
> [ 9426.956227] active_anon:416695 inactive_anon:1553950 isolated_anon:512
> active_file:2 inactive_file:12 isolated_file:0
> unevictable:0 dirty:0 writeback:37
> slab_reclaimable:8546 slab_unreclaimable:9125
> mapped:106341 shmem:144817 pagetables:7111
> sec_pagetables:0 bounce:0
> kernel_misc_reclaimable:0
> free:28903 free_pcp:0 free_cma:0
> [ 9426.956233] Node 0 active_anon:1427824kB inactive_anon:2490352kB
> active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):2048kB
> isolated(file):0kB mapped:82328kB dirty:0kB writeback:148kB shmem:115724kB
> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 960512kB writeback_tmp:0kB
> kernel_stack:3320kB pagetables:16120kB sec_pagetables:0kB all_unreclaimable?
> no
> [ 9426.956238] Node 1 active_anon:238956kB inactive_anon:3725448kB
> active_file:16kB inactive_file:108kB unevictable:0kB isolated(anon):0kB
> isolated(file):0kB mapped:343036kB dirty:0kB writeback:0kB shmem:463544kB
> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 1255424kB writeback_tmp:0kB
> kernel_stack:2616kB pagetables:12324kB sec_pagetables:0kB all_unreclaimable?
> yes
> [ 9426.956242] Node 0 DMA free:14336kB boost:0kB min:168kB low:208kB
> high:248kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
> present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB
> local_pcp:0kB free_cma:0kB
> [ 9426.956247] lowmem_reserve[]: 0 2978 3937 3937 3937
> [ 9426.956254] Node 0 DMA32 free:37212kB boost:0kB min:33784kB low:42228kB
> high:50672kB reserved_highatomic:0KB active_anon:1310264kB
> inactive_anon:1698556kB active_file:0kB inactive_file:0kB unevictable:0kB
> writepending:504kB present:3129192kB managed:3063656kB mlocked:0kB bounce:0kB
> free_pcp:0kB local_pcp:0kB free_cma:0kB
> [ 9426.956258] lowmem_reserve[]: 0 0 959 959 959
> [ 9426.956262] Node 0 Normal free:10724kB boost:0kB min:10880kB low:13600kB
> high:16320kB reserved_highatomic:0KB active_anon:117300kB
> inactive_anon:791520kB active_file:0kB inactive_file:0kB unevictable:0kB
> writepending:0kB present:1048576kB managed:982364kB mlocked:0kB bounce:0kB
> free_pcp:0kB local_pcp:0kB free_cma:0kB
> [ 9426.956266] lowmem_reserve[]: 0 0 0 0 0
> [ 9426.956269] Node 1 Normal free:53340kB boost:0kB min:45268kB low:56584kB
> high:67900kB reserved_highatomic:8192KB active_anon:238956kB
> inactive_anon:3725476kB active_file:128kB inactive_file:372kB unevictable:0kB
> writepending:0kB present:4194304kB managed:4094912kB mlocked:0kB bounce:0kB
> free_pcp:0kB local_pcp:0kB free_cma:0kB
> [ 9426.956274] lowmem_reserve[]: 0 0 0 0 0
> [ 9426.956277] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
> 0*512kB 0*1024kB 1*2048kB (M) 3*4096kB (M) = 14336kB
> [ 9426.956288] Node 0 DMA32: 150*4kB (UME) 214*8kB (UME) 164*16kB (UME)
> 137*32kB (UME) 61*64kB (UME) 34*128kB (UE) 8*256kB (UE) 3*512kB (ME) 4*1024kB
> (U) 0*2048kB 3*4096kB (M) = 37544kB
> [ 9426.956302] Node 0 Normal: 257*4kB (UME) 173*8kB (UME) 86*16kB (UM)
> 39*32kB (UME) 17*64kB (UE) 5*128kB (UE) 11*256kB (UM) 3*512kB (M) 0*1024kB
> 0*2048kB 0*4096kB = 11116kB
> [ 9426.956315] Node 1 Normal: 193*4kB (UME) 90*8kB (UME) 67*16kB (UME)
> 53*32kB (UME) 33*64kB (UME) 21*128kB (UME) 8*256kB (UME) 3*512kB (U)
> 16*1024kB (ME) 6*2048kB (M) 3*4096kB (M) = 53604kB
> [ 9426.956331] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
> hugepages_size=1048576kB
> [ 9426.956336] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
> hugepages_size=2048kB
> [ 9426.956337] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0
> hugepages_size=1048576kB
> [ 9426.956341] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0
> hugepages_size=2048kB
> [ 9426.956342] 192402 total pagecache pages
> [ 9426.956344] 47459 pages in swap cache
> [ 9426.956345] Free swap = 0kB
> [ 9426.956345] Total swap = 1046524kB
> [ 9426.956346] 2097016 pages RAM
> [ 9426.956347] 0 pages HighMem/MovableOnly
> [ 9426.956348] 57943 pages reserved
> [ 9426.956349] Tasks state (memory values in pages):
> [ 9426.956350] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
> [ 9426.956355] [ 271] 0 271 12384 305 102400 8 -250 systemd-journal
> [ 9426.956359] [ 297] 0 297 6134 191 69632 225 -1000 systemd-udevd
> [ 9426.956362] [ 516] 106 516 1970 36 57344 81 0 rpcbind
> [ 9426.956365] [ 517] 101 517 22521 234 77824 0 0 systemd-timesyn
> [ 9426.956367] [ 520] 0 520 45332 776 94208 17 0 rpc.gssd
> [ 9426.956369] [ 532] 104 532 2324 147 65536 9 -900 dbus-daemon
> [ 9426.956371] [ 541] 0 541 20059 38 57344 34 0 qemu-ga
> [ 9426.956374] [ 542] 0 542 7571 518 94208 13 0 sssd
> [ 9426.956376] [ 595] 0 595 9813 1447 114688 343 0 sssd_be
> [ 9426.956378] [ 598] 0 598 4249 257 73728 14 0 systemd-logind
> [ 9426.956380] [ 714] 110 714 8401 969 90112 3 0 snmpd
> [ 9426.956382] [ 723] 114 723 16470 1280 118784 521 0 redis-server
> [ 9426.956385] [ 726] 0 726 27035 725 110592 2140 0 unattended-upgr
> [ 9426.956387] [ 728] 0 728 15886 447 176128 63 0 sssd_nss
> [ 9426.956390] [ 734] 0 734 3859 323 73728 0 -1000 sshd
> [ 9426.956392] [ 765] 113 765 557402 15290 335872 434 -900 postgres
> [ 9426.956395] [ 850] 0 850 1636 32 57344 36 0 cron
> [ 9426.956397] [ 860] 0 860 723 21 40960 0 0 agetty
> [ 9426.956399] [ 888] 0 888 5962 482 86016 564 0 apache2
> [ 9426.956402] [ 889] 33 889 8667 316 94208 619 0 apache2
> [ 9426.956404] [ 891] 33 891 60947 1947 176128 313 0 apache2
> [ 9426.956406] [ 892] 33 892 60947 1919 176128 306 0 apache2
> [ 9426.956408] [ 893] 33 893 60949 424 176128 1803 0 apache2
> [ 9426.956410] [ 894] 33 894 60947 715 176128 1517 0 apache2
> [ 9426.956412] [ 895] 33 895 559609 26749 667648 7292 0 apache2
> [ 9426.956414] [ 896] 33 896 2145912 1488816 13746176 148163 0 apache2
> [ 9426.956417] [ 1122] 0 1122 10665 49 61440 108 0 master
> [ 9426.956419] [ 1124] 109 1124 13612 165 81920 5 0 qmgr
> [ 9426.956421] [ 1126] 0 1126 2360 86 49152 0 0 xinetd
> [ 9426.956423] [ 1127] 113 1127 557503 4714 212992 414 0 postgres
> [ 9426.956425] [ 1128] 113 1128 557469 4574 192512 468 0 postgres
> [ 9426.956427] [ 1130] 113 1130 557402 4414 163840 411 0 postgres
> [ 9426.956429] [ 1131] 113 1131 557964 475 180224 520 0 postgres
> [ 9426.956431] [ 1132] 113 1132 557968 473 155648 414 0 postgres
> [ 9426.956434] [ 1153] 38 1153 26307 829 249856 18104 0 python3
> [ 9426.956436] [ 1158] 113 1158 558152 2577 217088 364 0 postgres
> [ 9426.956446] [ 1160] 38 1160 26309 5812 253952 13339 0 python3
> [ 9426.956449] [ 1161] 38 1161 26735 18863 253952 925 0 python3
> [ 9426.956452] [ 1162] 38 1162 26308 14930 249856 4313 0 python3
> [ 9426.956456] [ 1163] 38 1163 26307 16372 249856 2792 0 python3
> [ 9426.956459] [ 1164] 38 1164 44745 15879 258048 3627 0 python3
> [ 9426.956462] [ 1165] 38 1165 26305 16597 253952 2619 0 python3
> [ 9426.956468] [ 1166] 38 1166 26346 14135 249856 5103 0 python3
> [ 9426.956471] [ 1167] 38 1167 26308 17605 245760 1549 0 python3
> [ 9426.956476] [ 1168] 38 1168 28363 4894 262144 15474 0 python3
> [ 9426.956479] [ 1169] 38 1169 26309 18356 253952 821 0 python3
> [ 9426.956482] [ 1170] 38 1170 26731 15264 249856 4331 0 python3
> [ 9426.956485] [ 1171] 38 1171 26308 16287 249856 2862 0 python3
> [ 9426.956488] [ 1172] 38 1172 26306 19144 249856 0 0 python3
> [ 9426.956491] [ 1173] 0 1173 644 23 49152 0 0 mailman-web
> [ 9426.956494] [ 1175] 0 1175 5121 59 61440 60 0 su
> [ 9426.956497] [ 1180] 33 1180 4755 424 86016 0 100 systemd
> [ 9426.956500] [ 1181] 33 1181 45172 512 94208 366 100 (sd-pam)
> [ 9426.956503] [ 1196] 33 1196 644 22 45056 0 0 sh
> [ 9426.956506] [ 1197] 33 1197 26303 19246 241664 0 0 python3
> [ 9426.956509] [ 1198] 113 1198 558152 2555 217088 370 0 postgres
> [ 9426.956513] [ 1199] 33 1199 26624 18417 245760 943 0 python3
> [ 9426.956516] [ 1200] 33 1200 26317 19245 233472 0 0 python3
> [ 9426.956519] [ 1201] 33 1201 26318 19245 233472 0 0 python3
> [ 9426.956522] [ 1202] 33 1202 26319 19247 233472 0 0 python3
> [ 9426.956525] [ 1203] 33 1203 26320 19246 233472 0 0 python3
> [ 9426.956528] [ 1204] 33 1204 26320 19248 233472 0 0 python3
> [ 9426.956531] [ 1205] 33 1205 26320 19248 233472 0 0 python3
> [ 9426.956534] [ 1208] 113 1208 558152 2595 217088 363 0 postgres
> [ 9426.956538] [ 1209] 113 1209 558372 4048 233472 363 0 postgres
> [ 9426.956542] [ 1210] 113 1210 558152 2186 217088 755 0 postgres
> [ 9426.956545] [ 1211] 113 1211 558281 2614 233472 766 0 postgres
> [ 9426.956548] [ 1212] 113 1212 558152 2198 217088 759 0 postgres
> [ 9426.956552] [ 1213] 113 1213 558152 2278 217088 664 0 postgres
> [ 9426.956555] [ 1215] 113 1215 558152 2595 217088 363 0 postgres
> [ 9426.956558] [ 1216] 113 1216 558152 2249 217088 707 0 postgres
> [ 9426.956561] [ 1217] 38 1217 32157 10149 278528 11115 0 python3
> [ 9426.956565] [ 1218] 113 1218 558152 2572 217088 370 0 postgres
> [ 9426.956578] [ 1219] 38 1219 32224 9645 278528 11622 0 python3
> [ 9426.956580] [ 1221] 113 1221 558152 2207 217088 766 0 postgres
> [ 9426.956583] [ 1223] 113 1223 558152 2625 217088 363 0 postgres
> [ 9426.956585] [ 1233] 0 1233 7321 420 98304 0 0 sssd_sudo
> [ 9426.956587] [ 1257] 109 1257 15039 373 90112 4 0 tlsmgr
> [ 9426.956589] [ 1258] 109 1258 13601 170 77824 0 0 anvil
> [ 9426.956592] [ 3411] 113 3411 558396 4262 266240 361 0 postgres
> [ 9426.956594] [ 3413] 113 3413 558383 4674 266240 361 0 postgres
> [ 9426.956596] [ 7672] 109 7672 13601 22 73728 145 0 pickup
> [ 9426.956599] [ 11282] 109 11282 13601 166 86016 0 0 showq
> [ 9426.956609] [ 11283] 109 11283 15479 689 94208 0 0 smtpd
> [ 9426.956612] [ 11285] 113 11285 564050 99241 1200128 361 0 postgres
> [ 9426.956614] [ 11326] 113 11326 558287 2850 225280 363 0 postgres
> [ 9426.956616]
> oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/system.slice/apache2.service,task=apache2,pid=896,uid=33
> [ 9426.956746] Out of memory: Killed process 896 (apache2)
> total-vm:8583648kB, anon-rss:5954788kB, file-rss:0kB, shmem-rss:476kB, UID:33
> pgtables:13424kB oom_score_adj:0
>
> This is one of the latest dump.
>
> This is a virtual machine that is running mailman3 only and nothing else. So
> I can rule out anything else affecting this installation.
>
> The VM has 8 GiB of RAM. Before the upgrade to mailman3, the same machine was
> running on only 2 GiB and we kept upping it for mailman3. There is however no
> upper bound. If I would add another 8 or even 16 GiB or memory it is only a
> matter of time when that will fill up.
>
> The host is also running redis and PostgreSQL, both exclusively used by
> mailman3. Redis is currently using about 7 MiB of memory, and PostgreSQL
> about 500-600 MiB. This should however not limit mailman3 at all. manage.py
> qcluster is consuming around 90 MiB, and mailman3 itself (i.e. the runner
> process) is using another GiB.
> I just assumed that the web UI would simply never be using > 5GiB of RAM.

Interesting! I would try bumping memory to 16GiB, to see if it improves
the situation for you. In our case, it showed rather conclusively that
the problem was not just "oh, mailman3 is using more memory" but "wow,
there's a problem with uwsgi".

In the above stats, it's not entirely clear to me that the cause is
Apache: you have a lot going on there, and it *could* actually be that
there's an issue with the overall memory usage and Apache is just being
tagged as the culprit by the OOM killer...

But yeah, your numbers might show there's actually an underlying issue
with mailman-web itself. Our tests with gunicorn will show more
conclusively whether or not that's the case: if the issue goes away in
gunicorn, then this could be an issue in *both* uwsgi and
apache2-wsgi...

a.

-- 
Never be deceived that the rich will allow you to vote away their wealth.
                        - Lucy Parsons
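
PS: as a rough aid for reading dumps like the one quoted above, here is a minimal Python sketch that sums the rss column of the "Tasks state" table per process name. It assumes 4 KiB pages and the column order shown in the dump header; the function name `rank_rss` and the regular expression are made up for illustration and are not part of any existing tool. Process names containing spaces would be cut at the first space.

```python
import re
from collections import defaultdict

PAGE_KIB = 4  # assumption: 4 KiB pages, the usual x86-64 page size

# One row of the "Tasks state" table, in the column order of the dump header:
# [timestamp] [pid] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
ROW = re.compile(
    r"\[\s*\d+\.\d+\]\s+\[\s*(\d+)\]\s+"  # kernel timestamp, pid
    r"\d+\s+\d+\s+\d+\s+(\d+)\s+"         # uid, tgid, total_vm, rss (in pages)
    r"\d+\s+\d+\s+-?\d+\s+(\S+)"          # pgtables_bytes, swapents, oom_score_adj, name
)


def rank_rss(dump):
    """Return [(name, rss_mib)] summed per process name, largest first."""
    # Drop email quote markers and undo hard wraps so each row is contiguous.
    flat = " ".join(line.lstrip("> ") for line in dump.splitlines())
    totals = defaultdict(int)
    for _pid, rss_pages, name in ROW.findall(flat):
        totals[name] += int(rss_pages)
    ranked = [(name, pages * PAGE_KIB / 1024) for name, pages in totals.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

Something like `rank_rss(open("oom.txt").read())` on a saved copy of the dump (the file name is hypothetical) would list the heaviest process names first. As a sanity check on the 4 KiB assumption: pid 896 has an rss of 1488816 pages, which is 5955264 kB, the same as anon-rss plus shmem-rss in the "Killed process" line (5954788 kB + 476 kB).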