On 2025-01-15 15:56:27, Michael Tremer wrote:

[...]

>>> I would be happy to hear if running mailman3 in Gunicorn resolves the 
>>> problem, but maybe it is just a coincidence that the problem doesn’t appear 
>>> there?
>> 
>> It could be! If you could show us OOM dmesg logs, they should show which
>> process was actually using memory when the OOM happens; that should
>> inform the next steps pretty well.
>
> [ 9426.955921] apache2 invoked oom-killer: 
> gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
> [ 9426.955960] CPU: 3 PID: 1037 Comm: apache2 Not tainted 
> 6.1.0-30-cloud-amd64 #1  Debian 6.1.124-1
> [ 9426.955967] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> [ 9426.955970] Call Trace:
> [ 9426.955988]  <TASK>
> [ 9426.956000]  dump_stack_lvl+0x44/0x5c
> [ 9426.956042]  dump_header+0x4a/0x211
> [ 9426.956053]  oom_kill_process.cold+0xb/0x10
> [ 9426.956056]  out_of_memory+0x1fd/0x4c0
> [ 9426.956076]  __alloc_pages_slowpath.constprop.0+0xc83/0xe40
> [ 9426.956087]  __alloc_pages+0x305/0x330
> [ 9426.956089]  folio_alloc+0x17/0x50
> [ 9426.956097]  __filemap_get_folio+0x155/0x340
> [ 9426.956100]  filemap_fault+0x139/0x910
> [ 9426.956104]  __do_fault+0x31/0x80
> [ 9426.956122]  do_fault+0x1b9/0x410
> [ 9426.956125]  __handle_mm_fault+0x660/0xfa0
> [ 9426.956130]  handle_mm_fault+0xdb/0x2d0
> [ 9426.956133]  do_user_addr_fault+0x191/0x550
> [ 9426.956148]  exc_page_fault+0x70/0x170
> [ 9426.956157]  asm_exc_page_fault+0x22/0x30
> [ 9426.956179] RIP: 0033:0x7fdf79dcb0be
> [ 9426.956196] Code: Unable to access opcode bytes at 0x7fdf79dcb094.
> [ 9426.956198] RSP: 002b:00007fdf63ffe140 EFLAGS: 00010207
> [ 9426.956203] RAX: 00007fded3700000 RBX: 0000000000000006 RCX: 
> 00007fdf7afe1923
> [ 9426.956208] RDX: 0000000000000003 RSI: 0000000000100000 RDI: 
> 0000000000000000
> [ 9426.956210] RBP: 0000000000000000 R08: 00000000ffffffff R09: 
> 0000000000000000
> [ 9426.956211] R10: 0000000000000022 R11: 0000000000000246 R12: 
> 000055aad8cd6670
> [ 9426.956212] R13: 00007fded38fb190 R14: 0000000002e4693b R15: 
> 00007fdd58083010
> [ 9426.956221]  </TASK>
> [ 9426.956222] Mem-Info:
> [ 9426.956227] active_anon:416695 inactive_anon:1553950 isolated_anon:512
>                 active_file:2 inactive_file:12 isolated_file:0
>                 unevictable:0 dirty:0 writeback:37
>                 slab_reclaimable:8546 slab_unreclaimable:9125
>                 mapped:106341 shmem:144817 pagetables:7111
>                 sec_pagetables:0 bounce:0
>                 kernel_misc_reclaimable:0
>                 free:28903 free_pcp:0 free_cma:0
> [ 9426.956233] Node 0 active_anon:1427824kB inactive_anon:2490352kB 
> active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):2048kB 
> isolated(file):0kB mapped:82328kB dirty:0kB writeback:148kB shmem:115724kB 
> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 960512kB writeback_tmp:0kB 
> kernel_stack:3320kB pagetables:16120kB sec_pagetables:0kB all_unreclaimable? 
> no
> [ 9426.956238] Node 1 active_anon:238956kB inactive_anon:3725448kB 
> active_file:16kB inactive_file:108kB unevictable:0kB isolated(anon):0kB 
> isolated(file):0kB mapped:343036kB dirty:0kB writeback:0kB shmem:463544kB 
> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 1255424kB writeback_tmp:0kB 
> kernel_stack:2616kB pagetables:12324kB sec_pagetables:0kB all_unreclaimable? 
> yes
> [ 9426.956242] Node 0 DMA free:14336kB boost:0kB min:168kB low:208kB 
> high:248kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB 
> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB 
> present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB 
> local_pcp:0kB free_cma:0kB
> [ 9426.956247] lowmem_reserve[]: 0 2978 3937 3937 3937
> [ 9426.956254] Node 0 DMA32 free:37212kB boost:0kB min:33784kB low:42228kB 
> high:50672kB reserved_highatomic:0KB active_anon:1310264kB 
> inactive_anon:1698556kB active_file:0kB inactive_file:0kB unevictable:0kB 
> writepending:504kB present:3129192kB managed:3063656kB mlocked:0kB bounce:0kB 
> free_pcp:0kB local_pcp:0kB free_cma:0kB
> [ 9426.956258] lowmem_reserve[]: 0 0 959 959 959
> [ 9426.956262] Node 0 Normal free:10724kB boost:0kB min:10880kB low:13600kB 
> high:16320kB reserved_highatomic:0KB active_anon:117300kB 
> inactive_anon:791520kB active_file:0kB inactive_file:0kB unevictable:0kB 
> writepending:0kB present:1048576kB managed:982364kB mlocked:0kB bounce:0kB 
> free_pcp:0kB local_pcp:0kB free_cma:0kB
> [ 9426.956266] lowmem_reserve[]: 0 0 0 0 0
> [ 9426.956269] Node 1 Normal free:53340kB boost:0kB min:45268kB low:56584kB 
> high:67900kB reserved_highatomic:8192KB active_anon:238956kB 
> inactive_anon:3725476kB active_file:128kB inactive_file:372kB unevictable:0kB 
> writepending:0kB present:4194304kB managed:4094912kB mlocked:0kB bounce:0kB 
> free_pcp:0kB local_pcp:0kB free_cma:0kB
> [ 9426.956274] lowmem_reserve[]: 0 0 0 0 0
> [ 9426.956277] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 
> 0*512kB 0*1024kB 1*2048kB (M) 3*4096kB (M) = 14336kB
> [ 9426.956288] Node 0 DMA32: 150*4kB (UME) 214*8kB (UME) 164*16kB (UME) 
> 137*32kB (UME) 61*64kB (UME) 34*128kB (UE) 8*256kB (UE) 3*512kB (ME) 4*1024kB 
> (U) 0*2048kB 3*4096kB (M) = 37544kB
> [ 9426.956302] Node 0 Normal: 257*4kB (UME) 173*8kB (UME) 86*16kB (UM) 
> 39*32kB (UME) 17*64kB (UE) 5*128kB (UE) 11*256kB (UM) 3*512kB (M) 0*1024kB 
> 0*2048kB 0*4096kB = 11116kB
> [ 9426.956315] Node 1 Normal: 193*4kB (UME) 90*8kB (UME) 67*16kB (UME) 
> 53*32kB (UME) 33*64kB (UME) 21*128kB (UME) 8*256kB (UME) 3*512kB (U) 
> 16*1024kB (ME) 6*2048kB (M) 3*4096kB (M) = 53604kB
> [ 9426.956331] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 
> hugepages_size=1048576kB
> [ 9426.956336] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 
> hugepages_size=2048kB
> [ 9426.956337] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 
> hugepages_size=1048576kB
> [ 9426.956341] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 
> hugepages_size=2048kB
> [ 9426.956342] 192402 total pagecache pages
> [ 9426.956344] 47459 pages in swap cache
> [ 9426.956345] Free swap  = 0kB
> [ 9426.956345] Total swap = 1046524kB
> [ 9426.956346] 2097016 pages RAM
> [ 9426.956347] 0 pages HighMem/MovableOnly
> [ 9426.956348] 57943 pages reserved
> [ 9426.956349] Tasks state (memory values in pages):
> [ 9426.956350] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes 
> swapents oom_score_adj name
> [ 9426.956355] [    271]     0   271    12384      305   102400        8      
>     -250 systemd-journal
> [ 9426.956359] [    297]     0   297     6134      191    69632      225      
>    -1000 systemd-udevd
> [ 9426.956362] [    516]   106   516     1970       36    57344       81      
>        0 rpcbind
> [ 9426.956365] [    517]   101   517    22521      234    77824        0      
>        0 systemd-timesyn
> [ 9426.956367] [    520]     0   520    45332      776    94208       17      
>        0 rpc.gssd
> [ 9426.956369] [    532]   104   532     2324      147    65536        9      
>     -900 dbus-daemon
> [ 9426.956371] [    541]     0   541    20059       38    57344       34      
>        0 qemu-ga
> [ 9426.956374] [    542]     0   542     7571      518    94208       13      
>        0 sssd
> [ 9426.956376] [    595]     0   595     9813     1447   114688      343      
>        0 sssd_be
> [ 9426.956378] [    598]     0   598     4249      257    73728       14      
>        0 systemd-logind
> [ 9426.956380] [    714]   110   714     8401      969    90112        3      
>        0 snmpd
> [ 9426.956382] [    723]   114   723    16470     1280   118784      521      
>        0 redis-server
> [ 9426.956385] [    726]     0   726    27035      725   110592     2140      
>        0 unattended-upgr
> [ 9426.956387] [    728]     0   728    15886      447   176128       63      
>        0 sssd_nss
> [ 9426.956390] [    734]     0   734     3859      323    73728        0      
>    -1000 sshd
> [ 9426.956392] [    765]   113   765   557402    15290   335872      434      
>     -900 postgres
> [ 9426.956395] [    850]     0   850     1636       32    57344       36      
>        0 cron
> [ 9426.956397] [    860]     0   860      723       21    40960        0      
>        0 agetty
> [ 9426.956399] [    888]     0   888     5962      482    86016      564      
>        0 apache2
> [ 9426.956402] [    889]    33   889     8667      316    94208      619      
>        0 apache2
> [ 9426.956404] [    891]    33   891    60947     1947   176128      313      
>        0 apache2
> [ 9426.956406] [    892]    33   892    60947     1919   176128      306      
>        0 apache2
> [ 9426.956408] [    893]    33   893    60949      424   176128     1803      
>        0 apache2
> [ 9426.956410] [    894]    33   894    60947      715   176128     1517      
>        0 apache2
> [ 9426.956412] [    895]    33   895   559609    26749   667648     7292      
>        0 apache2
> [ 9426.956414] [    896]    33   896  2145912  1488816 13746176   148163      
>        0 apache2
> [ 9426.956417] [   1122]     0  1122    10665       49    61440      108      
>        0 master
> [ 9426.956419] [   1124]   109  1124    13612      165    81920        5      
>        0 qmgr
> [ 9426.956421] [   1126]     0  1126     2360       86    49152        0      
>        0 xinetd
> [ 9426.956423] [   1127]   113  1127   557503     4714   212992      414      
>        0 postgres
> [ 9426.956425] [   1128]   113  1128   557469     4574   192512      468      
>        0 postgres
> [ 9426.956427] [   1130]   113  1130   557402     4414   163840      411      
>        0 postgres
> [ 9426.956429] [   1131]   113  1131   557964      475   180224      520      
>        0 postgres
> [ 9426.956431] [   1132]   113  1132   557968      473   155648      414      
>        0 postgres
> [ 9426.956434] [   1153]    38  1153    26307      829   249856    18104      
>        0 python3
> [ 9426.956436] [   1158]   113  1158   558152     2577   217088      364      
>        0 postgres
> [ 9426.956446] [   1160]    38  1160    26309     5812   253952    13339      
>        0 python3
> [ 9426.956449] [   1161]    38  1161    26735    18863   253952      925      
>        0 python3
> [ 9426.956452] [   1162]    38  1162    26308    14930   249856     4313      
>        0 python3
> [ 9426.956456] [   1163]    38  1163    26307    16372   249856     2792      
>        0 python3
> [ 9426.956459] [   1164]    38  1164    44745    15879   258048     3627      
>        0 python3
> [ 9426.956462] [   1165]    38  1165    26305    16597   253952     2619      
>        0 python3
> [ 9426.956468] [   1166]    38  1166    26346    14135   249856     5103      
>        0 python3
> [ 9426.956471] [   1167]    38  1167    26308    17605   245760     1549      
>        0 python3
> [ 9426.956476] [   1168]    38  1168    28363     4894   262144    15474      
>        0 python3
> [ 9426.956479] [   1169]    38  1169    26309    18356   253952      821      
>        0 python3
> [ 9426.956482] [   1170]    38  1170    26731    15264   249856     4331      
>        0 python3
> [ 9426.956485] [   1171]    38  1171    26308    16287   249856     2862      
>        0 python3
> [ 9426.956488] [   1172]    38  1172    26306    19144   249856        0      
>        0 python3
> [ 9426.956491] [   1173]     0  1173      644       23    49152        0      
>        0 mailman-web
> [ 9426.956494] [   1175]     0  1175     5121       59    61440       60      
>        0 su
> [ 9426.956497] [   1180]    33  1180     4755      424    86016        0      
>      100 systemd
> [ 9426.956500] [   1181]    33  1181    45172      512    94208      366      
>      100 (sd-pam)
> [ 9426.956503] [   1196]    33  1196      644       22    45056        0      
>        0 sh
> [ 9426.956506] [   1197]    33  1197    26303    19246   241664        0      
>        0 python3
> [ 9426.956509] [   1198]   113  1198   558152     2555   217088      370      
>        0 postgres
> [ 9426.956513] [   1199]    33  1199    26624    18417   245760      943      
>        0 python3
> [ 9426.956516] [   1200]    33  1200    26317    19245   233472        0      
>        0 python3
> [ 9426.956519] [   1201]    33  1201    26318    19245   233472        0      
>        0 python3
> [ 9426.956522] [   1202]    33  1202    26319    19247   233472        0      
>        0 python3
> [ 9426.956525] [   1203]    33  1203    26320    19246   233472        0      
>        0 python3
> [ 9426.956528] [   1204]    33  1204    26320    19248   233472        0      
>        0 python3
> [ 9426.956531] [   1205]    33  1205    26320    19248   233472        0      
>        0 python3
> [ 9426.956534] [   1208]   113  1208   558152     2595   217088      363      
>        0 postgres
> [ 9426.956538] [   1209]   113  1209   558372     4048   233472      363      
>        0 postgres
> [ 9426.956542] [   1210]   113  1210   558152     2186   217088      755      
>        0 postgres
> [ 9426.956545] [   1211]   113  1211   558281     2614   233472      766      
>        0 postgres
> [ 9426.956548] [   1212]   113  1212   558152     2198   217088      759      
>        0 postgres
> [ 9426.956552] [   1213]   113  1213   558152     2278   217088      664      
>        0 postgres
> [ 9426.956555] [   1215]   113  1215   558152     2595   217088      363      
>        0 postgres
> [ 9426.956558] [   1216]   113  1216   558152     2249   217088      707      
>        0 postgres
> [ 9426.956561] [   1217]    38  1217    32157    10149   278528    11115      
>        0 python3
> [ 9426.956565] [   1218]   113  1218   558152     2572   217088      370      
>        0 postgres
> [ 9426.956578] [   1219]    38  1219    32224     9645   278528    11622      
>        0 python3
> [ 9426.956580] [   1221]   113  1221   558152     2207   217088      766      
>        0 postgres
> [ 9426.956583] [   1223]   113  1223   558152     2625   217088      363      
>        0 postgres
> [ 9426.956585] [   1233]     0  1233     7321      420    98304        0      
>        0 sssd_sudo
> [ 9426.956587] [   1257]   109  1257    15039      373    90112        4      
>        0 tlsmgr
> [ 9426.956589] [   1258]   109  1258    13601      170    77824        0      
>        0 anvil
> [ 9426.956592] [   3411]   113  3411   558396     4262   266240      361      
>        0 postgres
> [ 9426.956594] [   3413]   113  3413   558383     4674   266240      361      
>        0 postgres
> [ 9426.956596] [   7672]   109  7672    13601       22    73728      145      
>        0 pickup
> [ 9426.956599] [  11282]   109 11282    13601      166    86016        0      
>        0 showq
> [ 9426.956609] [  11283]   109 11283    15479      689    94208        0      
>        0 smtpd
> [ 9426.956612] [  11285]   113 11285   564050    99241  1200128      361      
>        0 postgres
> [ 9426.956614] [  11326]   113 11326   558287     2850   225280      363      
>        0 postgres
> [ 9426.956616] 
> oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/system.slice/apache2.service,task=apache2,pid=896,uid=33
> [ 9426.956746] Out of memory: Killed process 896 (apache2) 
> total-vm:8583648kB, anon-rss:5954788kB, file-rss:0kB, shmem-rss:476kB, UID:33 
> pgtables:13424kB oom_score_adj:0
>
> This is one of the latest dumps.
>
> This is a virtual machine that runs mailman3 and nothing else, so I can rule 
> out anything else affecting this installation.
>
> The VM has 8 GiB of RAM. Before the upgrade to mailman3, the same machine was 
> running on only 2 GiB, and we kept upping it for mailman3. There is, however, 
> no upper bound: if I added another 8 or even 16 GiB of memory, it would only 
> be a matter of time before that filled up as well.
>
> The host is also running redis and PostgreSQL, both used exclusively by 
> mailman3. Redis is currently using about 7 MiB of memory, and PostgreSQL 
> about 500-600 MiB; neither should be limiting mailman3 at all. manage.py 
> qcluster is consuming around 90 MiB, and mailman3 itself (i.e. the runner 
> process) is using another GiB. I simply assumed that the web UI would never 
> need more than 5 GiB of RAM.

Interesting!

I would try bumping memory to 16 GiB, to see if it improves the situation
for you. In our case, that showed rather conclusively that the problem was
not just "oh, mailman3 is using more memory" but rather "wow, there's a
problem with uwsgi".

In the above stats, it's not entirely clear to me that the cause is with
Apache: you have a lot going on there, and it *could* actually be that
there's an issue with overall memory usage and Apache is just being tagged
as the culprit by the OOM killer...
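
If it helps to double-check that: the task table in the dump lists the
resident set of every process at the moment of the kill, so you can rank
them yourself rather than only trusting the single "Killed process" line at
the bottom. A rough sketch that should work against the raw (un-wrapped)
dmesg output, assuming the usual column layout of recent kernels:

    #!/usr/bin/env python3
    # Rank the processes in an OOM "Tasks state" dump by RSS.
    # Usage: dmesg | python3 oom_rank.py
    # Columns assumed: [ pid ] uid tgid total_vm rss pgtables_bytes
    #                  swapents oom_score_adj name
    import re
    import sys

    ROW = re.compile(
        r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+\d+\s+(?P<total_vm>\d+)\s+"
        r"(?P<rss>\d+)\s+\d+\s+(?P<swapents>\d+)\s+(?P<adj>-?\d+)\s+(?P<name>\S+)"
    )

    rows = [m.groupdict() for m in (ROW.search(line) for line in sys.stdin) if m]

    for r in sorted(rows, key=lambda r: int(r["rss"]), reverse=True)[:10]:
        rss_mib = int(r["rss"]) * 4 / 1024       # 4 KiB pages -> MiB
        swap_mib = int(r["swapents"]) * 4 / 1024
        print(f"{r['name']:<16} pid={r['pid']:>6} rss={rss_mib:8.1f} MiB "
              f"swap={swap_mib:8.1f} MiB adj={r['adj']}")

Applied to the dump above, that apache2 worker (pid 896) comes out on top at
roughly 5.7 GiB resident, which does match the kill line at the bottom.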

But yeah, your numbers might show there's actually an underlying issue
with mailman-web itself. Our tests with gunicorn should show more
conclusively whether or not that's the case: if the issue goes away under
gunicorn, then this could be an issue in *both* uwsgi and apache2-wsgi...
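
For the gunicorn test itself, one nice thing is that gunicorn's config file
is plain Python, so it's easy to add worker recycling as a safety net while
you measure. A minimal sketch; the WSGI module path and the bind address are
assumptions, so adjust them to whatever your mailman-web installation
actually uses:

    # gunicorn.conf.py -- minimal test configuration for mailman-web.
    # NOTE: the wsgi_app path below is an assumption; point it at the WSGI
    # module your installation really ships.
    wsgi_app = "mailman_web.wsgi:application"
    bind = "127.0.0.1:8000"
    workers = 4
    timeout = 120

    # Recycle each worker after a bounded number of requests (with jitter so
    # they don't all restart at once). This caps the damage from any slow
    # per-worker leak while you investigate.
    max_requests = 500
    max_requests_jitter = 50

Even if recycling makes the symptom disappear it's only a workaround, but it
would at least suggest the growth is per-worker rather than in shared state.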

a.

-- 
Never be deceived that the rich will allow you to vote away their wealth.
                        - Lucy Parsons
