On 6 Dec 2019, Henrik K. spake thusly:

> On Fri, Dec 06, 2019 at 10:23:15AM +0100, Matus UHLAR - fantomas wrote:
>> >On Thu, 5 Dec 2019 17:07:05 +0100
>> >Matus UHLAR - fantomas wrote:
>> >>seems some big mails were too long to scan, and SA even got killed.
>> >>
>> >>[2146809.213586] Out of memory: Kill process 3660 (spamassassin) score 365 or sacrifice child
>> >>[2146809.213613] Killed process 3660 (spamassassin) total-vm:2960664kB, anon-rss:2921892kB, file-rss:0kB, shmem-rss:0kB
>> >>[2146809.270342] oom_reaper: reaped process 3660 (spamassassin), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>> >>
>> >>I see the mail body contains nearly 20MB uuencoded text (don't ask).
>> >>
>> >>I found some body rules that contain ".*" instead of a sane
>> >>quantifier:
>> >>
>> >>72_active.cf:rawbody __HAS_HREF /^[^>].*?<a href=/im
>> >>72_active.cf:rawbody __HAS_HREF_ONECASE /^[^>].*?<(a href|A HREF)=/m
>> >>72_active.cf:rawbody __HAS_IMG_SRC /^[^>].*?<img src=/im
>> >>72_active.cf:rawbody __HAS_IMG_SRC_DATA /^[^>].*?<img src=['"]data/im
>> >>72_active.cf:rawbody __HAS_IMG_SRC_ONECASE /^[^>].*?<(img src|IMG SRC)=/m
>> >>
>> >>There are different checks that have the "*" quantifier tho.
>> >>Is it reasonable to replace them with {0,1000} globally?
>>
>> On 05.12.19 17:21, RW wrote:
>> >In rawbody rules the text is broken into chunks of 1024 to 2048 bytes,
>> >so the worst case isn't all that much worst than with {0,1000}.
>> >
>> >Also /m means that .* wont cross a line boundary in the decoded text
>> >and ^ can match in the middle of the chunk. This make the average
>> >processing time less sensitive to any upper limit on .*.
>>
>> so it is not the quantifiers who cause SA taking too much of memory?
>>
>> any idea how to debug that?
>
> Scanning a generic 20MB will normally eat ~700MB memory. 3GB implies
> something is bugging. Feel free to send a sample if you can.

Yeah. Similarly, a standard sa-learn run blew up to 96GiB RSS and got
oom-killed here, just last night. That this happened to two people
simultaneously suggests that something bad crept in in the 4th Dec rule
update...

I'll see if I can replicate it with -D on. (If sa-learn even pays
attention to -D :) )

-- 
NULL && (void)
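
P.S. On RW's point upthread about /m and the {0,1000} question: a rough
way to convince yourself is the sketch below. It is only an illustration
under stated assumptions, not how SA evaluates rules: Python's re module
stands in for Perl's regex engine (re.M for /m, re.I for /i), the chunk is
a made-up ~2KB string rather than a real rawbody chunk, and chunk_matches
is a hypothetical helper. It says nothing about memory use.

```python
import re

# Assumption: Python's re.M / re.I behave like Perl's /m and /i for
# these simple patterns. The patterns mirror __HAS_HREF from the thread.
UNBOUNDED = re.compile(r"^[^>].*?<a href=", re.I | re.M)
BOUNDED = re.compile(r"^[^>].{0,1000}?<a href=", re.I | re.M)


def chunk_matches(pattern, chunk):
    """Hypothetical helper: does the pattern hit anywhere in one chunk?"""
    return pattern.search(chunk) is not None


# Made-up chunk of roughly 2KB: uuencode-looking lines, then one HTML line.
uu_line = "M" + "A" * 60 + "\n"
chunk = uu_line * 30 + 'see <a href="http://example.com/">link</a>\n'

# ^ matches at the start of the last line, i.e. in the middle of the chunk.
same_line = chunk_matches(UNBOUNDED, chunk)
bounded_same = chunk_matches(BOUNDED, chunk)

# '.' does not cross the newline, so text on one line cannot pair up
# with a tag that sits on the next line.
cross_line = chunk_matches(UNBOUNDED, "no tag here\n<a href=x>\n")

# The bound only changes the outcome for a single line longer than it.
long_line = "x" * 1500 + "<a href=y>\n"
long_unbounded = chunk_matches(UNBOUNDED, long_line)
long_bounded = chunk_matches(BOUNDED, long_line)

print(same_line, bounded_same, cross_line, long_unbounded, long_bounded)
```

With short lines, as in uuencoded text, the bounded and unbounded forms
agree, because each attempt is already limited by the line length. They
only differ once a single line exceeds the bound.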