Re: SA memory (Re: ".*" in body rules)

Nix Fri, 06 Dec 2019 08:33:09 -0800

On 6 Dec 2019, Henrik K. spake thusly:

> On Fri, Dec 06, 2019 at 10:23:15AM +0100, Matus UHLAR - fantomas wrote:
>> >On Thu, 5 Dec 2019 17:07:05 +0100
>> >Matus UHLAR - fantomas wrote:
>> >>seems some big mails were too long to scan, and SA even got killed.
>> >>
>> >>[2146809.213586] Out of memory: Kill process 3660 (spamassassin) score 365 or sacrifice child
>> >>[2146809.213613] Killed process 3660 (spamassassin) total-vm:2960664kB, anon-rss:2921892kB, file-rss:0kB, shmem-rss:0kB
>> >>[2146809.270342] oom_reaper: reaped process 3660 (spamassassin), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>> >>
>> >>I see the mail body contains nearly 20MB uuencoded text (don't ask).
>> >>
>> >>I found some body rules that contain ".*" instead of a sane
>> >>quantifier:
>> >>
>> >>72_active.cf:rawbody  __HAS_HREF             /^[^>].*?<a href=/im
>> >>72_active.cf:rawbody  __HAS_HREF_ONECASE     /^[^>].*?<(a href|A HREF)=/m
>> >>72_active.cf:rawbody  __HAS_IMG_SRC          /^[^>].*?<img src=/im
>> >>72_active.cf:rawbody  __HAS_IMG_SRC_DATA     /^[^>].*?<img src=['"]data/im
>> >>72_active.cf:rawbody  __HAS_IMG_SRC_ONECASE  /^[^>].*?<(img src|IMG SRC)=/m
>> >>
>> >>There are different checks that have the "*" quantifier tho.
>> >>Is it reasonable to replace them with {0,1000} globally?
>> 
>> On 05.12.19 17:21, RW wrote:
>> >In rawbody rules the text is broken into chunks of 1024 to 2048 bytes,
>> >so the worst case isn't all that much worse than with {0,1000}.
>> >
>> >Also, /m means that .* won't cross a line boundary in the decoded text
>> >and ^ can match in the middle of the chunk. This makes the average
>> >processing time less sensitive to any upper limit on .*.
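
A rough way to see the two effects RW describes, and what a {0,1000} cap
would change, is to try the __HAS_HREF pattern outside SpamAssassin. This is
only a sketch using Python's re module, which as far as I know treats ^, .
and lazy quantifiers the same way Perl does for this pattern. The names
first_match and HAS_HREF_BOUNDED are made up for the illustration, and the
bounded variant is the rewrite proposed above, not a shipped rule.

```python
import re

# re.I | re.M stand in for Perl's /im on the quoted rule.
HAS_HREF = re.compile(r"^[^>].*?<a href=", re.I | re.M)
# Hypothetical variant with the proposed {0,1000} cap.
HAS_HREF_BOUNDED = re.compile(r"^[^>].{0,1000}?<a href=", re.I | re.M)


def first_match(pattern, chunk):
    """Return the text of the first match in chunk, or None."""
    m = pattern.search(chunk)
    return m.group(0) if m else None


# ^ can match at a line start in the middle of the chunk:
# the first line starts with '>' and is skipped, the second line hits.
print(first_match(HAS_HREF, "> quoted line\nsee <a href=http://example.invalid>\n"))
# prints: see <a href=

# .* stops at the newline, so the match never drags in the previous line.
print(first_match(HAS_HREF, "first line\nsecond <a href=x>"))
# prints: second <a href=

# A single 1500-character line: the unbounded rule still hits,
# the capped one does not.
long_line = "x" * 1500 + "<a href=y>"
print(first_match(HAS_HREF, long_line) is not None)   # prints: True
print(first_match(HAS_HREF_BOUNDED, long_line))       # prints: None
```

So the cap is not purely a performance knob: a link that starts more than
roughly 1000 characters into a line would no longer hit.
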
>> 
>> So it is not the quantifiers that cause SA to take up so much memory?
>> 
>> Any idea how to debug that?
>
> Scanning a generic 20MB will normally eat ~700MB memory.  3GB implies
> something is bugging.  Feel free to send a sample if you can.
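
One low-tech way to put a number on this for a given message (closer to
~700MB or to ~3GB?) is to run the scan as a child process and read back its
peak resident set size. A minimal sketch, assuming a Unix-like system and
Python's standard resource module; peak_child_rss is a made-up helper name,
and the command line to pass is up to you.

```python
import resource
import subprocess


def peak_child_rss(cmd, stdin_path=None):
    """Run cmd to completion; return (exit_code, peak_rss).

    peak_rss is ru_maxrss for waited-for children: kilobytes on Linux,
    bytes on macOS. It is the maximum over all children this Python
    process has waited for, so measure one command per Python run.
    """
    stdin = open(stdin_path, "rb") if stdin_path else subprocess.DEVNULL
    try:
        proc = subprocess.run(
            cmd,
            stdin=stdin,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    finally:
        if stdin_path:
            stdin.close()
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return proc.returncode, usage.ru_maxrss


# Example call (the file name is a placeholder):
#   code, rss = peak_child_rss(["spamassassin"], stdin_path="sample.eml")
```

If the child is killed by a signal, as happens with the OOM killer, the
returned exit code is negative.
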


Yeah. Similarly, a standard sa-learn run blew up to 96GiB RSS and got
oom-killed here, just last night. That this happened to two people
simultaneously suggests that something bad crept in in the 4th Dec rule
update... I'll see if I can replicate it with -D on. (If sa-learn even
pays attention to -D :) )

-- 
NULL && (void)
