>On Wed, Dec 11, 2019 at 10:53:04AM +0100, Matus UHLAR - fantomas wrote:
>>On 11.12.19 11:43, Henrik K wrote:
>>>Wow 6 million tokens.. :-)
>>>
>>>I assume the big uuencoded blob content-type is text/* since it's tokenized?

>>yes, I mentioned that in previous mails. ~15M file, uuencoded in ~20M mail.
>>
>>grep -c '^M' spamassassin-memory-error-<...>
>>329312
>>
>>One of former mails mentioned that 20M mail should use ~700M of RAM. 6M
>>tokens eating about 4G of RAM means ~750B per token, is that fine?

On 11.12.19 12:07, Henrik K wrote:
>I'm pretty sure the Bayes code does many dumb things with the tokens
>that result in much memory usage for abnormal cases like this.

On Wed, Dec 11, 2019 at 01:12:46PM +0100, Matus UHLAR - fantomas wrote:
>but apparently nobody notices...

On 11.12.19 14:22, Henrik K wrote:
>How many people even scan 20MB mails?  Pretty much nobody.  It's not safe
>to do until SA 3.4.3, as you can see.  Before that, I know at least
>Amavisd-new could be configured to truncate large messages before feeding
>them to SA, which was somewhat safe to do.
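
For reference, the amavisd-new knob involved here is presumably
$sa_mail_body_size_limit; as far as I know, newer amavisd-new versions feed
SA a truncated copy of an oversized message, while older ones simply skip
the SA check above that size.  Something along these lines in amavisd.conf,
but check the documentation for your version:

# assumption: exact behaviour depends on the amavisd-new version --
# older releases skip SpamAssassin entirely for messages above this size,
# newer ones pass a truncated copy instead
$sa_mail_body_size_limit = 400*1024;  # roughly 400 kB handed to SpamAssassin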

I raised the limits years ago to see how it would go.  Since then, I have
received multiple many-MB spams; most of them hit BAYES_99, and without it
they would have become FNs.

This is about the second time it has caused problems - the first time it
happened on a very slow machine, where scanning took too much time.

My question was whether there's a bug in the Bayes code causing it to eat
too much memory.  Both ~750B per token with the file-based Bayes backend
and ~600B per token with the Redis-based backend look like too much to me.
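
For what it's worth, plain Perl data structures alone account for a good
chunk of that.  A rough illustration - not the actual Bayes code, just a
made-up token hash measured with Devel::Size - of what a million in-memory
token entries cost:

#!/usr/bin/perl
# rough illustration only - NOT the SpamAssassin Bayes code;
# the token format and per-token data below are invented for the estimate
use strict;
use warnings;
use Devel::Size qw(total_size);

my %tokens;
for my $i (1 .. 1_000_000) {
    # fake token: short hex string => [spam count, ham count, atime]
    $tokens{ sprintf("%010x", $i) } = [ 1, 0, time() ];
}

my $bytes = total_size(\%tokens);
printf "total: %d bytes, per token: %.0f bytes\n",
    $bytes, $bytes / 1_000_000;

On a 64-bit perl a hash like that already runs to a couple of hundred bytes
per entry before the Bayes code adds its own bookkeeping, so 600-750B per
token may be as much Perl overhead as bug - but it would be nice if someone
who knows the code confirmed that.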


--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
  One OS to rule them all, One OS to find them,
One OS to bring them all and into darkness bind them
