>On Wed, Dec 11, 2019 at 10:53:04AM +0100, Matus UHLAR - fantomas wrote:
>>On 11.12.19 11:43, Henrik K wrote:
>>>Wow 6 million tokens.. :-)
>>>
>>>I assume the big uuencoded blob content-type is text/* since it's tokenized?
>>Yes, I mentioned that in previous mails: a ~15M file, uuencoded in a ~20M mail.
>>
>>grep -c '^M' spamassassin-memory-error-<...>
>>329312
>>
>>One of the earlier mails mentioned that a 20M mail should use ~700M of RAM.
>>6M tokens eating about 4G of RAM means ~750B per token; is that fine?
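(A quick sanity check on those sizes, assuming standard uuencode framing of
45 raw bytes per line: each full line is the length character 'M' (= 45
bytes), 60 encoded characters and a newline, about 62 bytes, which is also
why the grep for '^M' above counts them. 329312 lines * 45 bytes is ~14.8 MB
decoded, and 329312 lines * 62 bytes is ~20.4 MB encoded, matching the ~15M
file inside the ~20M mail.)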
On 11.12.19 12:07, Henrik K wrote:
>I'm pretty sure the Bayes code does many dumb things with the tokens
>that result in excessive memory usage for abnormal cases like this.
On Wed, Dec 11, 2019 at 01:12:46PM +0100, Matus UHLAR - fantomas wrote:
>but apparently nobody notices...
On 11.12.19 14:22, Henrik K wrote:
>How many people even scan 20MB mails? Pretty much nobody. It's not safe to
>do before SA 3.4.3, as you can see. Before that, I know at least
>Amavisd-new could be configured to truncate large messages before feeding
>them to SA, which was somewhat safe to do.
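(For the record, the amavisd-new knob I remember for that is
$sa_mail_body_size_limit, e.g. "$sa_mail_body_size_limit = 400*1024;". If I
recall correctly, newer amavisd versions pass a truncated copy to SA rather
than skipping the scan entirely, but I haven't re-checked that.)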
I raised the limits years ago to see how it would go. Since then, I have
received multiple many-MB spams; most of them hit BAYES_99 and would have
become FNs without it.
This is about the second time it has caused problems; the first time was on
a very slow machine, where scanning took too long.
My question was whether there's a bug in the Bayes code causing it to eat too
much memory. Both ~750B per token with file-based Bayes and ~600B per token
with Redis-based Bayes seem like too much to me.
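Back-of-envelope with the rounded numbers from this thread: 4 GB / 6,000,000
tokens is roughly 670-715 bytes per token (decimal vs. binary gigabytes); the
~750B and ~600B figures presumably come from the exact process sizes and
token counts. Either way it is far more than the token text itself (Bayes
tokens are short strings), so I assume most of it is per-token overhead in
the Perl structures the Bayes code builds during a scan.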
--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
One OS to rule them all, One OS to find them,
One OS to bring them all and into darkness bind them