On Wed, Dec 11, 2019 at 10:04:56AM +0100, Matus UHLAR - fantomas wrote:
>>hmmm, the machine has 4G of RAM and SA now takes 4.5.
>>The check runs out of time but produces a ~450K debug file.
>>
>>This is where it hangs:
>>
>>Dec 10 17:43:51.727 [9721] dbg: bayes: tokenized header: 211 tokens

On 10.12.19 22:52, RW wrote:
>What are the full counts if you put it through 'grep tokenized'

Dec 10 17:43:49.137 [9721] dbg: bayes: tokenized body: 6158242 tokens
Dec 10 17:43:51.713 [9721] dbg: bayes: tokenized uri: 10881 tokens

On 11.12.19 11:43, Henrik K wrote:
>Wow 6 million tokens.. :-)

>I assume the big uuencoded blob content-type is text/* since it's tokenized?

Yes, I mentioned that in previous mails: a ~15M file, uuencoded into a ~20M mail.

grep -c '^M' spamassassin-memory-error-<...>
329312
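
(Sanity check, assuming a standard uuencode stream: each full data line starts
with 'M' because it encodes 45 bytes, so

    329312 lines * 45 bytes/line ~= 14.8 MB

which matches the ~15M original file.)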

One of the earlier mails mentioned that a 20M mail should use ~700M of RAM. 6M tokens eating about 4G of RAM means ~750 bytes per token; is that expected?
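
Rough math from the debug counts above:

    6158242 + 10881 + 211 ~= 6.17M tokens
    4.5 GB / 6.17M tokens ~= 730 bytes/token

so ~750B per token is in the right ballpark; presumably most of that is
per-token Perl hash/scalar overhead rather than the token text itself.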

>This will be mitigated in 3.4.3, since it will only use max 50k of the body
>text (body_part_scan_size).

Will it prefer text parts and try to avoid uuencoded or base64 parts?
(Or maybe decode them?)
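
For reference, a minimal local.cf sketch for the 3.4.3 knob mentioned above
(50000 matches the "max 50k" quoted; treat the exact syntax/default as an
assumption until the 3.4.3 docs confirm it):

    # local.cf, SpamAssassin >= 3.4.3 (assumed)
    # cap how much of each rendered body part gets scanned/tokenized
    body_part_scan_size 50000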

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Enter any 12-digit prime number to continue.
