On Wed, Dec 11, 2019 at 10:04:56AM +0100, Matus UHLAR - fantomas wrote:
> >>hmmm, the machine has 4G of RAM and SA now takes 4.5G.
> >>The check runs out of time but produces a ~450K debug file.
> >>
> >>This is where it hangs:
> >>
> >>Dec 10 17:43:51.727 [9721] dbg: bayes: tokenized header: 211 tokens
> 
> On 10.12.19 22:52, RW wrote:
> >What are the full counts if you put it through 'grep tokenized'
> 
> Dec 10 17:43:49.137 [9721] dbg: bayes: tokenized body: 6158242 tokens
> Dec 10 17:43:51.713 [9721] dbg: bayes: tokenized uri: 10881 tokens

Wow 6 million tokens.. :-)

I assume the big uuencoded blob has a text/* content-type, since it's being tokenized?

This will be mitigated in 3.4.3, since it will only use at most 50k of the body
text (controlled by body_part_scan_size).
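
For anyone who wants to tune that limit once 3.4.3 is deployed, a minimal
local.cf sketch (body_part_scan_size is the option named above; 50000 bytes
matches the 50k limit mentioned, and the per-part semantics in the comment is
my reading of it, so treat this as an illustration rather than a recipe):

  # local.cf: cap how many bytes of each text/* body part get rendered
  # and tokenized (Bayes and body rules) - 3.4.3 and later
  body_part_scan_size 50000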
