On Wed, Dec 11, 2019 at 10:04:56AM +0100, Matus UHLAR - fantomas wrote:
> >>hmmm, the machine has 4G of RAM and SA now takes 4.5.
> >>The check runs out of time but produces a ~450K debug file.
> >>
> >>This is where it hangs:
> >>
> >>Dec 10 17:43:51.727 [9721] dbg: bayes: tokenized header: 211 tokens
>
> On 10.12.19 22:52, RW wrote:
> >What are the full counts if you put it through 'grep tokenized'
>
> Dec 10 17:43:49.137 [9721] dbg: bayes: tokenized body: 6158242 tokens
> Dec 10 17:43:51.713 [9721] dbg: bayes: tokenized uri: 10881 tokens
Wow, 6 million tokens.. :-) I assume the big uuencoded blob's content-type is text/*, since it's being tokenized? This will be mitigated in 3.4.3, which only scans at most 50 kB of the body text (body_part_scan_size).
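
For reference, a minimal local.cf sketch, assuming the body_part_scan_size option that ships with 3.4.3 (the values below are just the documented defaults, shown explicitly for illustration, not a tuning recommendation):

    # local.cf -- sketch, assumes SpamAssassin 3.4.3+ where these options exist.
    # Only the first N bytes of each decoded textual body part are tokenized;
    # 50000 is the shipped default, lower it to further cap huge bodies.
    body_part_scan_size 50000

    # Related knob (also 3.4.3+): limit for raw, undecoded body parts.
    #rawbody_part_scan_size 500000

Setting either to 0 disables the limit, so leaving them at their defaults (or lowering them) is what actually protects against pathological messages like this one.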
