On Wed, Dec 11, 2019 at 01:58:03PM +0100, Matus UHLAR - fantomas wrote:
>
> My question was, if there's a bug in the bayes code, causing it to eat too
> much memory.  Both ~750B per token with file-based bayes or ~600B per
> token in redis-based BAYES seems like too much to me.

Not so much a bug, but we should probably add an internal limit on parsed
tokens (10000?) - a normal message would not contain more tokens than that.
At those counts the per-token memory usage is irrelevant (though we could
look at optimizing it too).  We just need to be careful not to create a
loophole for spammers (e.g. filling up a few 50k parts with random short
tokens so the last part doesn't get tokenized at all).
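
To make the idea concrete: a rough Python sketch below, not SpamAssassin's
actual tokenizer, assuming a hypothetical tokenize_with_cap() helper and the
10000-token cap mentioned above.  Splitting the budget evenly across parts is
one way to avoid the stuffing loophole; at that cap, even ~750 bytes per token
is only about 7.5 MB.

  import re

  # Hypothetical cap on parsed tokens per message (the 10000 figure above).
  MAX_TOKENS = 10000

  def tokenize_with_cap(parts, max_tokens=MAX_TOKENS):
      """Collect at most max_tokens unique tokens across all message parts.

      The budget is split evenly across parts so an early part stuffed with
      random short tokens cannot starve the later parts.
      """
      per_part = max(1, max_tokens // max(1, len(parts)))
      tokens = set()
      for text in parts:
          taken = 0
          for tok in re.findall(r"[A-Za-z0-9'._-]{3,}", text):
              if taken >= per_part or len(tokens) >= max_tokens:
                  break
              tok = tok.lower()
              if tok not in tokens:
                  tokens.add(tok)
                  taken += 1
      return tokens

  # Example: tokens from both parts are collected, each within its own share.
  parts = ["lorem ipsum dolor sit amet", "click here to claim your prize now"]
  print(sorted(tokenize_with_cap(parts)))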

Created a bug so it won't be forgotten:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7776
