Linda Walsh wrote:
>
> I see 3 DBs in my user directory (.spamassassin):
>
>    auto-whitelist (~80MB)
>    bayes_seen (~40MB)
>    bayes_toks (~20MB)
>
> I was trying to find the relation of 'bayes_expiry_max_db_size' to the
> physical size of the above files.

Expiry only affects bayes_toks. Currently, neither auto-whitelist nor
bayes_seen has any expiry mechanism at all.
bayes_seen can safely be deleted if you need to. It keeps track of which
messages have already been learned, to prevent relearning them. However,
unless you're likely to re-feed messages to SA, bayes_seen isn't strictly
necessary.

> In finding some answers, I've run into some seeming "contradictions".
> I had db_size set to 500,000, then reduced it to 250,000 and to the
> default (150,000) during testing.
>
> In trying to lower 'db_size' and see how that affected physical sizes,
> I ran 'sa-learn --force-expire' and saw these debug messages of note:
>
> [30905] dbg: bayes: expiry check keep size, 0.75 * max: 112500
> [30905] dbg: bayes: token count: 0, final goal reduction size: -112500
> [30905] dbg: bayes: reduction goal of -112500 is under 1,000 tokens,
>         skipping expire
> [30905] dbg: bayes: expiry completed
>
> ---
> First problem (contradiction): the dbg output above says "token count:
> 0". (This is with a combined bayes db size of 60MB (_seen, _toks).)

Are you sure your sa-learn was using the same DB path? From the sounds of
it, sa-learn is using a directory with an empty DB.

> It seems to think I have no bayes data. I saw another dbg msg that
> indicated the bayes classifier was untrained (<~150? entries) &
> disabled.
>
> Dunno how it got zeroed, but I tried adding 'ham' by running sa-learn
> over a despammed mailbox. The first run showed:
>
>   Learned tokens from 55 message(s) (55 message(s) examined)
>
> But subsequent runs of sa-learn with dbg+expire still show token
> count: 0.
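For what it's worth, the numbers in those dbg lines are easy to reproduce. A minimal sketch of the expiry arithmetic, assuming the default bayes_expiry_max_db_size of 150,000:

```shell
# Reproduce the expiry arithmetic from the dbg output above.
# Assumes bayes_expiry_max_db_size is at its default of 150000.
max_db_size=150000
keep_size=$(( max_db_size * 3 / 4 ))   # "0.75 * max": 112500
token_count=0                          # what sa-learn reported
goal=$(( token_count - keep_size ))    # "final goal reduction size": -112500
echo "keep=$keep_size goal=$goal"
# A reduction goal under 1,000 tokens means there is nothing worth
# pruning, so the expire run is skipped -- exactly what the dbg shows.
```

So the skip itself is normal behavior given a token count of 0; the real question is why the count is 0.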
> 'sa-learn --dump magic' shows something different:
>
> 0.000  0           3  0  non-token data: bayes db version
> 0.000  0      556414  0  non-token data: nspam
> 0.000  0      574441  0  non-token data: nham
> 0.000  0      491743  0  non-token data: ntokens
> 0.000  0  1216456288  0  non-token data: oldest atime
> 0.000  0  1237796146  0  non-token data: newest atime
> 0.000  0  1220476831  0  non-token data: last journal sync atime
> 0.000  0  1217838535  0  non-token data: last expiry atime
> 0.000  0     1382400  0  non-token data: last expire atime delta
> 0.000  0       70612  0  non-token data: last expire reduction count
> ---------
>
> Does the above indicate 0 tokens? I.e., doesn't 'ntokens' = 491743 mean
> slightly under 500K tokens (my original limit before trying to run
> 'sa-learn --force-expire' with debug manually)?

Yep, looks like you have 491,743 tokens to me.

> It's like the sa-learn magic shows a 'db' corresponding to my old limit
> (which I think is still being auto-expired, so the figure might not have
> been pruned yet, as auto-expiry runs about once per 24 hours, if I
> understand normal spamd workings).

Approximately. Also, be aware that in order for spamd to use new settings
it needs to be restarted.

> So is the --magic output maybe what is seen and being 'size-controlled'
> by auto-expire (was ~500K before recent test changes)?

Yes; at least, it should be.

> Why isn't 'sa-learn --force-expire' seeing the tokens indicated by
> 'sa-learn --dump magic'?

That is particularly strange to me, and it sounds like there are some
problems there. Can you give a bit more detail, i.e.: what paths are you
looking at for the files, what version of SA,

> Debug messages are pointing at the same file for both operations, so
> how can dump magic indicate 500K tokens while the debug of 'sa-learn
> --force-expire' somehow sees 0 tokens?
>
> Am I misinterpreting the debug output?

No, you don't seem to be.
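If sa-learn and spamd really are opening different databases, pinning the path explicitly would rule that out. A sketch of the relevant settings (the path and values below are illustrative, not taken from this thread); note that bayes_path is a privileged setting, so it normally belongs in local.cf rather than user_prefs:

```
# local.cf -- example values only
# bayes_path is a prefix: SA appends _toks, _seen, etc. to it
bayes_path                /home/user/.spamassassin/bayes
bayes_auto_expire         1
bayes_expiry_max_db_size  150000
```

With the path pinned, 'sa-learn -D --dump magic' and a '--force-expire' run with debug should both report the same files.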