Linda Walsh wrote:
>
> I see 3 DB's in my user directory (.spamassassin).
>
> auto-whitelist    (~80MB)
> bayes_seen    (~40MB)
> bayes_toks    (~20MB)
>
> Was trying to find relation of 'bayes_expiry_max_db_size' to the physical
> size of the above files.
Expiry will only affect bayes_toks. Currently neither auto-whitelist nor
bayes_seen has any expiry mechanism at all.
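For context, the relevant settings live in local.cf or user_prefs. A minimal sketch, assuming stock semantics (note that bayes_expiry_max_db_size is a token count, not a byte size):

```
# /etc/mail/spamassassin/local.cf (or ~/.spamassassin/user_prefs)
bayes_auto_expire        1        # opportunistic expiry on (the default)
bayes_expiry_max_db_size 150000   # max tokens to keep, NOT bytes; 150,000 is the default
```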

bayes_seen can safely be deleted if you need to. It keeps track of which
messages have already been learned, to prevent relearning them. However,
unless you're likely to re-feed messages to SA, bayes_seen isn't strictly
necessary.
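If you do delete it, do so while spamd and any sa-learn runs are stopped. A throwaway sketch on a scratch directory (point it at your real ~/.spamassassin only once everything is stopped; bayes_seen holds only dedup bookkeeping, so the tokens survive):

```shell
# Simulate a .spamassassin directory in a scratch location.
SADIR="$(mktemp -d)"
touch "$SADIR/auto-whitelist" "$SADIR/bayes_seen" "$SADIR/bayes_toks"

# Dropping bayes_seen loses only the "already learned" record;
# bayes_toks (the actual token data) is untouched.
rm -f "$SADIR/bayes_seen"
ls "$SADIR"
```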


>   While finding some answers, I've run into some
> seeming "contradictions".  I had db_size set to 500,000, then reduced it
> to 250,000 and to the default (150,000) during testing.
>
> In trying to lower 'db_size' and see how that affected physical sizes,
> I ran sa-learn --force-expire and saw these 'Note' debug messages:
>
> [30905] dbg: bayes: expiry check keep size, 0.75 * max: 112500
> [30905] dbg: bayes: token count: 0, final goal reduction size: -112500
> [30905] dbg: bayes: reduction goal of -112500 is under 1,000 tokens,
> skipping expire
> [30905] dbg: bayes: expiry completed
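Reading those dbg lines, the arithmetic appears to be: keep size = 0.75 × bayes_expiry_max_db_size, and the reduction goal = current token count − keep size, with goals under 1,000 tokens skipped. A sketch with the numbers from the output above (150,000 is the default max; the 0.75 factor is taken from the dbg line itself):

```shell
max=150000                 # bayes_expiry_max_db_size
tokens=0                   # token count reported in the dbg output
keep=$(( max * 75 / 100 )) # "keep size, 0.75 * max": 112500
goal=$(( tokens - keep ))  # "final goal reduction size": -112500
if [ "$goal" -lt 1000 ]; then
    echo "reduction goal of $goal is under 1,000 tokens, skipping expire"
fi
```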
>
> ---
> First problem (contradiction): the dbg output above says "token count: 0".
> (This is with a combined bayes DB size of 60MB: _seen + _toks.)
Are you sure your sa-learn was using the same DB path?

From the sounds of it, sa-learn is using a directory with an empty DB.
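One way to check is to compare the DB file path each invocation reports in debug output. The sample line below is an assumption modeled on SA's usual "tie-ing to DB file" dbg message (exact wording varies by version); extracting the path is just a matter of taking the last field:

```shell
# Sample dbg line of the kind that
#   sa-learn -D --dump magic 2>&1 | grep -i 'DB file'
# might produce (path and wording here are illustrative, not from your system):
debug_line='[30905] dbg: bayes: tie-ing to DB file R/O /home/user/.spamassassin/bayes_toks'

# The DB path is the last whitespace-separated field.
db_path=$(printf '%s\n' "$debug_line" | awk '{print $NF}')
echo "$db_path"
```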

>
> Seems to think I have no bayes data.  Saw another dbg msg that
> indicated the
> bayes classifier was untrained (<~150? entries) & disabled.
>
> Dunno how it got zeroed, but I tried adding 'ham' by running sa-learn over
> a despammed mailbox of mine.  The first run showed:
>
> Learned tokens from 55 message(s) (55 message(s) examined)
>
> But subsequent runs of sa-learn with dbg + expire still show "token
> count: 0".
>
> sa-learn --dump magic shows something different:
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0     556414          0  non-token data: nspam
> 0.000          0     574441          0  non-token data: nham
> 0.000          0     491743          0  non-token data: ntokens
> 0.000          0 1216456288          0  non-token data: oldest atime
> 0.000          0 1237796146          0  non-token data: newest atime
> 0.000          0 1220476831          0  non-token data: last journal
> sync atime
> 0.000          0 1217838535          0  non-token data: last expiry atime
> 0.000          0    1382400          0  non-token data: last expire
> atime delta
> 0.000          0      70612          0  non-token data: last expire
> reduction count
> ---------
>
> Does the above indicate 0 tokens?  I.e., doesn't 'ntokens' = 491743 mean
> slightly under 500K tokens (my original limit, before trying to run
> sa-learn --force-expire with dbg manually)?
Yep, looks like you have 491,743 tokens to me.
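For what it's worth, that figure can be pulled out of the --dump magic output mechanically; a sketch using the ntokens line quoted above:

```shell
# The "ntokens" line from the --dump magic output above; the count is
# the third whitespace-separated column.
magic_line='0.000          0     491743          0  non-token data: ntokens'
ntokens=$(printf '%s\n' "$magic_line" | awk '{print $3}')
echo "$ntokens tokens"
```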
>
> It's like the sa-learn magic shows a 'db' corresponding to my old limit
> (which I think is still being 'auto-expired', so it might not show a
> pruned figure yet, as it runs about once per 24 hours, if I understand
> normal spamd workings).
Approximately. Also, be aware that spamd needs to be restarted in order to
pick up new settings.
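For example (service name, init-script path, and pid-file location all vary by distro and setup, so treat these as illustrative):

```shell
# SysV-style init script (often named spamassassin or spamd):
sudo /etc/init.d/spamassassin restart

# Or, for a hand-started spamd, SIGHUP makes it restart/re-exec
# (pid-file path is an assumption; yours may differ):
kill -HUP "$(cat /var/run/spamd.pid)"
```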
>
> So is the --dump magic output maybe what is seen and being
> 'size-controlled' by
> auto-expire (~500K before my recent test changes)?
Yes, at least, it should be.
>
> Why isn't 'sa-learn --force-expire' seeing the tokens indicated in
> sa-learn --dump magic?
That is particularly strange to me, and it sounds like there's a
problem there.

Can you give a bit of detail, i.e.: what paths are you looking at for the
files, and what version of SA are you running?
> Debug messages are pointing at the same file
> for both operations, so how can dump magic indicate 500K while the
> debug of sa-learn --force-expire somehow sees 0 tokens?
>
> Am I misinterpreting the debug output?
No, you don't seem to be.