-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 01/23/2008 07:35 PM, Matt Kettler wrote:
| Steven Stern wrote:
|> We had a server go crazy last night and reset its date into August of
|> 2277.  In any case, we've resolved that, but now I can't get bayes to
|> expire.
|>
|> After the clock was correctly set, I deleted all tokens that had a
|> lastupdate in the future, and also removed similar bayes_seen rows.  I
|> then reset the token count in bayes_vars to the correct value.
|>
|> When I try to run sa-learn --force-expire, nothing gets expired and
|> the token list keeps growing.  Will this get better on its own or do I
|> need to intervene?
| You might need to ditch your bayes database.
|
| The database will, over time, partially fix itself, but right now any
| "one off" tokens learned while the date was off are stuck in your bayes
| DB until 2277. SA's expiry method is based on the "age" of a token,
| based on when it was last accessed. That method has absolutely no way to
| deal with atimes that are in the future, so it will never try to expire
| those tokens.
|
| It can partially fix itself, because every time a token gets accessed,
| its atime gets updated. So as the more common tokens get used, they'll
| start rotating out as they would normally. However, any unique tokens
| are stuck there.
|
| If you're *really* desperate to preserve the bayes DB, you could wait a
| couple days, do a sa-learn --backup, use grep to remove all the lines
| with absurd atimes, then use sa-learn --restore. That's a good bit of
| work to go through...
|
| If you decide to go this route:  For reference, and assuming my
| scratchpad math is right, the atimes for 2277 should be around 9.7
| billion, while the ones for 2008 should be around 1.2 billion. Of
| course, that's assuming the atimes are stored as 64-bit values and
| aren't wrapping as 32-bit numbers. However, if that were the case,
| they'd be wrapping to 2005, and your expire numbers should show really
| high token eliminations, not really low ones.
|
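
For anyone taking the backup/grep/restore route described above, here is
a rough Python sketch of the arithmetic and the filtering step. It is
only an illustration: the helper names (epoch, drop_future_tokens) are
made up, and the layout of the backup file (tab-separated lines, token
lines starting with "t", atime in the fourth field) is an assumption
that should be checked against a real sa-learn --backup dump first.

```python
from datetime import datetime, timezone


def epoch(year, month=1, day=1):
    """Seconds since 1970-01-01 UTC for the given calendar date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def drop_future_tokens(lines, now, atime_field=3):
    """Return the lines, minus token lines whose atime is later than `now`.

    ASSUMPTION: token lines are tab-separated, start with "t", and carry
    the atime at index `atime_field`. Verify this on a real backup file.
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "t" and len(fields) > atime_field:
            try:
                if int(fields[atime_field]) > now:
                    continue  # atime lies in the future: drop this token
            except ValueError:
                pass  # atime is not numeric: leave the line untouched
        kept.append(line)
    return kept


# Rough magnitudes for the two dates mentioned in this thread.
bogus = epoch(2277, 8, 1)   # the runaway clock, August 2277
sane = epoch(2008, 1, 23)   # the date of this thread

# What the bogus value would look like if truncated to 32 bits.
wrapped = bogus % 2**32
wrapped_year = datetime.fromtimestamp(wrapped, tz=timezone.utc).year

print(bogus, sane, wrapped_year)
```

To use it, read the backup file into a list of lines, pass it through
drop_future_tokens with now set to the current epoch time, and write the
result to a new file before running sa-learn --restore on it.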

It's finally started to remove tokens, so I think I'm OK. We use SQL
bayes, so it was an easy matter to use

    delete from bayes_token where atime > UNIX_TIMESTAMP();

to clean up the stuff from the future.
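
For anyone who wants to try the same cleanup on a scratch copy first,
here is a small Python sketch. It uses an in-memory SQLite database as a
stand-in, not our real MySQL setup, and the two-column table is a
simplified, hypothetical version of bayes_token (only the table name and
the atime column come from the statement above). purge_future_tokens is
a made-up helper name. SQLite has no UNIX_TIMESTAMP(), so the cutoff is
passed in as a parameter.

```python
import sqlite3
import time


def purge_future_tokens(conn, now=None):
    """Delete tokens whose atime is later than `now`.

    Returns (rows_deleted, rows_remaining). The remaining count is the
    kind of figure one might use to re-check a stored token count.
    """
    if now is None:
        now = int(time.time())
    cur = conn.execute("DELETE FROM bayes_token WHERE atime > ?", (now,))
    conn.commit()
    remaining = conn.execute("SELECT COUNT(*) FROM bayes_token").fetchone()[0]
    return cur.rowcount, remaining


# Scratch database with a simplified, hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bayes_token (token TEXT, atime INTEGER)")
conn.executemany(
    "INSERT INTO bayes_token VALUES (?, ?)",
    [
        ("seen-today", 1201046400),     # January 2008
        ("seen-in-2277", 9706348800),   # stamped by the runaway clock
        ("seen-last-week", 1200000000),
    ],
)

# Use a fixed "now" so the demonstration is repeatable.
demo_result = purge_future_tokens(conn, now=1201100000)
print(demo_result)
```

Passing the cutoff in explicitly also makes it easy to run a
SELECT COUNT(*) with the same condition beforehand and see how many rows
would go.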


- --

   Steve
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFHmAwSeERILVgMyvARAmkBAJ4od1lX/wXYdadek1deySDYZi4SQgCfcskW
dOHVuSkn5UeKZUGYJjA6J2A=
=c5W9
-----END PGP SIGNATURE-----
