-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 01/23/2008 07:35 PM, Matt Kettler wrote:
| Steven Stern wrote:
|> We had a server go crazy last night and reset its date into August of
|> 2277. In any case, we've resolved that, but now I can't get bayes to
|> expire.
|>
|> After the clock was correctly set, I deleted all tokens that had a
|> lastupdate in the future, and also removed similar bayes_seen rows. I
|> then reset the token count in bayes_vars to the correct value.
|>
|> When I try to run sa-learn --force-expire, nothing gets expired and
|> the token list keeps growing. Will this get better on its own or do I
|> need to intervene?
| You might need to ditch your bayes database.
|
| The database will, over time, partially fix itself, but right now any
| "one off" tokens learned while the date was off are stuck in your bayes
| DB until 2277. SA's expiry method is based on the "age" of a token,
| based on when it was last accessed. That method has absolutely no way to
| deal with atimes that are in the future, so it will never try to expire
| those tokens.
|
| It can partially fix itself, because every time a token gets accessed,
| its atime gets updated. So as the more common tokens get used, they'll
| start rotating out as they would normally. However, any unique tokens
| are stuck there.
|
| If you're *really* desperate to preserve the bayes DB, you could wait a
| couple days, do a sa-learn --backup, use grep to remove all the lines
| with absurd atimes, then use sa-learn --restore. That's a good bit of
| work to go through...
|
| If you decide to go this route: For reference, and assuming my
| scratchpad math is right, the atimes for 2277 should be around 9.6
| billion, while the ones for 2008 should be around 1.2 billion. Of
| course, that's assuming the atimes are stored 64 bit and aren't wrapping
| as 32 bit numbers. However, if that were the case, they'd be wrapping
| to 2004, and your expire numbers should show really high token
| eliminations, not really low.
|
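For anyone wanting to check the rough atime figures quoted above, here is
a minimal Python sketch. It assumes atimes are plain Unix timestamps in
UTC; the exact day in August 2277 is not given in the thread, so the 1st
is used purely as a placeholder, and the helper name epoch_seconds is
made up for illustration.

```python
from datetime import datetime, timezone


def epoch_seconds(year, month, day):
    """Unix timestamp (UTC midnight) for the given calendar date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


# Date of the original message, and a placeholder day in August 2277
# (assumption: the thread only says "August of 2277").
ts_2008 = epoch_seconds(2008, 1, 23)
ts_2277 = epoch_seconds(2277, 8, 1)

# What the 2277 value would look like if it were truncated to an
# unsigned 32-bit integer instead of being stored in 64 bits.
wrapped = ts_2277 % 2**32
wrapped_year = datetime.fromtimestamp(wrapped, tz=timezone.utc).year

print("2008 atime:          ", ts_2008)
print("2277 atime (64 bit): ", ts_2277)
print("2277 atime (32 bit): ", wrapped, "-> year", wrapped_year)
```

If the atimes in a backup or in the SQL table are in the billions well
above the 2008 value, they were stored without wrapping and can be
picked out by a simple numeric comparison.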
It's finally started to remove tokens, so I think I'm OK. We use SQL
bayes, so it was an easy matter to use

    delete from bayes_token where atime > UNIX_TIMESTAMP();

to clean up the stuff from the future.

- --
  Steve
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFHmAwSeERILVgMyvARAmkBAJ4od1lX/wXYdadek1deySDYZi4SQgCfcskW
dOHVuSkn5UeKZUGYJjA6J2A=
=c5W9
-----END PGP SIGNATURE-----
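
The cleanup described above can be tried out safely before touching a
live database. The following is only a sketch: it uses Python's built-in
sqlite3 with an in-memory table as a stand-in for the real MySQL bayes
store, the table is cut down to two columns (the real bayes_token table
has more), and delete_future_tokens is a hypothetical helper name. The
current time is passed in as a parameter instead of calling
UNIX_TIMESTAMP(), which is MySQL-specific.

```python
import sqlite3
import time


def delete_future_tokens(conn, now=None):
    """Delete rows whose atime is later than `now`; return rows removed."""
    if now is None:
        now = int(time.time())
    cur = conn.execute("DELETE FROM bayes_token WHERE atime > ?", (now,))
    conn.commit()
    return cur.rowcount


# Simplified stand-in schema (assumption, not the real SpamAssassin layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bayes_token (token TEXT PRIMARY KEY, atime INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO bayes_token VALUES (?, ?)",
    [
        ("sane-1", 1201046400),    # January 2008
        ("sane-2", 1200000000),    # January 2008
        ("future-1", 9706348800),  # August 2277
    ],
)

# Pretend "now" is one day after the newest sane token.
removed = delete_future_tokens(conn, now=1201132800)
remaining = [row[0] for row in conn.execute(
    "SELECT token FROM bayes_token ORDER BY token")]

print("removed:", removed)
print("remaining:", remaining)
```

As noted earlier in the thread, the token count in bayes_vars also has
to be brought back in line after rows are deleted by hand.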