Andy Jezierski writes:
> Are there any instructions in setting up the Bayes DB using a Redis 
> server?

Yes, in the release notes (currently also in build/announcements/PROPOSED-3.4.0.txt
in svn). Pretty much exactly as you already have it.
 
> I've installed the server, took the sample config options and added them 
> to local.cf
> 
> bayes_store_module  Mail::SpamAssassin::BayesStore::Redis
> bayes_store_module_additional Mail::SpamAssassin::Util::TinyRedis
> bayes_sql_dsn       server=127.0.0.1:6379;password=spamd;database=2
> bayes_token_ttl 21d
> bayes_seen_ttl   8d
> bayes_auto_expire 1
> use_bayes               1
> bayes_auto_learn        1
> 
> Performed a redis-cli -n 2 FLUSHDB
> 
> Did a backup of one of my mysql bayes databases and am attempting to do a 
> restore to the new system.

Good.
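
For reference, the whole migration boils down to a dump while the old
backend is still configured, then a restore after switching local.cf
to the Redis backend (the file name here is just an example):

$ sa-learn --backup > bayes-backup.txt    # bayes_sql_dsn still pointing at MySQL
$ sa-learn --restore bayes-backup.txt     # after switching to BayesStore::Redis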

> Looks like the redis server keeps chewing up swap space until it runs out, 
> then the redis server terminates.
> 
> Running on FreeBSD 9.2   perl 5.18-5.18.2   redis server 2.8.4
> Any ideas?

Depends very much on the number of tokens you have in your SQL database.

Mine (ca. 1000 users) keeps hovering at about 1 M tokens (and keeps only
a few very recent 'seen' entries), resulting in the redis server using
under 300 MB of memory.

$ redis-cli -n 2 keys 'w:*' | wc -l
 1091475

$ redis-cli -n 2 keys 's:*' | wc -l
    1324

It may be worthwhile to purge old tokens from SQL first, before
creating a backup.  Also, it is safe to ditch the entire 'seen'
set of records; it's not worth transferring them to a new database.
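
A sketch of such a pre-backup cleanup, assuming the stock MySQL bayes
schema (bayes_token with an integer unix-time atime column, bayes_seen)
and a database named sa_bayes - adjust names and the age cutoff to taste:

$ mysql sa_bayes -e 'DELETE FROM bayes_token WHERE atime < UNIX_TIMESTAMP() - 21*86400'
$ mysql sa_bayes -e 'TRUNCATE TABLE bayes_seen'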

If this still gives an unreasonable number of tokens, it may be worth
decimating the set - preserving just a random subset of tokens.
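
A minimal sketch of such decimation, assuming the plain-text format
produced by sa-learn --backup, where token lines start with a 't'
field; this keeps roughly one token in ten and passes all other
lines through unchanged:

$ awk 'BEGIN{srand()} $1=="t" {if (rand() < 0.1) print; next} {print}' backup.txt > backup-decimated.txt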

Another option is to just start from an empty database.  With a
reasonable set of other rules, network tests and autolearning on,
the required 200 samples of ham and 200 of spam can be reached
quickly on a busy server.  During initial learning consider
decreasing the scores of the BAYES_00 and BAYES_99 rules.
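
E.g. something like this in local.cf while the database is still
young, to be reverted once it has matured (the values here are just
an illustration, not a recommendation):

score BAYES_00 -0.5
score BAYES_99 0.5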

Note that bayes_token_ttl and bayes_seen_ttl have no effect
on entries loaded from a backup dump; they are all given
a 'current' timestamp (with some random offset so that they
will not all expire at exactly the same time).  But in a steady
state, these *_ttl settings let you control how many items are
kept in the database on average.
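
To verify, one can pick a key at random after a restore and check its
remaining time-to-live (the key returned may also happen to be a
'seen' entry or a bookkeeping key rather than a 'w:*' token):

$ redis-cli -n 2 TTL "$(redis-cli -n 2 RANDOMKEY)"

TTL reports the remaining lifetime in seconds, or -1 if no expiration
is set on that key.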


Axb writes:
> what does sa-learn --dump magic say (when using mysql)

Good idea to check this first.

> my Redis
> PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>   5728 root      20   0 5355m 5.1g 1020 S  1.3 37.1 711:19.45 redis-server
> 
> sa-learn --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0   16481050          0  non-token data: nspam
> 0.000          0    5690858          0  non-token data: nham

> bayes_token_ttl       864000
> bayes_seen_ttl  2d

A biggie!

Btw, with a redis db the number of tokens actually in the database
may not be directly related to the number of learned and reported
tokens, because of the automatic expiration performed by the redis
server (according to bayes_token_ttl) - unlike other bayes back-ends,
where purging is done explicitly by SpamAssassin.
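
A cheap way to see how many keys (tokens plus 'seen' and bookkeeping
entries) redis is actually holding at the moment, without walking the
whole keyspace as the KEYS examples above do:

$ redis-cli -n 2 DBSIZE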

  Mark
