On 10/02/2014 03:10 AM, Jason Haar wrote:
On 02/10/14 10:17, Axb wrote:
have you tried "-L forget" before "-L spam" ?
sa-learn --dump magic before and after learning should show a
difference...
I didn't do a "forget" before - I'll remember that, thanks. As far as
"before, after" goes for the dump - not an option. We're receiving 6-12
messages per second, "--dump magic" is *always* different :-)
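The forget-then-relearn sequence Axb suggests could look like the sketch below (illustrative only; message.eml stands in for the misclassified mail, and -L keeps learning local without network tests):

```shell
# Snapshot the database counters before training
sa-learn --dump magic > before.txt

# Drop any previous learning for this message, then relearn it as spam
sa-learn -L --forget message.eml
sa-learn -L --spam message.eml

# Compare counters; on a quiet box nspam should have grown by one
# (on a busy site-wide install the counters move constantly, as noted above)
sa-learn --dump magic > after.txt
diff before.txt after.txt
```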
However, it's been 7 hours since I sent my first email and now the same
message is BAYES_20 - so it is "learning" something - just took longer
than I was used to I guess. We use site-wide SA and don't really
hand-feed the bayes (too hard for our users: Exchange backends, SA
frontend), so there are more than twice as many nspam messages as nham -
could that cause a problem?
sa-learn --dump magic
0.000          0          3  0  non-token data: bayes db version
0.000          0    3436572  0  non-token data: nspam
0.000          0    1475976  0  non-token data: nham
0.000          0          0  0  non-token data: ntokens
0.000          0          0  0  non-token data: oldest atime
0.000          0          0  0  non-token data: newest atime
0.000          0          0  0  non-token data: last journal sync atime
0.000          0          0  0  non-token data: last expiry atime
0.000          0          0  0  non-token data: last expire atime delta
0.000          0          0  0  non-token data: last expire reduction count
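For what it's worth, the spam:ham balance is easy to pull out of the dump-magic output; a small sketch (the sample text is the output quoted above, and the helper name is mine):

```python
# Parse nspam/nham message counts out of `sa-learn --dump magic` output
# and report the spam:ham ratio. The sample is copied from this thread;
# in practice you would feed in live command output.

sample = """\
0.000          0    3436572  0  non-token data: nspam
0.000          0    1475976  0  non-token data: nham
"""

def counts(dump_text):
    """Return (nspam, nham) parsed from dump-magic text."""
    vals = {}
    for line in dump_text.splitlines():
        fields = line.split()
        # third column holds the value, last word names the counter
        if "non-token" in line and fields[-1] in ("nspam", "nham"):
            vals[fields[-1]] = int(fields[2])
    return vals["nspam"], vals["nham"]

nspam, nham = counts(sample)
print(f"spam:ham ratio = {nspam / nham:.2f}")  # about 2.33 for these numbers
```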
As I see it, it's not a problem.
In corporate traffic, ham patterns/tokens tend to be pretty constant
while spam patterns/tokens change way more often.
In my case, at the moment I have
0.000 0 28032453 0 non-token data: nspam
0.000 0 13119717 0 non-token data: nham
and have no BAYES_99 hitting ham.
On production boxes, SA sees very little spam (most gets rejected).
To compensate I feed spam from a separate trap box which autolearns
EVERYTHING it gets as spam (no rejects).
I also keep different token TTLs for spam and ham:
autolearn on production boxes has 7 days token TTL
autolearn on trap box has 5 days token TTL.
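With the Redis backend those TTLs live in each box's SpamAssassin config; a hedged sketch of what the two local.cf files might contain (server address and database number are illustrative, not Axb's actual setup):

```
# local.cf - Redis Bayes backend, illustrative values
bayes_store_module  Mail::SpamAssassin::BayesStore::Redis
bayes_sql_dsn       server=127.0.0.1:6379;database=2
bayes_auto_expire   1
bayes_token_ttl     7d    # production boxes; the trap box would use 5d
bayes_seen_ttl      8d
```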
Redis memory usage is pretty constant
# Clients
connected_clients:99
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
# Memory
used_memory:4035596240
used_memory_human:3.76G
used_memory_rss:4403003392
used_memory_peak:4306083208
used_memory_peak_human:4.01G
used_memory_lua:109568
mem_fragmentation_ratio:1.09
mem_allocator:jemalloc-3.2.0
hth
Axb