On 10/02/2014 03:10 AM, Jason Haar wrote:
On 02/10/14 10:17, Axb wrote:

have you tried "-L forget" before "-L spam"?

sa-learn --dump magic  before and after learning should show a
difference...

I didn't do a "forget" before - I'll remember that, thanks. As far as
"before, after" goes for the dump - not an option. We're receiving 6-12
messages per second, "--dump magic" is *always* different :-)
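
For reference, a minimal forget-then-relearn sequence would look
something like this (a sketch: it assumes spamc talks to a spamd with
Bayes enabled, and msg.eml is a placeholder filename):

    # clear the message's previous classification, then re-feed it as spam
    spamc -L forget < msg.eml
    spamc -L spam < msg.eml

    # or the same thing offline with sa-learn against the local Bayes DB
    sa-learn --forget msg.eml
    sa-learn --spam msg.eml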

However, it's been 7 hours since I sent my first email and now the same
message hits BAYES_20 - so it is "learning" something; it just took
longer than I expected. We use site-wide SA and don't really hand-feed
the Bayes DB (too hard for our users: Exchange backends, SA frontend),
so there are more than twice as many learned spam messages (nspam) as
ham (nham) - could that cause a problem?


  sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0    3436572          0  non-token data: nspam
0.000          0    1475976          0  non-token data: nham
0.000          0          0          0  non-token data: ntokens
0.000          0          0          0  non-token data: oldest atime
0.000          0          0          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
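
A quick way to eyeball the spam:ham ratio from that output (assuming
the stock --dump magic layout, with the count in the third column):

    sa-learn --dump magic | \
      awk '/nspam/ {s=$3} /nham/ {h=$3} END {if (h) printf "spam:ham = %.2f\n", s/h}'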

As I see it, it's not a problem.
In corporate traffic, ham patterns/tokens tend to stay pretty constant, while spam patterns/tokens change far more often.

In my case, at the moment I have:

0.000          0   28032453          0  non-token data: nspam
0.000          0   13119717          0  non-token data: nham

and have no BAYES_99 hitting ham.

On production boxes, SA sees very little spam (most gets rejected).
To compensate, I feed spam from a separate trap box, which autolearns EVERYTHING it gets as spam (no rejects).
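
(As a sketch of that trap feed: the setup above uses autolearn, but a
cron-driven sa-learn pass over a trap mailbox gives a similar effect;
the path and schedule here are hypothetical:)

    # hourly on the trap box: learn the trap mailbox as spam, then truncate it
    0 * * * * sa-learn --spam --mbox /var/mail/trap && : > /var/mail/trap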

I also keep different token TTLs for spam and ham:
autolearn on the production boxes has a 7-day token TTL;
autolearn on the trap box has a 5-day token TTL.
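
With the Redis backend those TTLs go in each box's local.cf; a sketch
(the DSN values are placeholders, not the real server details):

    # production boxes
    bayes_store_module Mail::SpamAssassin::BayesStore::Redis
    bayes_sql_dsn      server=127.0.0.1:6379;database=2
    bayes_token_ttl    7d

    # trap box
    bayes_token_ttl    5d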

Redis memory usage is pretty constant:

# Clients
connected_clients:99
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

# Memory
used_memory:4035596240
used_memory_human:3.76G
used_memory_rss:4403003392
used_memory_peak:4306083208
used_memory_peak_human:4.01G
used_memory_lua:109568
mem_fragmentation_ratio:1.09
mem_allocator:jemalloc-3.2.0
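
(That output comes straight from redis-cli, e.g.:)

    redis-cli info clients
    redis-cli info memory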

h2h

Axb
