Re: Mondo bayes_toks - millions of entries

2007-12-06 Thread Wes
ch to all-manual learning and hopefully convince enough users to send in spam and false positives to train it well. Whether we'll get sufficient participation is a big question, but this appears to be the only viable option at this point. Wes

Re: Mondo bayes_toks - millions of entries

2007-12-03 Thread Wes
ing failed expires due to 'deadlock detected'. Regrouping, I was looking at benchmarks for QDBM and saw that it is on the "we need volunteers" list. Is adding it more than just changing the "tie" in the Bayes DBM store module? Wes

Re: Mondo bayes_toks - millions of entries

2007-11-30 Thread Wes
0-80% CPU constantly. Wes

Re: Mondo bayes_toks - millions of entries

2007-11-30 Thread Wes
as dropped dramatically. With 163,000 loaded, it is down to 100/second. I decided to start with a clean DB and let auto-learn repopulate it. Wes

Re: Mondo bayes_toks - millions of entries

2007-11-30 Thread Wes
shows up every couple of days. I guess the flip side is that if a message is manually learned and you then keep receiving similar messages (at least more often than the turnover period), the manually learned information should stay active. Correct? Wes
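
The reasoning above relies on expiry being driven by each token's last access time (atime). A minimal sketch, assuming a simple rule of "drop any token not seen within `max_age` ticks"; `TokenStore`, `touch`, and `expire` are hypothetical names, not SpamAssassin's code. A manually learned token that keeps reappearing has its atime refreshed and survives, while one that is never seen again is expired.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class TokenStore:
    # Hypothetical store: token -> last access time, in abstract clock ticks.
    atime: Dict[str, int] = field(default_factory=dict)

    def touch(self, tokens: Iterable[str], now: int) -> None:
        """Learning or scanning a message refreshes the atime of each of its tokens."""
        for tok in tokens:
            self.atime[tok] = now

    def expire(self, now: int, max_age: int) -> int:
        """Drop tokens not seen within max_age ticks; return how many were removed."""
        stale = [t for t, seen in self.atime.items() if now - seen > max_age]
        for t in stale:
            del self.atime[t]
        return len(stale)


store = TokenStore()
store.touch({"viagra", "cheap", "pills"}, now=0)  # manually learned at t=0
store.touch({"cheap", "pills"}, now=8)            # similar spam arrives at t=8
removed = store.expire(now=10, max_age=5)
print(sorted(store.atime), removed)  # ['cheap', 'pills'] 1
```

Only "viagra" is dropped: it was last seen 10 ticks ago, while the re-seen tokens were refreshed 2 ticks ago.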

Re: Mondo bayes_toks - millions of entries

2007-11-30 Thread Wes
ks can be avoided by sorting the keys to be updated so that they are always updated in the same order (and/or retrying should a deadlock be detected). Wes
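
The two techniques mentioned above (updating keys in a consistent sorted order, plus retrying on a deadlock error) can be sketched as follows. This is an illustration under stated assumptions: Python `threading.Lock` objects stand in for database row locks, and `TokenTable`, `DeadlockDetected`, and `with_retry` are hypothetical names. A real SQL backend would raise its own driver-specific deadlock error, which is what the retry wrapper would catch.

```python
import threading
import time


class DeadlockDetected(Exception):
    """Hypothetical stand-in for a database driver's 'deadlock detected' error."""


class TokenTable:
    """In-memory token counts guarded by one lock per token (stand-in for row locks)."""

    def __init__(self):
        self.counts = {}
        self._locks = {}
        self._guard = threading.Lock()  # protects creation of per-token locks

    def _lock_for(self, token):
        with self._guard:
            return self._locks.setdefault(token, threading.Lock())

    def update(self, deltas):
        # Acquire locks in sorted token order, so no two writers can ever
        # hold the same pair of locks in opposite orders.
        tokens = sorted(deltas)
        locks = [self._lock_for(t) for t in tokens]
        for lock in locks:
            lock.acquire()
        try:
            for t in tokens:
                self.counts[t] = self.counts.get(t, 0) + deltas[t]
        finally:
            for lock in reversed(locks):
                lock.release()


def with_retry(fn, attempts=3, backoff=0.01):
    """Re-run fn if it raises DeadlockDetected, with exponential backoff."""
    for i in range(attempts):
        try:
            return fn()
        except DeadlockDetected:
            if i == attempts - 1:
                raise
            time.sleep(backoff * 2 ** i)


table = TokenTable()


def worker(tokens):
    for _ in range(1000):
        with_retry(lambda: table.update({t: 1 for t in tokens}))


# Writers list overlapping tokens in different orders; sorting inside
# update() makes that order irrelevant.
threads = [threading.Thread(target=worker, args=(toks,))
           for toks in (["a", "b", "c"], ["c", "b", "a"], ["b", "d"])]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(dict(sorted(table.counts.items())))  # {'a': 2000, 'b': 3000, 'c': 2000, 'd': 1000}
```

In this in-memory sketch the sorted ordering alone prevents deadlock, so the retry path never fires; against a real database the retry is a safety net for lock conflicts the ordering does not cover.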

Re: Mondo bayes_toks - millions of entries

2007-11-30 Thread Wes
't reasonable, though. I can't see (at least here) that manual learning would get any kind of significant volume. Someone's only going to send in a message for manual learning if it is a leaked spam or a false positive, and then only if they bother to do it. I'd be surprised if manual learning reached even 1 in 10,000 of the messages going through auto-learning. Wes

Re: Mondo bayes_toks - millions of entries

2007-11-29 Thread Wes
epends on what the update vs. read load is. I would think it would be extremely useful to be able to treat manually learned tokens separately from auto-learned ones. In a high-volume environment, you'd want to keep manually learned tokens far longer than you could possibly keep auto-learned ones. Manually learned data should carry more weight. Wes
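
The separate-retention idea proposed above could look something like this sketch. The per-token "manually learned" flag and the two age limits are hypothetical: they illustrate the suggestion, not an existing Bayes store feature.

```python
from typing import Dict, Set


def expire_tiered(atime: Dict[str, int], manual: Set[str], now: int,
                  auto_max_age: int, manual_max_age: int) -> Dict[str, int]:
    """Return the tokens to keep, giving manually learned tokens a longer window.

    atime maps token -> last access time; manual is the (hypothetical) set of
    tokens flagged as manually learned.
    """
    kept = {}
    for tok, seen in atime.items():
        limit = manual_max_age if tok in manual else auto_max_age
        if now - seen <= limit:
            kept[tok] = seen
    return kept


atime = {"auto1": 0, "auto2": 9, "manual1": 0}
kept = expire_tiered(atime, manual={"manual1"}, now=10,
                     auto_max_age=2, manual_max_age=30)
print(sorted(kept))  # ['auto2', 'manual1']
```

The old auto-learned token is expired, while the equally old manually learned one survives because its window is longer.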

Re: Mondo bayes_toks - millions of entries

2007-11-29 Thread Wes
es it handle concurrency if it has to update the last access time of tokens and learn new tokens? Are there any numbers on how many concurrent servers it can handle before it starts to bog down? Wes

Re: Mondo bayes_toks - millions of entries

2007-11-29 Thread Wes
> sn't large enough
> it is going to churn so fast that it'll defeat the purpose of even
> having a bayes database.

I had pretty much come to that conclusion, but all the posts I found were talking about token databases in the low hundreds of thousands, and I've been seeing millions... I wasn't sure I wasn't overlooking something big. Wes

Re: Mondo bayes_toks - millions of entries

2007-11-29 Thread Wes
-learning. If this is true, then with our volume we have to do either all automatic learning or all manual learning. With both enabled, any manual learning would likely be lost in less than a day. Ugh. Wes
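
The churn described above can be estimated roughly. Suppose expiry keeps about the N most recently seen tokens and R previously unseen tokens arrive per day. Then a token that is never seen again is pushed out after at most about N/R days. Re-seen tokens also crowd it out, so the real figure is lower. A minimal sketch with made-up numbers, not figures from this thread:

```python
def max_token_lifetime_days(capacity: int, new_tokens_per_day: int) -> float:
    """Upper bound on how long an unrefreshed token survives if expiry keeps
    only the `capacity` most recently seen tokens (LRU-style)."""
    return capacity / new_tokens_per_day


# Hypothetical figures: a 3-million-token database, 4 million new tokens/day.
print(max_token_lifetime_days(3_000_000, 4_000_000))  # 0.75
```

Under these assumed numbers, a manually learned token that never reappears would be gone in under a day, which matches the concern raised in the post.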

Re: Mondo bayes_toks - millions of entries

2007-11-29 Thread Wes
ow unlocking lock
[21506] dbg: locker: safe_unlock: unlocked /home/smfs/.spamassassin/bayes.mutex
[21506] dbg: bayes: expiry completed
bayes: synced databases from journal in 0 seconds: 927 unique entries (927 total entries)
expired old bayes database entries in 432 seconds
3702653 entries kept, 1230354 deleted
token frequency: 1-occurrence tokens: 83.22%
token frequency: less than 8 occurrences: 12.56%
Wes
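
A quick check of the expiry figures in the log above shows the scale involved. It uses only the numbers printed there: the database held about 4.9 million tokens before the run, expiry removed roughly a quarter of them, and it deleted about 2,800 tokens per second over the 432-second run.

```python
kept = 3_702_653     # "entries kept"
deleted = 1_230_354  # "deleted"
seconds = 432        # "expired old bayes database entries in 432 seconds"

before = kept + deleted
print(before)                     # 4933007
print(f"{deleted / before:.1%}")  # 24.9%
print(round(deleted / seconds))   # 2848
```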

Mondo bayes_toks - millions of entries

2007-11-28 Thread Wes
ion, won't they also be subject to the (short) expiration period, or is manual learning kept permanently? Thanks Wes