Re: sa-learn: have i seen this before?

Faisal N Jawdat Sat, 21 Apr 2007 09:28:49 -0700

On Apr 21, 2007, at 11:23 AM, Matt Kettler wrote:

Ok, but how does knowing what SA learned it as help? It doesn't.


Figure out what to train as, and train.

it helps in that i can automatically iterate over some or all of my mail folders on a regular basis, selectively retraining *if*:


a) the message has already been trained
b) it's been trained the same way that i want it trained in the end
and

c) the cost of determining it's already been trained is substantially lower than the cost of just training it
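a rough sketch of that check, reading the conditions above as "skip a message once it is already recorded with the label i want, otherwise train it". everything here is hypothetical: `seen` is a stand-in mapping from message-id to the label it was last learned as, and is not the actual bayes_seen format.

```python
from typing import Mapping

VALID_LABELS = ("spam", "ham")


def needs_training(seen: Mapping[str, str], msgid: str, wanted: str) -> bool:
    """Return True unless msgid is already recorded with the wanted label.

    `seen` is a hypothetical message-id -> label mapping, not bayes_seen itself.
    """
    if wanted not in VALID_LABELS:
        raise ValueError(f"unknown label: {wanted!r}")
    # Unknown messages and messages learned the other way both need training.
    return seen.get(msgid) != wanted
```

the point of condition (c) is that this lookup only pays off if it is much cheaper than handing the message to sa-learn unconditionally.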

right now i do this manually: i have a "retrain as spam" folder and a "retrain as ham" folder and i hit them each every 5 minutes. i'd rather get rid of the folders, which lets me then use the client-side junk mail systems to flag messages as spam or ham, which sa would then pick up to retrain.

I never suggested that you should parse the headers. sa-learn does this
to extract the message-id and compare that to the bayes_seen database.
sa-learn *MUST* do this much to determine if the message has already
been learned. There's NO other way.

even so, it should be possible to parse the message, extract the message-id, and compare the results in << 20 seconds.
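to illustrate how little work the extraction step is, here is a minimal sketch using python's standard email parser in headers-only mode. `message_id` is a hypothetical helper, and it assumes the Message-ID header alone is the lookup key; sa-learn may derive its own identifier differently, so this is not a drop-in replacement for what it does internally.

```python
from email.parser import BytesHeaderParser
from typing import Optional


def message_id(raw: bytes) -> Optional[str]:
    """Return the Message-ID header of a raw message, or None if absent.

    Only the header block is parsed; the body is never decoded.
    """
    headers = BytesHeaderParser().parsebytes(raw)
    value = headers.get("Message-ID")
    if value is None:
        return None
    return str(value).strip()
```

the returned string could then be checked against whatever record of learned messages is available before deciding whether to call sa-learn at all.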

That's a "separate sorter". sa-learn already does this internally, so *any* code on your part is a waste.

if sa-learn already does this internally then it's doing it rather inefficiently. 20 seconds to pull a message id and compare it against the db (berkeleydb, fwiw)?


-faisal
