>>>>> "DQ" == Daniel Quinlan <[EMAIL PROTECTED]> writes:
DQ> Vivek Khera <[EMAIL PROTECTED]> writes:
>> I'm curious how you GA score the RBL hits.  RBL's are by definition
>> dynamic with IPs going in and out of the lists all the time.  It
>> seems to me the only reliable way to score it would be to see if the
>> IP being tested was in the RBL at the time the message was
>> originally received (or perhaps even a short while later), not at
>> the time the GA test is run, perhaps many months after it was sent.

DQ> I had the GA score for sets of messages with various ages (less than
DQ> 6, 3, 2, and 1 months old).  Generally, the score for each RBL
DQ> improved as the age of the message reduced.  To avoid overfitting

Thanks for the explanation.  I'm not sure how to interpret it all in
terms of the effect on accuracy, but it is good to know that the age
of the message is taken into consideration.

Is such age-weighting done for the spam corpus as a whole?  It seems
to me that some older spam signatures are being phased out and may not
be relevant for current spam... but then maybe I'm wrong about that.

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk