> Subjects being slightly different shouldn't be a problem because you can do
> soundex or "like" searches when you have the data set.

good point. advanced comparisons like that would help a lot.

> I was debating the reply-to and from but maybe it's best just to use all of
> them for now. Aww what the hell, parse up all the headers... it's only disk
> space and CPU time. :-) I can always drop data that doesn't seem to be
> helping later but I need most everything at the start to make some decent
> analyses.

yeah, I guess that was my reason for just having it include the full body of the message, too... at least until routines can be written that would extract relevant info like URLs, phone numbers, etc.

I should get a friend to help with this. he gets around 400 spams daily (3-letter domain that's been on the net for a LONG time)...

-Chris

_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
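
For what it's worth, here is a rough sketch of what "parse up all the headers, keep the full body, and pull out URLs and phone numbers later" could look like. It is only an illustration using Python's standard `email` and `re` modules, not anything that exists in the project. The function names (`parse_spam`, `body_text`) and both regular expressions are made up for this example, and the phone pattern only assumes plain US-style numbers like 800-555-0199.

```python
import re
from email import message_from_string

# Hypothetical, deliberately simple patterns; real spam will need looser ones.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
PHONE_RE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")


def body_text(msg):
    """Join the raw payload of every text part (no transfer decoding)."""
    parts = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload()
            if isinstance(payload, str):
                parts.append(payload)
    return "\n".join(parts)


def parse_spam(raw):
    """Keep every header and the full body, plus extracted URLs and phones."""
    msg = message_from_string(raw)
    headers = {}
    for name, value in msg.items():
        # A header can repeat (e.g. Received), so store a list per name.
        headers.setdefault(name.lower(), []).append(value)
    body = body_text(msg)
    return {
        "headers": headers,
        "body": body,
        "urls": URL_RE.findall(body),
        "phones": PHONE_RE.findall(body),
    }


if __name__ == "__main__":
    sample = (
        "From: a@example.com\n"
        "Reply-To: b@example.com\n"
        "Subject: Great offer\n"
        "\n"
        "Visit http://example.com/offer now or call 800-555-0199 today.\n"
    )
    record = parse_spam(sample)
    print(sorted(record["headers"]))
    print(record["urls"], record["phones"])
```

Storing everything up front and extracting afterwards matches the "it's only disk space and CPU time" approach: fields that turn out not to help can be dropped later without re-collecting the mail.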