Hi,

On Sun, 27 Jul 2003 15:53:40 -0700 John Rudd <[EMAIL PROTECTED]> wrote:

> On Sunday, Jul 27, 2003, at 12:27 US/Pacific, Nix wrote:
> > On Wed, 23 Jul 2003, Daniel Carrera stipulated:
> >> On Thu, Jul 24, 2003 at 12:00:13AM +0100, Nix wrote:
> >>
> >>> Spam actually seems to differ quite a lot between individuals,
> >>
> >> Really? Why would that be the case?
> >
> > I think it depends which spammers' mailing lists you've landed up on.
>
> I think it's also a matter of one person's trash being another person's
> treasure. Different people draw lines in different places. If you
> define spam as UCE (forgetting that spam has more definitions than just
> UCE), then how do you, via content filtering, identify which things
> were solicited vs unsolicited? One person might want to have good
> commercial messages identified as ham instead of allowing it to be
> identified as spam but then whitelisted or filtered separately from
> other spam, so their corpus will have stuff in their ham folder that
> other people would call spam.

There's a more pragmatic reason for not training Bayes on someone else's
corpus: Bayes will most likely learn 'mail addressed to <original victim>
is spam.' Since Bayes learns from both the message headers and the body,
it's fairly important that the ham and spam it's trained on were
originally directed at you, not some random third party.

I get about 20 bits of spam a day and much more ham than that in mailing
list and personal traffic; I can wait 10 days to collect enough spam to
train SA (NB: 251 spams since 7/15.) If it takes you more than a week or
two to collect enough spam and ham to train Bayes, you don't have much of
a spam problem ;)

OT: I prefer the definition of spam=UBE. I don't care what the content
is; 20 blank messages a day are just as abusive to the network as a whole
as 20 ads for weenis-enlargement pillz, hot teen mortgages refilling my
barnyard toner cartridges with Katmandu Temple Kiff, or URGENT BUSINESS
PROPOSALS. Damage done to individual inboxes is minor compared to the
damage caused by flooding networks with noise, delaying legitimate
traffic, increasing demands on bandwidth and hardware, and soaking up
valuable admin hours. Content is irrelevant; spamming is aberrant
behavior.

-- Bob

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
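
P.S. A toy sketch of the header-learning point above, in Python. This is
not SpamAssassin's Bayes code; the tokeniser, the crude `spamminess`
ratio, and the example.org / example.net addresses are all assumptions
made up for illustration. It only shows that a learner which tokenises
headers as well as the body will treat the original recipient's address
as a spam sign when it is trained on a borrowed corpus.

```python
from collections import Counter

def tokens(message):
    """Split a raw message into header tokens (prefixed with the
    lower-cased header name) and plain body tokens."""
    head, _, body = message.partition("\n\n")
    toks = []
    for line in head.splitlines():
        name, _, value = line.partition(":")
        prefix = name.strip().lower()
        toks += [f"{prefix}:{word.lower()}" for word in value.split()]
    toks += [word.lower() for word in body.split()]
    return toks

def train(spam, ham):
    """Count, per token, how many spam and ham messages contain it."""
    spam_counts, ham_counts = Counter(), Counter()
    for msg in spam:
        spam_counts.update(set(tokens(msg)))
    for msg in ham:
        ham_counts.update(set(tokens(msg)))
    return spam_counts, ham_counts, len(spam), len(ham)

def spamminess(token, model):
    """Crude per-token score: 1.0 = seen only in spam, 0.0 = only in ham,
    0.5 = never seen. A simplification, not a real Bayes combiner."""
    spam_counts, ham_counts, n_spam, n_ham = model
    s = spam_counts[token] / n_spam
    h = ham_counts[token] / n_ham
    return 0.5 if s + h == 0 else s / (s + h)

# Hypothetical corpus collected by someone else: all of the spam was
# addressed to them, and their ham came via a mailing list.
borrowed_spam = [
    "To: victim@example.org\nSubject: cheap pills\n\nbuy pills now",
    "To: victim@example.org\nSubject: hello\n\nurgent business proposal",
]
borrowed_ham = [
    "To: list@example.org\nSubject: meeting\n\nagenda for monday",
    "To: list@example.org\nSubject: patch\n\nplease review the patch",
]

model = train(borrowed_spam, borrowed_ham)
print(spamminess("to:victim@example.org", model))  # 1.0
print(spamminess("to:list@example.org", model))    # 0.0
print(spamminess("to:you@example.net", model))     # 0.5
```

The strongest "spam" token here is the victim's address, which tells the
filter nothing about mail sent to you; your own address scores a neutral
0.5 because the borrowed corpus never saw it.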