Hi,

On Sun, 27 Jul 2003 15:53:40 -0700 John Rudd <[EMAIL PROTECTED]> wrote:
> On Sunday, Jul 27, 2003, at 12:27 US/Pacific, Nix wrote:
> > On Wed, 23 Jul 2003, Daniel Carrera stipulated:
> >> On Thu, Jul 24, 2003 at 12:00:13AM +0100, Nix wrote:
> >>
> >>> Spam actually seems to differ quite a lot between individuals,
> >>
> >> Really?  Why would that be the case?
> >
> > I think it depends which spammers' mailing lists you've landed up on.
> 
> I think it's also a matter of one person's trash being another person's
> treasure.  Different people draw lines in different places.  If you 
> define spam as UCE (forgetting that spam has more definitions than just 
> UCE), then how do you, via content filtering, identify which things 
> were solicited vs unsolicited?  One person might want to have good 
> commercial messages identified as ham instead of allowing it to be 
> identified as spam but then white listed or filtered separately from 
> other spam, so their corpus will have stuff in their ham folder that 
> other people would call spam.

There's a more pragmatic reason for not training Bayes on someone else's
corpus: Bayes will most likely learn 'mail addressed to <original
victim> is spam.' Since Bayes learns from both message headers and body,
it's fairly important that the ham and spam it's trained on were
originally directed at you, not some random third party. I get about 20
pieces of spam a day and much more ham than that in mailing list and
personal traffic; I can wait 10 days to collect enough spam to train SA
(NB: 251 spams since 7/15).
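
To make the header problem concrete, here is a toy sketch in Python. It
is not SpamAssassin's actual tokenizer or scoring; the addresses, the
messages, and the add-one smoothing are all made up for illustration. It
assumes you train on a donated spam corpus plus your own ham, and shows
the donor's address ending up as the strongest spam token.

```python
from collections import Counter

def tokens(headers, body):
    """Tokenize headers and body; header tokens carry the header name as a prefix."""
    toks = set()
    for name, value in headers.items():
        for word in value.lower().split():
            toks.add(f"{name.lower()}:{word}")
    toks.update(body.lower().split())
    return toks

def train(messages):
    """Count the number of messages in which each token appears."""
    counts = Counter()
    for headers, body in messages:
        counts.update(tokens(headers, body))
    return counts

def spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Per-token spam estimate with add-one smoothing (a simplification)."""
    s = (spam_counts[token] + 1) / (n_spam + 2)
    h = (ham_counts[token] + 1) / (n_ham + 2)
    return s / (s + h)

# Hypothetical donated spam: every message was addressed to the donor.
spam = [
    ({"To": "victim@example.org"}, "cheap pills now"),
    ({"To": "victim@example.org"}, "urgent business proposal"),
    ({"To": "victim@example.org"}, "refill toner cheap"),
]
# Hypothetical ham of your own, addressed to you.
ham = [
    ({"To": "me@example.com"}, "patch for the rules file"),
    ({"To": "me@example.com"}, "meeting notes attached"),
]

spam_counts, ham_counts = train(spam), train(ham)
n_spam, n_ham = len(spam), len(ham)

p_victim = spam_probability("to:victim@example.org",
                            spam_counts, ham_counts, n_spam, n_ham)
p_me = spam_probability("to:me@example.com",
                        spam_counts, ham_counts, n_spam, n_ham)
print(f"to:victim@example.org -> {p_victim:.2f}")
print(f"to:me@example.com     -> {p_me:.2f}")
```

The donor's address appears in every spam and in no ham, so no body
word can outscore it. Train on mail that was sent to you and the
recipient token stops carrying that kind of signal.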

If it takes you more than a week or two to collect enough spam and ham
to train Bayes, you don't have much of a spam problem ;)

OT: I prefer the definition of spam=UBE. I don't care what the content
is; 20 blank messages a day is just as abusive to the network as a whole
as 20 ads for weenis-enlargement pillz, hot teen mortgages refilling my
barnyard toner cartridges with Katmandu Temple Kiff, or URGENT BUSINESS
PROPOSALS. Damage done to individual inboxes is minor compared to the
damage caused by flooding networks with noise, delaying legitimate
traffic, increasing demands on bandwidth and hardware, and soaking up
valuable admin hours. Content is irrelevant; spamming is aberrant
behavior.

-- Bob


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk