Re: [spamassassin] Re: How to report 120,000 spams

Tuc at T-B-O-H.NET Sun, 09 Mar 2008 18:51:26 -0700

> 
> If you are proposing some kind of checksums or other types of 'message
> identifying' techniques on the messages,  those few mistyped addresses
> could certainly make a difference for your site.   What if bongo's mom
> mistypes to bungo, realizes her mistake and resends it to bongo a few
> minutes later.  It is quite likely that the valid message will be
> rejected now since it's (almost) identical to the one your proposed
> system just marked as spam.  What if bongo signs up for a mailing
> list and mistypes his own email address (yes, this happens).  Now your
> system marks all list mailings as spam, so everyone using your system
> starts losing their copies of the mailing list messages too?
>
        Bango said that if his mom can't spell his name right, he doesn't
care if he gets her emails. :)


        I'm not proposing anything. I originally wanted to see if there
was some way that these 120,000 emails that don't go to a valid/usable
end user could be used to help the community out in some way. I had 2
filtering systems agree to do something with them, but for reasons I'd
rather not share, neither one worked out. (One still may; I'm not
sure, and I'm waiting to hear back.)

        We also don't do sitewide Bayes/etc. We do it per received user.
For this domain, it just happens that all 4 users of the domain
constitute a single received user. I realize that collectively this
list could come up with well over 5000 sensible reasons why "good"
mail could be part of that 120,000. I just didn't think such an
insignificant percentage mattered. Given how much spam gets through,
and how much good mail gets marked bad, I thought this was "acceptable".
>
> I think you have good intentions but the source of your data is flawed
> for anything but maybe limited statistical training.  Unfortunately it
> probably is not great for that either, since the mail you are seeing
> for non existent users is probably not at all similar to the mix of
> spam you get to real accounts.  The scanner would end up biased
> towards whatever junk the spammers desperate enough to use
> dictionaries send, which would drown out the stats from those spams
> that are actually difficult to detect.
>
        Ok, very valid point that makes a lot of sense. Thank you.
> 
> Why do you accept messages for non existent accounts?  You're wasting
> bandwidth, regardless of what you do or don't do with the junk after
> you accept it.  From the sound of it you could reduce your mail
> bandwidth to a tiny fraction of what it is now by just refusing this
> stuff (which is what most everyone else does, AFAIK).
> 
        How do you do it on MX hosts? I realize that if I stop
the wildcard acceptance and stop copying errors to postmaster, I can
do it on the destination server. However, due to circumstances out of
my control for the next few months, all email arrives at the main
mail server via MXs ONLY.

                Thanks, Tuc
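
For reference, one way the MX question above is often handled is to give
each MX a copy of the valid recipient list so it can refuse unknown
addresses at RCPT TO time instead of accepting and forwarding them. The
following is only a minimal sketch of that decision, not the
configuration of any particular MTA. The names VALID_RECIPIENTS and
rcpt_reply, the example.com addresses, and the case-insensitive matching
are all assumptions made for illustration.

```python
# Hypothetical list of valid recipients, assumed to be exported from the
# destination server to every MX host. The addresses are placeholders.
VALID_RECIPIENTS = {"bongo@example.com", "postmaster@example.com"}


def rcpt_reply(address, valid=VALID_RECIPIENTS):
    """Return the (code, text) SMTP reply an MX could give to RCPT TO.

    Assumption: addresses are compared case-insensitively, which is a
    common site policy but not something the SMTP standard requires.
    """
    # Drop surrounding whitespace and the angle brackets used in RCPT TO.
    normalized = address.strip().strip("<>").lower()
    if normalized in valid:
        return 250, "2.1.5 Ok"
    # Refuse during the SMTP dialogue so the message body is never received.
    return 550, "5.1.1 User unknown"


if __name__ == "__main__":
    print(rcpt_reply("<Bongo@Example.com>"))
    print(rcpt_reply("<bungo@example.com>"))
```

With a check like this on the MX hosts, the mistyped bungo address is
rejected before any message data is transferred, so the sender gets an
immediate bounce and the bandwidth is not spent.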
