Tuc at T-B-O-H wrote:
Tuc at T-B-O-H.NET wrote:
        I guess I'm still not being clear. There are 120K emails a day coming
to INVALID EMAIL ADDRESSES THAT NEVER EXISTED. It's not a case of a user being
fickle; it's a case of senders emailing addresses that NEVER EVER ACTUALLY
EXISTED, about one every 3/4 of a second. So running them through ANYTHING is
counterproductive since, at least in my eyes, if you try to email an address
that never existed... IT'S SPAM. It's not mail the user ever sees or knows
about. I have in my sendmail virtusertable:

[EMAIL PROTECTED]                       bingo
[EMAIL PROTECTED]                       bango
[EMAIL PROTECTED]                       bongo
[EMAIL PROTECTED]                       irving
[EMAIL PROTECTED]                               nobody
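The "about one every 3/4 of a second" rate above can be sanity-checked with a few lines of Python. This is a minimal sketch that assumes the 120,000 invalid-address messages arrive evenly over 24 hours, which real mail traffic does not; the function name is illustrative only.

```python
# Assumption: the 120,000 daily messages are spread evenly over 24 hours.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def seconds_between_messages(messages_per_day: int) -> float:
    """Average gap in seconds between consecutive arrivals."""
    return SECONDS_PER_DAY / messages_per_day


gap = seconds_between_messages(120_000)
print(f"one message every {gap:.2f} s")  # one message every 0.72 s
```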

        The user doesn't even SEE these emails, and I really don't care about
processing what they consider spam. But getting 120K emails a day to *@ that
are absolutely known spam... I would like to help the community out by
reporting them to every system possible. Yeah, and if the added benefit is that
the mail bingo, bango, bongo and irving get is filtered a little better... I
won't complain at all.

                        Tuc
Just because mail goes to invalid addresses does not mean it is spam. People do mistype addresses sometimes, so this "corpus" is not safe.

        Yes, I realize people mistype email addresses. But the domain gets
121,000 emails on an average day.

        Of those 121,000 emails a day, 120,000 are to addresses that aren't
one of the 4 known/valid/acceptable ones. What percentage of sent emails would
you like to assume are mistyped? One out of 1000? That would be 121 mistyped
addresses a day. But the other 999 out of every 1000 still aren't valid...
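To make the 1-in-1000 argument concrete, here is a small Python sketch. The typo rate is the hypothetical one floated in the message, not a measured figure; the other numbers are Tuc's approximate daily averages, and the names are illustrative.

```python
TOTAL_PER_DAY = 121_000    # all mail to the domain (figure from the thread)
INVALID_PER_DAY = 120_000  # mail to addresses that never existed
TYPOS_PER_1000 = 1         # hypothetical: 1 in 1000 messages is mistyped


def estimated_typos(total: int, typos_per_1000: int) -> float:
    """Messages per day that would carry a mistyped address at the given rate."""
    return total * typos_per_1000 / 1000


typos = estimated_typos(TOTAL_PER_DAY, TYPOS_PER_1000)  # 121.0
explained = typos / INVALID_PER_DAY  # share of invalid mail typos could explain
print(f"{typos:.0f} typos/day explain {explained:.2%} of the invalid mail")
# 121 typos/day explain 0.10% of the invalid mail
```

Even with a typo rate ten times higher, typos would explain only about 1% of the mail sent to nonexistent addresses.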

        Of the other 1000 that ARE to the 4 known/valid/acceptable
addresses, about 900 are scored by SA at a spam level over 5, usually WILDLY
over 5, in the 20s and 30s. Of the 100 that are delivered, 75 are rejected by
the spam filter of the "final destination" for all 4 of those mailboxes (which
uses a method that, according to sendmail, violates the standard RFCs). Yes,
bingo, bango, bongo and irving actually all end up forwarded to [EMAIL PROTECTED].
        Of the 25 that make it through, the user tells me 15 are usually
spam.

        So that's 10 VALID/ACCEPTABLE emails a day out of 121,000 received
a day... or about 8 THOUSANDTHS OF A SINGLE PERCENT.
        So, while I definitely don't think people type bingo, bango, bongo
and irving correctly 100% of the time, with a valid-email ratio of 8
thousandths of a percent, I don't think mistyped addresses account for much
(if any) of this mail in the grand scheme of things.
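The daily funnel described above can be re-run as a short Python sketch using the approximate figures from the message. It assumes the 900 messages SA scores over 5 are the ones that are not delivered, which is how the message reads; variable names are illustrative.

```python
# Approximate daily figures from the thread.
TOTAL_RECEIVED = 121_000
to_valid = 1_000                                 # mail to bingo, bango, bongo, irving
sa_over_5 = 900                                  # scored over 5 by SA, not delivered
delivered = to_valid - sa_over_5                 # 100
rejected_downstream = 75                         # rejected at the final destination
reached_user = delivered - rejected_downstream   # 25
user_says_spam = 15
legitimate = reached_user - user_says_spam       # 10

share = legitimate / TOTAL_RECEIVED
print(f"{legitimate} legitimate messages/day = {share:.4%} of all mail")
# 10 legitimate messages/day = 0.0083% of all mail
```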

The same argument applies to mail to the valid addresses (bingo, bango, ...) as well. Would you like to use all your mail as a spam corpus? After all, you get only 10 out of 120,000 messages to these addresses :)

Anyway, you'll have to make up your mind. N spam messages are not the same thing as N probable spam messages, even if the probability is 0.99999999999999999999999999999999999999999999999 (with a finite number of 9s). If the probability is not exactly 1, then the corpus is polluted. It may be statistically good, but that's not always good enough.
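The "probable spam" point can be quantified under a deliberately simple model: treat each of N messages as independently spam with probability p. That independence is an idealisation, not something established in the thread. The expected number of legitimate messages in the corpus is then N(1 - p), and the chance that the corpus is entirely clean is p to the power N. A minimal Python sketch:

```python
def expected_ham(n: int, p_spam: float) -> float:
    """Expected count of legitimate messages among n, each spam with prob. p_spam."""
    return n * (1 - p_spam)


def prob_corpus_clean(n: int, p_spam: float) -> float:
    """Probability that all n messages are spam, assuming independence."""
    return p_spam ** n


# Hypothetical: 120,000 messages a day, each 99.9% likely to be spam
# (the 1-in-1000 typo rate discussed earlier in the thread).
n, p = 120_000, 0.999
print(f"expected legitimate messages: {expected_ham(n, p):.0f}")  # 120
print(f"chance the corpus is clean:   {prob_corpus_clean(n, p):.1e}")
```

Only p = 1 exactly makes the expected contamination zero; any p below 1 leaves some legitimate mail in the corpus on average.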

The worst part of this story is that you may be silently (and "frivolously") discarding legitimate mail, which is not very nice: if I mistype an address in that domain, my mail gets dropped and I never get a chance to fix my typo. Do yourself and others a favour and find a way to reject these at SMTP time. If you want to trap some spam, use carefully selected addresses.
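Rejecting at SMTP time means refusing unknown recipients during the RCPT TO step with a permanent 5xx reply, so the sending side, not this server, reports the failure and a sender who made a typo finds out. In sendmail this is commonly done by pointing the catch-all virtusertable entry at an `error:nouser` result instead of at `nobody`; check the sendmail cf README for the exact syntax. The Python below is only a sketch of the decision logic, not an MTA or milter; the domain `example.net`, the function name and the reply texts are hypothetical placeholders.

```python
# Hypothetical recipient check at RCPT TO time: answer unknown local parts
# with a permanent 550 failure instead of accepting and discarding them.
VALID_LOCAL_PARTS = {"bingo", "bango", "bongo", "irving"}
LOCAL_DOMAIN = "example.net"  # placeholder for the real (elided) domain


def rcpt_reply(address: str) -> tuple[int, str]:
    """Return the (code, text) a server might send for RCPT TO:<address>."""
    local, _, domain = address.rpartition("@")
    if domain.lower() != LOCAL_DOMAIN:
        return 550, "5.7.1 Relaying denied"  # not our domain at all
    if local.lower() in VALID_LOCAL_PARTS:
        return 250, "2.1.5 Recipient ok"
    # Unknown user: refuse now so the sender learns about the typo.
    return 550, "5.1.1 User unknown"


print(rcpt_reply("bingo@example.net"))  # (250, '2.1.5 Recipient ok')
print(rcpt_reply("bnigo@example.net"))  # (550, '5.1.1 User unknown')
```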

