Tuc at T-B-O-H.NET wrote:
>
>>
>> On 08.03.08 18:28, Tuc at T-B-O-H wrote:
>> > > Our mail server receives about 128K emails a day. Of
>> > > those, 120K are absolutely known spam so I don't even run
>> > > them through spamassassin. Of the 8K left, 6K are determined
>> > > to be spams, and 2K are considered "good".
>> > >
>> > > I'm wondering if there is some way to help the
>> > > community (and, admittedly, ourselves) to somehow process
>> > > and report those spams to various databases. For the
>> > > smaller users, I've implemented the SiteWideRazor and
>> > > use procmail to save off their spams to "probably-spam"
>> > > and process them through "spamassassin -r" once an hour.
>> > >
>> > > For our bigger ones, though, so as not to wear
>> > > a hole in the disk drive, I wondered if there were any
>> > > suggestions what to do.
>>
>> > Anyone??
>>
>> afaik razor requires manual reporting, not anything automatic. Also note
>> that some people tend to mark as "spam" anything they don't like, even
>> mailing lists they have subscribed to (but are unable to unsubscribe from;
>> this is a very common form of dumbness).
>>
>> You can run DCC server which does something similar but is completely
>> automated.
>>
> Hi,
>
> Thanks for the reply.
>
> I have a feeling that I'm not explaining myself well enough given
> this and private replies I've received.
>
> I am mail hosting for a domain, we'll call it example.com . There
> are, and have only been 4 VALID email addresses for example.com such as :
>
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
>
> Those come in, get scanned by SA, and the ones we think are good
> enough we pass along to the owner's email address on his local ISP
> (Hughes.net, who has their email processed by Tucows's
> securehostedemail.com that violates RFCs and causes sendmail to pump out
> kernel based messages which I can't get anyone there to listen to!).
>
> In the meantime, anything that isn't going to bingo, bango, bongo
> or irving is sent straight to /dev/null from the MTA. It's these messages
> that go straight to /dev/null that I'd like to somehow get processed into
> something useful for the community. It's not the result of a user getting
> an email from examplemacys.com, and saying "Well, I did subscribe, but I
> have no need for their shoe sale this week, I call "SPAM!!!!" ". These are
> messages to email addresses at example.com that were NEVER legit email
> addresses.
>
> As part of it all, I also want to try to keep disk usage and CPU
> down to as little as possible. With 120,000 per day, that's a junk mail
> about every 3/4 of a second. Since I have it set to deliver to /dev/null, I
> reduce the amount of disk usage. I'm looking for a solution that would be
> easy on the disk and easy on the CPU. So something directly out of the
> MTA (sendmail) would be great, or something where the delivery would not
> store it locally.
>
> I'm concerned that if I set up another user, who has a .procmailrc to
> send it directly to "spamassassin -r", it will start spawning off way too
> many processes, too many perl invocations, etc. Same for piping to
> razor-report (and it only benefits razor, no one else).
>
> I thought DCC was running on this system, but it appears not. I'll
> have to check why and get it running. I thought it was just another
> database for SA to check, I'll have to read more about it. Thanks.
>
> Tuc
>
> Thanks, Tuc
>
>
>
Wow! You receive a LOT of spam. I manage a site where, for today (so far;
there's an hour left), we have blocked 139,980 spam emails!!! And this is
down from what we used to get.

The problems you are dealing with (disk space, resource usage, etc.) are why
we finally resorted to writing a spam blocker (in C, no perl) that blocks the
spam at the SMTP protocol level (there is another topic titled "Yet another
spam blocker" which discusses this) and never lets the messages make it to
the disk at all. There are also other advantages to blocking at the protocol
level.
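
As a rough illustration of deciding at the SMTP level (a Python sketch only,
not the C blocker mentioned above): the check below picks a reply for a
RCPT TO command, so a message to a never-valid address can be refused before
any DATA is accepted and nothing reaches the disk. The names
VALID_LOCAL_PARTS, HOSTED_DOMAIN and rcpt_reply are hypothetical, and the
four local parts are assumed from the bingo/bango/bongo/irving addresses in
the quoted message.

```python
# Hypothetical sketch: choose the SMTP reply for one RCPT TO address
# before the message body (DATA) is ever accepted.

# Assumption: the only local parts that have ever been valid for the domain.
VALID_LOCAL_PARTS = {"bingo", "bango", "bongo", "irving"}
HOSTED_DOMAIN = "example.com"


def rcpt_reply(address: str) -> str:
    """Return the SMTP reply line for a single RCPT TO address."""
    # Drop surrounding whitespace and the <...> brackets used on the wire.
    addr = address.strip().strip("<>").lower()
    local, sep, domain = addr.rpartition("@")

    # No "@" at all, or a domain we do not host: refuse to relay.
    if not sep or domain != HOSTED_DOMAIN:
        return "550 5.7.1 Relaying denied"

    # Our domain, but an address that never existed.
    if local not in VALID_LOCAL_PARTS:
        return "550 5.1.1 User unknown"

    return "250 2.1.5 OK"
```

For example, rcpt_reply("<bingo@example.com>") is accepted, while any other
local part at example.com gets the "User unknown" reply. A site that wants to
report these messages rather than refuse them would use the same test to pick
which messages to hand to a reporting tool.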

Automatic reporting: that's another thing entirely. As was pointed out in
previous replies, the user community is not always accurate in reporting
what is legit spam and what is/was requested or "permitted". I tend to report
manually, although I am writing some code to semi-automate the process. The
program picks out domains, TLDs in URLs and IP addresses (in spam), puts them
in edit windows, and then allows me to view the message. At this point, I can
click a button to report the offending hosts/IPs/etc. or not. But it is
semi-manual and therefore involves time. The tradeoff is accurate reporting
to the various block lists.
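
The extraction step could look something like the sketch below. It is a
minimal Python example under stated assumptions, not the program described
above: it scans only the text parts of a message body (not the headers),
matches only http/https URLs and dotted IPv4 addresses, and the function name
extract_candidates and the returned dictionary keys are made up for
illustration.

```python
import ipaddress
import re
from email import message_from_string
from urllib.parse import urlsplit

# Simple patterns: http(s) URLs and dotted-quad candidates.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def _is_ip(text: str) -> bool:
    """True if text parses as an IP address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def extract_candidates(raw_message: str) -> dict:
    """Collect domains, TLDs and IPs from the text parts of a message."""
    msg = message_from_string(raw_message)
    texts = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            texts.append(payload.decode("latin-1", errors="replace"))
    body = "\n".join(texts)

    domains, ips = set(), set()
    for url in URL_RE.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # malformed URL, skip it
        if not host:
            continue
        host = host.rstrip(".").lower()
        (ips if _is_ip(host) else domains).add(host)

    # Bare addresses anywhere in the body; invalid quads are dropped.
    ips.update(ip for ip in IPV4_RE.findall(body) if _is_ip(ip))

    tlds = {d.rsplit(".", 1)[-1] for d in domains if "." in d}
    return {"domains": sorted(domains), "tlds": sorted(tlds), "ips": sorted(ips)}
```

The result is only a list of candidates for a person to review before
anything is reported, which keeps the manual confirmation step.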
I wish I had a better answer for you!
Regards,
Steve
--
View this message in context:
http://www.nabble.com/How-to-report-120%2C000-spams-a-day-tp15857111p15923807.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.