LuKreme wrote:
> On 10-May-2009, at 13:28, M<Galeti wrote:
>> I started to check logs and saw 70%, 80% of emails
>> coming in on weekends are spam (in my case).
>
> But more than 70-80% of the emails coming in on any day of the week
> are spam.
>
> 09-May-09: 85%
> 08-May-09: 87%
> 07-May-09: 82%
> 06-May-09: 88%
> 05-May-09: 86%
> 04-May-09: 93%
> 03-May-09: 92%
>
> Actual percentages are higher, this is just the spam that was rejected
> during transaction.  You can pretty safely add 3-5% to every number to
> get an idea of the real spam totals.
>
Interesting. In my environment, the spam rate is more or less constant,
24 hours a day. Occasionally the rate changes as new botnets rise
and fall, but in general it stays at a nice steady "x"
messages per hour.

Since our business is US-based rather than global, and most of our
nonspam is business-to-business, our nonspam rate rises during US work
hours (i.e., from 9am Eastern until 5pm Pacific, Monday through
Friday) and drops off outside those hours.
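As a rough sketch of that "US work hours" window, assuming message timestamps have already been converted to US Eastern time: 5pm Pacific is normally 8pm Eastern, since the two zones are three hours apart, so the window is Monday through Friday, 09:00 to 20:00 Eastern. The function name `in_us_business_hours` and its constants are hypothetical, for illustration only; SpamAssassin has no such rule.

```python
from datetime import datetime

# Assumption: timestamps are already in US Eastern time.
# 5pm Pacific is normally 8pm Eastern, so the window is 09:00-20:00 Eastern.
BUSINESS_START_HOUR = 9    # 9am Eastern
BUSINESS_END_HOUR = 20     # 5pm Pacific == 8pm Eastern (exclusive)


def in_us_business_hours(ts_eastern: datetime) -> bool:
    """Return True if an Eastern-time timestamp falls Mon-Fri, 9am ET to 5pm PT."""
    is_weekday = ts_eastern.weekday() < 5  # Monday=0 ... Friday=4
    in_window = BUSINESS_START_HOUR <= ts_eastern.hour < BUSINESS_END_HOUR
    return is_weekday and in_window


print(in_us_business_hours(datetime(2009, 5, 8, 14, 0)))  # Friday 2pm ET   -> True
print(in_us_business_hours(datetime(2009, 5, 9, 14, 0)))  # Saturday 2pm ET -> False
print(in_us_business_hours(datetime(2009, 5, 7, 3, 0)))   # Thursday 3am ET -> False
```

Working in a single zone keeps the check simple; a real deployment would convert from the message's Date or Received header first.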

That rise and fall ends up changing the spam percentage, because while
the nonspam email rate at 3am is very low, the spam rate is roughly the
same.
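The arithmetic behind that shift can be sketched with made-up numbers: a constant 1000 spam messages per hour, and a nonspam rate of 400 per hour during business hours versus 20 per hour otherwise. All rates here are hypothetical; only the shape of the effect matters.

```python
# Hypothetical hourly rates, chosen only for illustration.
SPAM_PER_HOUR = 1000  # assumed roughly constant around the clock


def ham_per_hour(hour: int) -> int:
    """Assumed nonspam rate: busy during business hours, a trickle otherwise."""
    return 400 if 9 <= hour < 20 else 20


def spam_percentage(hour: int) -> float:
    """Share of all mail in a given hour that is spam, as a percentage."""
    ham = ham_per_hour(hour)
    return 100.0 * SPAM_PER_HOUR / (SPAM_PER_HOUR + ham)


for hour in (3, 12):
    print(f"{hour:02d}:00  spam = {spam_percentage(hour):.1f}%")
# 03:00  spam = 98.0%
# 12:00  spam = 71.4%
```

The spam volume never changes; only the denominator does.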

I think most companies that have a "regional" business base, and don't
exchange much email with end consumers, would see a similar pattern.

However, this raises the question: is time-based scoring really
worthwhile? We've discussed it many times on this list before, and,
much like geography-based systems, I don't think it's worth any
significant score.

The problem is, even though the spam percentage goes up at night, that
does not mean that nonspam stops. It also does not mean that the nonspam
messages sent during the night are any less important, or have any
reason to be penalized.
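To put a rough number on how much legitimate mail a night-time rule would still hit, here is a sketch assuming a hypothetical nonspam rate of 400 messages per hour from 09:00 to 20:00 Eastern and 20 per hour outside that window. The `is_night` condition is a hypothetical rule, not an existing SpamAssassin test, and weekends are not modeled.

```python
def ham_per_hour(hour: int) -> int:
    """Hypothetical nonspam rate by hour of day (Eastern time)."""
    return 400 if 9 <= hour < 20 else 20


def is_night(hour: int) -> bool:
    """Hypothetical rule condition: outside 09:00-20:00 Eastern."""
    return not (9 <= hour < 20)


daily_ham = sum(ham_per_hour(h) for h in range(24))
night_ham = sum(ham_per_hour(h) for h in range(24) if is_night(h))
print(f"{night_ham} of {daily_ham} legitimate messages per day would be penalized")
# 260 of 4660 legitimate messages per day would be penalized
```

Even in this favorable model, every one of those 260 messages takes the penalty, no matter how important it is.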

Generally speaking, good spam criteria are ones that differentiate spam
messages from nonspam messages. Time and geography systems don't
differentiate; they merely create an artificial grouping in which
there's a lot of spam and a little nonspam. That's great for
establishing a correlation, but correlation is not causation.

That's not to say that correlations aren't useful, but they're generally
not worth "high" scores. They're best left with modest scores
(i.e., about a quarter of your required_score threshold).
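A small sketch of why a modest score matters, assuming SpamAssassin's stock required_score of 5.0 and that a message is tagged once its total reaches that threshold. The 2.0 points from other rules and the "high" score of 3.5 are hypothetical values chosen for comparison.

```python
REQUIRED_SCORE = 5.0               # SpamAssassin's default threshold
MODEST_SCORE = REQUIRED_SCORE / 4  # 1.25, the "quarter of threshold" guideline
HIGH_SCORE = 3.5                   # hypothetical "high" score for comparison


def is_flagged(other_rules_total: float, time_rule_score: float) -> bool:
    """Tag as spam once the total score reaches required_score."""
    return other_rules_total + time_rule_score >= REQUIRED_SCORE


# A legitimate message sent at 3am that already picked up 2.0 points elsewhere.
ham_other_points = 2.0
print(is_flagged(ham_other_points, MODEST_SCORE))  # 3.25 total -> False
print(is_flagged(ham_other_points, HIGH_SCORE))    # 5.5 total  -> True
```

With the modest score, the time-based rule can only tip over messages that other rules already consider quite suspicious.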

Also, since this kind of rule would only work well for a particular kind
of email base (localized business), and would work poorly for others
(home users), I don't think it could ever be made a part of SpamAssassin
proper.
