LuKreme wrote:
> On 10-May-2009, at 13:28, M<Galeti wrote:
>> I started to check logs and saw 70%, 80% of emails
>> coming in weekends are spam (in my case).
>
> But more than 70-80% of the emails coming in on any day of the week
> are spam.
>
> 09-May-09: 85%
> 08-May-09: 87%
> 07-May-09: 82%
> 06-May-09: 88%
> 05-May-09: 86%
> 04-May-09: 93%
> 03-May-09: 92%
>
> Actual percentages are higher, this is just the spam that was rejected
> during transaction. You can pretty safely add 3-5% to every number to
> get an idea of the real spam totals.
>

Interesting. In my environment, the spam rate is more or less constant,
24 hours a day. Occasionally it changes as new botnets rise and fall,
but in general it holds at a nice steady "x" messages per hour.

Since our business is highly US-based, not global, and most of our
nonspam is business-to-business, our nonspam rate rises during US work
hours (i.e., from 9am Eastern until 5pm Pacific, Monday through Friday)
and drops off outside them. That rise and fall changes the spam
percentage: the nonspam rate at 3am is very low, but the spam rate is
roughly the same.

I think most companies that have a "regional" business base and don't
exchange email extensively with end consumers would find a similar
pattern.

However, this raises the question: is time-based scoring really
worthwhile? We've discussed it many times on this list before, and much
like geography-based systems, I don't think it's worth any significant
scoring.

The problem is that even though the spam percentage goes up at night,
nonspam does not stop. Nor are the nonspam messages sent during the
night any less important, or any more deserving of a penalty.

Generally speaking, good spam criteria are ones that differentiate spam
messages from nonspam messages. Time and geography systems don't
differentiate; they merely create an artificial grouping in which
there's a lot of spam and a little nonspam. That's great for
establishing a correlation, but correlation is not causation.

That's not to say that correlations aren't useful, but they're generally
not worth "high" scores. They're best left with modest scores (say, a
quarter of your required_score threshold).

Also, since this kind of rule would only work well for a particular kind
of email base (localized business) and would work poorly for others
(home users), I don't think it could ever be made a part of SpamAssassin
proper.
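
To put rough numbers on the point about a constant spam rate against a
varying nonspam rate, here is a small sketch. The hourly volumes are
made up purely for illustration (they are not from my logs or anyone
else's), and the function name is just something I picked:

```python
# Hypothetical hourly volumes, chosen only to illustrate the arithmetic.
SPAM_PER_HOUR = 1000        # assumed constant around the clock
HAM_BUSINESS_HOURS = 250    # assumed nonspam rate during US work hours
HAM_OFF_HOURS = 20          # assumed nonspam rate at 3am


def spam_percentage(spam: int, ham: int) -> float:
    """Share of all incoming mail that is spam, as a percentage."""
    return 100.0 * spam / (spam + ham)


day = spam_percentage(SPAM_PER_HOUR, HAM_BUSINESS_HOURS)
night = spam_percentage(SPAM_PER_HOUR, HAM_OFF_HOURS)

print(f"business hours: {day:.1f}% spam")
print(f"off hours:      {night:.1f}% spam")
print(f"nonspam still arriving per off hour: {HAM_OFF_HOURS}")
```

With these assumed figures the percentage climbs at night only because
the nonspam half of the total shrinks. The spam volume is unchanged, and
the nonspam that does arrive off hours would all be hit by a time-based
rule.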