On Thu, Jun 06, 2002 at 10:19:07PM -0700, Matthew Cline wrote:
> On Thursday 06 June 2002 09:12 pm, Jeremy Zawodny wrote:
> > Does anyone with a large spam collection have good stats on the
> > average message size?
> 
> > I ask because I've been playing with the procmail rules that we use to
> > call spamc/spamd on wcnet.org a bit. I'm to the point now where I only
> > send messages to spamc that are under 20KB in size.  And I'm still
> > trapping between 40,000 - 60,000 spams per day.
> 
> Well, in my spam corpus of 1366 messages, the average size is 8726 bytes.  
> 93.1% of them are 20KB or smaller, 95.3% are 25KB or smaller, and 98.0% are 
> 30KB or smaller (this is excluding viruses).  Of the 95 that are larger than 
> 20K, 89.5% are smaller than 50K.

Here's a quick run through sonic's corpus, which is getting rather large. ;)
Each row is one 1k-wide size bucket: upper bound, message count, share of
the total.

Total Spams = 297584
 < 1k : 1301 : 0.437187%
 < 2k : 18574 : 6.241599%
 < 3k : 47644 : 16.010269%
 < 4k : 73525 : 24.707310%
 < 5k : 45564 : 15.311307%
 < 6k : 31693 : 10.650102%
 < 7k : 21500 : 7.224851%
 < 8k : 15721 : 5.282878%
 < 9k : 10118 : 3.400048%
 < 10k : 5760 : 1.935588%
 < 11k : 4283 : 1.439257%
 < 12k : 3531 : 1.186556%
 < 13k : 3156 : 1.060541%
 < 14k : 2885 : 0.969474%
 < 15k : 1808 : 0.607560%
 < 16k : 1600 : 0.537663%
 < 17k : 1470 : 0.493978%
 < 18k : 2063 : 0.693250%
 < 19k : 1223 : 0.410976%
 < 20k : 719 : 0.241612%
 < 21k : 453 : 0.152226%
 < 22k : 442 : 0.148529%
 < 23k : 384 : 0.129039%
 < 24k : 192 : 0.064520%
 < 25k : 164 : 0.055110%
 < 26k : 214 : 0.071912%
 < 27k : 108 : 0.036292%
 < 28k : 301 : 0.101148%
 < 29k : 153 : 0.051414%
 < 30k : 111 : 0.037300%
 < 31k : 93 : 0.031252%
 < 32k : 118 : 0.039653%
 < 33k : 96 : 0.032260%
 < 34k : 97 : 0.032596%
 < 35k : 40 : 0.013442%
 < 36k : 23 : 0.007729%
 < 37k : 21 : 0.007057%
 < 38k : 13 : 0.004369%
 < 39k : 18 : 0.006049%
 < 40k : 36 : 0.012097%
 < 41k : 9 : 0.003024%
 < 42k : 60 : 0.020162%
 < 43k : 38 : 0.012770%
 < 44k : 24 : 0.008065%
 < 45k : 19 : 0.006385%
 < 46k : 2 : 0.000672%
 < 47k : 5 : 0.001680%
 < 48k : 6 : 0.002016%
 < 49k : 8 : 0.002688%
 < 50k : 1 : 0.000336%
 < 51k : 8 : 0.002688%
 < 52k : 2 : 0.000672%
 < 53k : 8 : 0.002688%
 < 54k : 10 : 0.003360%
 < 55k : 7 : 0.002352%
 < 56k : 29 : 0.009745%
 < 57k : 28 : 0.009409%
 < 58k : 16 : 0.005377%
 < 59k : 5 : 0.001680%
 < 60k : 2 : 0.000672%
 < 61k : 1 : 0.000336%
 < 62k : 3 : 0.001008%
 < 63k : 4 : 0.001344%
 < 68k : 3 : 0.001008%
 < 69k : 1 : 0.000336%
 < 72k : 1 : 0.000336%
 < 79k : 2 : 0.000672%
 < 80k : 1 : 0.000336%
 < 83k : 2 : 0.000672%
 < 84k : 4 : 0.001344%
 < 85k : 1 : 0.000336%
 < 87k : 2 : 0.000672%
 < 88k : 1 : 0.000336%
 < 89k : 4 : 0.001344%
 < 90k : 2 : 0.000672%
 < 94k : 1 : 0.000336%
 < 102k : 1 : 0.000336%
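To turn the per-bucket rows above into the figure that matters for a spamc
size cutoff, the buckets at or below the cutoff can be summed and divided by
the total. What follows is only a rough sketch, not how the listing was
produced: the function names `parse_histogram` and `share_below` are made up
for illustration, and it assumes the exact row layout shown above.

```python
import re

# Matches rows such as " < 20k : 719 : 0.241612%" (layout assumed from the listing).
ROW = re.compile(r"^\s*<\s*(\d+)k\s*:\s*(\d+)\s*:")


def parse_histogram(text):
    """Return (total, {upper_bound_in_k: count}) parsed from the listing text."""
    total = None
    buckets = {}
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Total Spams"):
            total = int(stripped.split("=")[1])
            continue
        match = ROW.match(line)
        if match:
            buckets[int(match.group(1))] = int(match.group(2))
    return total, buckets


def share_below(total, buckets, cutoff_k):
    """Percentage of all messages in buckets whose upper bound is <= cutoff_k."""
    covered = sum(count for bound, count in buckets.items() if bound <= cutoff_k)
    return 100.0 * covered / total


# Only the first three rows are pasted here; substitute the full listing.
sample = """Total Spams = 297584
 < 1k : 1301 : 0.437187%
 < 2k : 18574 : 6.241599%
 < 3k : 47644 : 16.010269%
"""

total, buckets = parse_histogram(sample)
print(f"{share_below(total, buckets, 3):.2f}% of spam is under 3k")
```

Fed the whole table instead of the three sample rows, `share_below(total,
buckets, 20)` should come out at about 98.8%, so a 20KB cutoff would skip
only a little over 1% of this corpus.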

-- 
Kelsey Cummings - [EMAIL PROTECTED]         sonic.net
System Administrator                    2260 Apollo Way
707.522.1000 (Voice)                    Santa Rosa, CA 95407
707.547.2199 (Fax)                      http://www.sonic.net/
Fingerprint = 7F 59 43 1B 44 8A 0D 57  91 08 73 73 7A 48 90 C5

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk