On Thu, Jun 06, 2002 at 10:19:07PM -0700, Matthew Cline wrote:
> On Thursday 06 June 2002 09:12 pm, Jeremy Zawodny wrote:
> > Does anyone with a large spam collection have good stats on the
> > average message size?
> >
> > I ask because I've been playing with the procmail rules that we use to
> > call spamc/spamd on wcnet.org a bit. I'm to the point now where I only
> > send messages to spamc that are under 20KB in size. And I'm still
> > trapping between 40,000 - 60,000 spams per day.
>
> Well, in my spam corpus of 1366 messages, the average size is 8726 bytes.
> 93.1% of them are 20KB or smaller, 95.3% are 25KB or smaller, and 98.0% are
> 30KB or smaller (this is excluding viruses). Of the 95 that are larger than
> 20K, 89.5% are smaller than 50K.

Here's a quick run through sonic's corpus which is getting rather large. ;0

Total Spams = 297584

<   1k :  1301 :  0.437187%
<   2k : 18574 :  6.241599%
<   3k : 47644 : 16.010269%
<   4k : 73525 : 24.707310%
<   5k : 45564 : 15.311307%
<   6k : 31693 : 10.650102%
<   7k : 21500 :  7.224851%
<   8k : 15721 :  5.282878%
<   9k : 10118 :  3.400048%
<  10k :  5760 :  1.935588%
<  11k :  4283 :  1.439257%
<  12k :  3531 :  1.186556%
<  13k :  3156 :  1.060541%
<  14k :  2885 :  0.969474%
<  15k :  1808 :  0.607560%
<  16k :  1600 :  0.537663%
<  17k :  1470 :  0.493978%
<  18k :  2063 :  0.693250%
<  19k :  1223 :  0.410976%
<  20k :   719 :  0.241612%
<  21k :   453 :  0.152226%
<  22k :   442 :  0.148529%
<  23k :   384 :  0.129039%
<  24k :   192 :  0.064520%
<  25k :   164 :  0.055110%
<  26k :   214 :  0.071912%
<  27k :   108 :  0.036292%
<  28k :   301 :  0.101148%
<  29k :   153 :  0.051414%
<  30k :   111 :  0.037300%
<  31k :    93 :  0.031252%
<  32k :   118 :  0.039653%
<  33k :    96 :  0.032260%
<  34k :    97 :  0.032596%
<  35k :    40 :  0.013442%
<  36k :    23 :  0.007729%
<  37k :    21 :  0.007057%
<  38k :    13 :  0.004369%
<  39k :    18 :  0.006049%
<  40k :    36 :  0.012097%
<  41k :     9 :  0.003024%
<  42k :    60 :  0.020162%
<  43k :    38 :  0.012770%
<  44k :    24 :  0.008065%
<  45k :    19 :  0.006385%
<  46k :     2 :  0.000672%
<  47k :     5 :  0.001680%
<  48k :     6 :  0.002016%
<  49k :     8 :  0.002688%
<  50k :     1 :  0.000336%
<  51k :     8 :  0.002688%
<  52k :     2 :  0.000672%
<  53k :     8 :  0.002688%
<  54k :    10 :  0.003360%
<  55k :     7 :  0.002352%
<  56k :    29 :  0.009745%
<  57k :    28 :  0.009409%
<  58k :    16 :  0.005377%
<  59k :     5 :  0.001680%
<  60k :     2 :  0.000672%
<  61k :     1 :  0.000336%
<  62k :     3 :  0.001008%
<  63k :     4 :  0.001344%
<  68k :     3 :  0.001008%
<  69k :     1 :  0.000336%
<  72k :     1 :  0.000336%
<  79k :     2 :  0.000672%
<  80k :     1 :  0.000336%
<  83k :     2 :  0.000672%
<  84k :     4 :  0.001344%
<  85k :     1 :  0.000336%
<  87k :     2 :  0.000672%
<  88k :     1 :  0.000336%
<  89k :     4 :  0.001344%
<  90k :     2 :  0.000672%
<  94k :     1 :  0.000336%
< 102k :     1 :  0.000336%

-- 
Kelsey Cummings - [EMAIL PROTECTED]  sonic.net
System Administrator  2260 Apollo Way
707.522.1000 (Voice)
Santa Rosa, CA 95407
707.547.2199 (Fax)  http://www.sonic.net/
Fingerprint = 7F 59 43 1B 44 8A 0D 57 91 08 73 73 7A 48 90 C5

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
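
For anyone weighing a size cutoff like the 20KB one discussed above, the per-bucket counts in the histogram can be summed into a cumulative share. The following is only a rough sketch: the counts are copied from the "< 1k" through "< 20k" rows, the function names are made up for illustration, and it assumes each row counts only the messages in its own 1k bucket (which the per-row percentages suggest).

```python
# Counts copied from the "< 1k" through "< 20k" rows of the histogram,
# in order. Assumption: each row is a separate 1k bucket, not a running total.
TOTAL_SPAMS = 297584
BUCKET_COUNTS = [
    1301, 18574, 47644, 73525, 45564, 31693, 21500, 15721, 10118, 5760,
    4283, 3531, 3156, 2885, 1808, 1600, 1470, 2063, 1223, 719,
]


def messages_under(limit_kb):
    """Sum the buckets up to and including the '< limit_kb k' row."""
    if not 0 <= limit_kb <= len(BUCKET_COUNTS):
        raise ValueError("limit_kb is outside the rows copied above")
    return sum(BUCKET_COUNTS[:limit_kb])


def share_under(limit_kb):
    """Fraction of the whole corpus that falls under limit_kb."""
    return messages_under(limit_kb) / TOTAL_SPAMS


if __name__ == "__main__":
    for limit in (5, 10, 20):
        print(f"< {limit}k: {messages_under(limit)} messages, "
              f"{share_under(limit):.2%} of corpus")
```

Under those assumptions, the rows up to "< 20k" add up to 294138 messages, or about 98.8% of the corpus, so a 20KB cutoff would skip only a little over 1% of the spam in this particular collection.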