Jeremy Zawodny wrote:

JZ> Does anyone with a large spam collection have good stats on the
JZ> average message size?

Size (bytes)    Count
------------    -----
0-1k               89
1k-10k          57170
10k-50k         17062
50k-100k          530
100k-200k         282
200k-500k         128
500k-10M           47


The 1k-10k bucket alone is about 76% of the corpus (57170 of 75308
messages).  Another way to slice it, by percentile of message length:

percentile       90%      99% 99.81409%     99.9%     99.99%    99.999%
length      17514.30 59951.89    250000 354266.97 1338195.20 2298653.16
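
For anyone who wants to run the same kind of numbers on their own corpus,
here is a minimal Python sketch. It is not the script used for the figures
above. It assumes "k" means 1024 bytes, that each bucket is half-open
(low <= size < high), and that percentiles are linearly interpolated between
neighbouring sorted values. The names size_histogram and percentile are
hypothetical.

```python
from bisect import bisect_right

K = 1024  # assumption: "k" in the table means 1024 bytes

# Upper edge of each bucket in bytes (assumed half-open: low <= size < high).
EDGES = [1 * K, 10 * K, 50 * K, 100 * K, 200 * K, 500 * K, 10 * K * K]
LABELS = ["0-1k", "1k-10k", "10k-50k", "50k-100k",
          "100k-200k", "200k-500k", "500k-10M"]


def size_histogram(sizes):
    """Count messages per size bucket; sizes of 10M or more are ignored."""
    counts = dict.fromkeys(LABELS, 0)
    for size in sizes:
        i = bisect_right(EDGES, size)  # index of the bucket holding this size
        if i < len(LABELS):
            counts[LABELS[i]] += 1
    return counts


def percentile(sizes, pct):
    """Linearly interpolated percentile (0 <= pct <= 100) of a non-empty list."""
    ordered = sorted(sizes)
    pos = (len(ordered) - 1) * pct / 100.0  # fractional index into ordered
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)
```

Feed both functions a list of message sizes in bytes, for example one
os.path.getsize() result per message file in a one-message-per-file corpus.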

C


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
