"Mike Loiterman" <[EMAIL PROTECTED]> writes:

> I understand now.  Well, is there any way to fix it so that the graph is
> actually useful when you reach those kinds of numbers?  Mine works great and
> is very useful, but I probably have 1/100 of the volume you do.  BTW, is
> that your personal mail server?  If so, you get a LOT of mail!

Yeah, that's my personal mail.  That's what happens when you're a consultant for
13 years, run two fairly popular web sites and refuse to disable circa '92
email addresses because, you never know, someone you have lost track of might
have a gig for you.  You see why SA was so appealing to me!  (In case I
haven't been clear enough, the reason is that I'm a packrat.)

Fix it?  Hmmm, that's a funny question.  I still find it useful, but clearly
it depends on what question you are trying to answer.

http://wellner.org/email-scores.png

For example, if you look under the non-spam line, you can still see a regular
heartbeat on weekends when I get less mail.  That might be useful for some
advanced scoring tool.
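
If you wanted to pull that weekend heartbeat out as numbers rather than
eyeballing it, a rough sketch follows.  It assumes each line of email-scores
starts with a date and a time in the form 11/09/02 14:30, followed by the
score; that layout is a guess based on the timefmt line in the gnuplot file
and on the score being the third field, so adjust it to whatever your log
really contains.  The function name weekday_counts is made up for
illustration.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekday_counts(lines, timefmt="%m/%d/%y %H:%M"):
    """Count scored messages per weekday.

    ASSUMPTION: fields 0 and 1 are the date and time, field 2 is the score.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        try:
            stamp = datetime.strptime(" ".join(fields[:2]), timefmt)
            float(fields[2])  # only count lines that carry a parsable score
        except (IndexError, ValueError):
            continue  # skip lines that do not match the assumed layout
        counts[DAY_NAMES[stamp.weekday()]] += 1
    return counts
```

Calling weekday_counts(open("email-scores")) would then give one count per
day name, and a dip on Sat and Sun would be the heartbeat in question.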

I do appreciate that it's getting pretty green though, and that makes subtler
features harder to distinguish.

I know several people use my scripts, so if anyone has been producing other
derivations from the list of mail scores I'd sure be interested in hearing
about it.  I'll bundle this stuff together and post it if people contribute.

While writing this note I started to say that a histogram might be useful and
that a couple of people had already asked for one, so I decided to put one
together instead of continuing to talk about it.

You can see a sample from my mail: http://wellner.org/histplot.png

--histogram.py  (reads the output from my previous procmail script)
#!/usr/bin/env python
# Builds a histogram of scores from the "email-scores" log.
# The score is read from the third whitespace-separated field of each line.

# In principle you should only have to change the next two lines
# to change the graph the way most people would want;
# the rest of the script derives the correct parameters from these.
buckets = [0] * 25
scoreRange = [-20, 55]

bucketSize = float(scoreRange[1] - scoreRange[0]) / len(buckets)

f = open("email-scores")
for line in f:
    try:
        score = float(line.split()[2])
    except (IndexError, ValueError):
        # skip lines that have no parsable score
        continue
    if score < scoreRange[0]:
        # below the range: count it in the first bucket
        bucket = 0
    elif score >= scoreRange[1]:
        # at or above the top of the range: count it in the last bucket
        bucket = len(buckets) - 1
    else:
        bucket = int((score - scoreRange[0]) / bucketSize)
    # each message is counted exactly once
    buckets[bucket] = buckets[bucket] + 1
f.close()

# print "bucket bottom, count" pairs, one per line, for gnuplot
i = 0
for count in buckets:
    bucketBottom = i * bucketSize + scoreRange[0]
    print("%s %s" % (bucketBottom, count))
    i = i + 1



--sample output
-20.0 185
-17.0 0
-14.0 0
-11.0 214
-8.0 232
-5.0 8396
-2.0 16683
1.0 13682
4.0 6521
7.0 4457
10.0 4980
13.0 4940
16.0 4432
19.0 3247
22.0 2058
25.0 1299
28.0 878
31.0 512
34.0 301
37.0 200
40.0 59
43.0 21
46.0 3
49.0 0
52.0 3
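
The bucket arithmetic behind that output can be checked by hand: with the
range [-20, 55] split into 25 buckets, each bucket is (55 - (-20)) / 25 = 3.0
wide, which is why the left column steps by 3.  Here is a minimal sketch of
that mapping as standalone functions.  The names bucket_index and
bucket_bottom are made up for illustration and are not part of histogram.py,
and scores outside the range are assumed to be clamped into the first or last
bucket.

```python
def bucket_index(score, low=-20.0, high=55.0, n_buckets=25):
    """Return the bucket a score falls into, clamping out-of-range scores."""
    width = (high - low) / n_buckets
    if score < low:
        return 0  # assumption: everything below the range goes in bucket 0
    if score >= high:
        return n_buckets - 1  # assumption: everything above goes in the last
    return int((score - low) / width)


def bucket_bottom(index, low=-20.0, high=55.0, n_buckets=25):
    """Return the lower edge of a bucket, i.e. the left column of the output."""
    width = (high - low) / n_buckets
    return low + index * width
```

For example, a score of 4.7 gives (4.7 + 20) / 3.0 = 8.23, so it lands in
bucket 8, whose bottom edge is 8 * 3.0 - 20 = 4.0, the "4.0" row above.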

--histplot  gnuplot file
set timefmt "%m/%d/%y %H:%M"
set title "Distribution of spam values"
set xlabel "Hit Value"
set ylabel "Number of messages"
set nolabel
set terminal png color
#set yrange [-10:50]
plot '/home/wellner/hist' using 1:2 with boxes


rw2

