"Mike Loiterman" <[EMAIL PROTECTED]> writes:

> I understand now. Well, is there any way to fix it so that the graph is
> actually useful when you reach those kinds of numbers? Mine works great and
> is very useful, but I probably have 1/100 of the volume you do. BTW, is
> that your personal mail server? If so, you get a LOT of mail!
Yeah, that's my personal mail. That's what happens when you've been a consultant for 13 years, run two fairly popular web sites, and refuse to disable circa-'92 email addresses because, you never know, someone you have lost track of might have a gig for you. You see why SA was so appealing to me! (In case I haven't been clear enough, the reason is that I'm a packrat.)

Fix it? Hmm, that's a funny question. I still find it useful, but clearly it's a matter of what question you are trying to settle.

http://wellner.org/email-scores.png

For example, if you look under the non-spam line, you can still see a regular heartbeat on weekends, when I get less mail. That might be useful for some advanced scoring tool. I do appreciate that it's getting pretty green, though, and that makes subtler features harder to distinguish.

I know several people use my scripts, so if anyone has been producing other derivations from the list of mail scores, I'd sure be interested in hearing about it. I'll bundle this stuff together and post it if people contribute.

I was writing this note and mentioned that a histogram might be useful (a couple of people had already asked for one), so I decided to put one together instead of continuing to talk about it. You can see a sample from my mail:

http://wellner.org/histplot.png

--histogram.py (reads the output from my previous procmail script)

#!/usr/bin/env python3
# Reads email-scores (one message per line, score in the third field)
# and prints "bucket-bottom count" pairs suitable for gnuplot.

# In principle you should only have to change the next two lines
# to change the graph the way most people would want;
# the rest of the script derives the correct parameters from these.
buckets = [0] * 25          # number of histogram buckets
score_range = (-20, 55)     # scores outside this range are clamped into the end buckets

bucket_size = float(score_range[1] - score_range[0]) / len(buckets)

with open("email-scores") as f:
    for line in f:
        fields = line.split()
        try:
            score = float(fields[2])
        except (IndexError, ValueError):
            continue  # skip lines without a numeric score
        if score < score_range[0]:
            idx = 0
        elif score >= score_range[1]:
            idx = len(buckets) - 1
        else:
            # min() guards against float rounding just below the top edge
            idx = min(int((score - score_range[0]) / bucket_size), len(buckets) - 1)
        buckets[idx] += 1

for i, count in enumerate(buckets):
    print(i * bucket_size + score_range[0], count)

--sample output

-20.0 185
-17.0 0
-14.0 0
-11.0 214
-8.0 232
-5.0 8396
-2.0 16683
1.0 13682
4.0 6521
7.0 4457
10.0 4980
13.0 4940
16.0 4432
19.0 3247
22.0 2058
25.0 1299
28.0 878
31.0 512
34.0 301
37.0 200
40.0 59
43.0 21
46.0 3
49.0 0
52.0 3

--histplot gnuplot file

set timefmt "%m/%d/%y %H:%M"
set title "Distribution of spam values"
set xlabel "Hit Value"
set ylabel "Number of messages"
set nolabel
set terminal png color
#set yrange [-10:50]
plot '/home/wellner/hist' using 1:2 with boxes

rw2
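On making the scatter plot readable at high volume: one option is to stop plotting every message and plot per-day totals and the spam fraction instead. A minimal sketch follows, assuming each line of email-scores is "MM/DD/YY HH:MM score" (inferred from the gnuplot timefmt line and the script's use of the third field; the real layout may differ) and assuming a spam threshold of 5.0. The names daily_summary and format_daily are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def daily_summary(lines, threshold=5.0):
    """Return {date: (total, spam)} sorted by date.

    Assumes the hypothetical 'MM/DD/YY HH:MM score' line layout; the
    5.0 threshold is an assumed spam cutoff. Malformed lines are skipped.
    """
    totals = defaultdict(int)
    spams = defaultdict(int)
    for line in lines:
        fields = line.split()
        try:
            day = datetime.strptime(fields[0], "%m/%d/%y").date()
            score = float(fields[2])
        except (IndexError, ValueError):
            continue  # no parseable date or score on this line
        totals[day] += 1
        if score >= threshold:
            spams[day] += 1
    return {d: (totals[d], spams[d]) for d in sorted(totals)}


def format_daily(summary):
    """Format rows as 'YYYY-MM-DD total spam fraction' for gnuplot."""
    # total is never zero: only days with at least one message appear
    return ["%s %d %d %.3f" % (day.isoformat(), total, spam, spam / total)
            for day, (total, spam) in summary.items()]
```

Writing the rows of format_daily(daily_summary(open("email-scores"))) to a file gives one point per day instead of one per message; in gnuplot, "set xdata time" together with set timefmt "%Y-%m-%d" lets the first column be read as a date.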
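The weekend heartbeat visible under the non-spam line can also be checked numerically rather than by eye. This sketch, under the same assumed "MM/DD/YY HH:MM score" line layout and assumed 5.0 threshold, tallies non-spam and spam messages by weekday; weekday_counts and format_weekdays are hypothetical names, not part of the original scripts.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def weekday_counts(lines, threshold=5.0):
    """Count non-spam and spam messages per weekday (0 = Monday).

    Assumes the hypothetical 'MM/DD/YY HH:MM score' line layout and an
    assumed spam threshold of 5.0; malformed lines are skipped.
    """
    ham, spam = Counter(), Counter()
    for line in lines:
        fields = line.split()
        try:
            when = datetime.strptime(fields[0] + " " + fields[1], "%m/%d/%y %H:%M")
            score = float(fields[2])
        except (IndexError, ValueError):
            continue  # no parseable timestamp or score on this line
        target = spam if score >= threshold else ham
        target[when.weekday()] += 1
    return ham, spam


def format_weekdays(ham, spam):
    """One 'Day non_spam spam' row per weekday, Monday first."""
    # Counter returns 0 for weekdays that never occurred
    return ["%s %d %d" % (name, ham[i], spam[i]) for i, name in enumerate(DAY_NAMES)]
```

The counts are totals over the whole file, so they compare weekdays fairly only when the file covers several full weeks.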