At 02:52 AM 12/13/03 +0100, Kai Poppe wrote:
Hi list!

I just wrote a nice .cf that disables SA's old ten-step HTML-percentage
tests and runs a hundred tests covering 0% to 100% instead (naturally *g*).
I hope you find it useful; comments appreciated!

http://www.poppe-online.de/spamassassin/55_html_perc_tests.cf

As a second comment: why do the scores you assigned increase linearly with % HTML? Is there a reason, or was it just something you whipped up quickly?


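The scores assigned by the GA and the S/O values in STATISTICS.txt both indicate that a higher HTML percentage does not reliably mean a higher chance of spam. In the table below, the GA scores peak at HTML_40_50 and then fall off rather than climbing all the way to 100%.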

As a very crude statistical measure, take a look at the S/O values in 2.6x's STATISTICS.txt. They mostly increase, but note the slight dip when you get to 80_90.

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
  1.327   0.9723   1.9908    0.328   0.03    0.00  HTML_00_10
  1.934   2.1962   1.4436    0.603   0.21    0.00  HTML_10_20
  3.458   4.6458   1.2322    0.790   0.48    0.69  HTML_20_30
  4.132   5.8439   0.9245    0.863   0.62    0.84  HTML_30_40
  6.525   9.4988   0.9541    0.909   0.73    0.87  HTML_40_50
 11.599  17.0960   1.3041    0.929   0.79    0.70  HTML_50_60
 13.701  20.3955   1.1619    0.946   0.83    0.36  HTML_60_70
 10.555  15.8318   0.6713    0.959   0.86    0.38  HTML_70_80
  5.810   8.6467   0.4974    0.946   0.82    0.01  HTML_80_90
  0.940   1.4251   0.0317    0.978   0.89    0.31  HTML_90_100
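
For anyone who wants to check the curve themselves, here is a rough Python sketch. It assumes that S/O is SPAM% / (SPAM% + HAM%), which reproduces the S/O column above to three decimals; the names `ROWS`, `s_o` and `dips` are my own and not part of SpamAssassin.

```python
# SPAM% and HAM% copied from the STATISTICS.txt excerpt above.
ROWS = [
    ("HTML_00_10", 0.9723, 1.9908),
    ("HTML_10_20", 2.1962, 1.4436),
    ("HTML_20_30", 4.6458, 1.2322),
    ("HTML_30_40", 5.8439, 0.9245),
    ("HTML_40_50", 9.4988, 0.9541),
    ("HTML_50_60", 17.0960, 1.3041),
    ("HTML_60_70", 20.3955, 1.1619),
    ("HTML_70_80", 15.8318, 0.6713),
    ("HTML_80_90", 8.6467, 0.4974),
    ("HTML_90_100", 1.4251, 0.0317),
]


def s_o(spam_pct, ham_pct):
    """Assumed definition: share of a rule's normalized hits that are spam."""
    return spam_pct / (spam_pct + ham_pct)


# S/O per rule, in table order.
curve = [(name, s_o(spam, ham)) for name, spam, ham in ROWS]

# Rules whose S/O is lower than that of the previous bucket.
dips = [
    name
    for (_, prev), (name, cur) in zip(curve, curve[1:])
    if cur < prev
]

for name, value in curve:
    print(f"{name:12s} {value:.3f}")
print("dips:", dips)
```

On these numbers the only bucket that drops below its predecessor is HTML_80_90, which is the dip mentioned above.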


It'd be interesting to run your tests against a good-sized corpus with mass-check, if for no other reason than to see what the S/O curve looks like.
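
To give an idea of what such a curve would be computed from, here is a minimal Python sketch. It does not parse mass-check output; it assumes you have already extracted one HTML percentage per message into two plain lists (`spam_pcts` and `ham_pcts`, both hypothetical names), and it uses the same assumed S/O definition of spam share over spam share plus ham share.

```python
from collections import Counter


def so_curve(spam_pcts, ham_pcts, width=1):
    """Return {bucket_start: S/O or None} for buckets of `width` percent.

    spam_pcts / ham_pcts: HTML percentage (0..100) of each spam / ham message.
    A bucket that no message falls into gets None instead of a ratio.
    """
    def bucket(p):
        # A message at exactly 100% is counted in the last bucket.
        return min(int(p // width) * width, 100 - width)

    spam_hits = Counter(bucket(p) for p in spam_pcts)
    ham_hits = Counter(bucket(p) for p in ham_pcts)

    curve = {}
    for start in range(0, 100, width):
        # Normalize by corpus size so unequal spam/ham corpora are comparable.
        s = spam_hits[start] / len(spam_pcts) if spam_pcts else 0.0
        h = ham_hits[start] / len(ham_pcts) if ham_pcts else 0.0
        curve[start] = s / (s + h) if s + h > 0 else None
    return curve


# Tiny made-up example, only to show the call; not real corpus data.
example = so_curve([95, 95.5, 50, 100], [5, 50, 3, 2], width=10)
print(example)
```

With `width=1` this gives the hundred one-percent buckets, and any bucket where S/O falls below its neighbours would be a hint that a linearly increasing score is not justified there.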





Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk