On Fri, 8 Sep 2006, Randal, Phil wrote:
> Score appropriately, train your Bayes well, and the false positives
> should diminish.

FUZZY_OCR gives crazily high scores to certain things.
One point per matched keyword, I believe.  I've seen FUZZY_OCR,
by itself, give scores as high as 24.00.

Here's the distribution from one of my log files, as a matter
of fact:

        score count
        ----- -----
         4.00: 21
         5.00: 9
         6.00: 6
         7.00: 4
         8.00: 4
         9.00: 15
        10.00: 7
        11.00: 7
        13.00: 1
        14.00: 1
        24.00: 1

As you can see, 24 only happened once, but 9, 10, and 11 are
very common: together they account for 29 of the 76 hits.

So yeah, false positives should diminish some, but there is
no way a BAYES_00 is going to make up for a score of 11.
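
To put rough numbers on that, here's a quick Python sketch using the
counts from the table above. The threshold of 5.0 and the BAYES_00
score of -2.6 are my assumptions, not values from this thread, so plug
in whatever your own config uses. It also assumes FUZZY_OCR and
BAYES_00 are the only rules that fire on the message.

```python
# FUZZY_OCR score -> number of messages, copied from the table above.
fuzzy_ocr_hits = {
    4: 21, 5: 9, 6: 6, 7: 4, 8: 4, 9: 15,
    10: 7, 11: 7, 13: 1, 14: 1, 24: 1,
}

# Assumed values (not from this thread); check your own score settings.
REQUIRED_SCORE = 5.0    # assumed spam threshold
BAYES_00_SCORE = -2.6   # assumed score for BAYES_00


def still_flagged(hits, bayes=BAYES_00_SCORE, required=REQUIRED_SCORE):
    """Count messages that stay at or above the threshold when
    FUZZY_OCR and BAYES_00 are the only rules that fire."""
    return sum(n for score, n in hits.items() if score + bayes >= required)


total = sum(fuzzy_ocr_hits.values())
flagged = still_flagged(fuzzy_ocr_hits)

print(f"{flagged} of {total} FUZZY_OCR hits stay over the threshold")
print(f"FUZZY_OCR 11 + BAYES_00 = {11 + BAYES_00_SCORE:.1f}")
```

Under those assumptions, every FUZZY_OCR score of 8 or higher still
lands over the threshold even with BAYES_00 pulling the other way.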

Personally, I think the scoring strategy for FUZZY_OCR needs
to be revamped...

  - Logan
