On Fri, 8 Sep 2006, Randal, Phil wrote:
Score appropriately, train your Bayes well, and the false positives should diminish.
FUZZY_OCR gives crazily high scores to certain messages. It adds one point per matched keyword, I believe. I've seen FUZZY_OCR, by itself, give scores as high as 24.00. Here's the distribution from one of my log files, as a matter of fact:

score   count
-----   -----
 4.00      21
 5.00       9
 6.00       6
 7.00       4
 8.00       4
 9.00      15
10.00       7
11.00       7
13.00       1
14.00       1
24.00       1

As you can see, 24 only happened once, but 9, 10, and 11 are very common. So yeah, false positives should diminish some, but there is no way a BAYES_00 hit is going to make up for a score of 11.

Personally, I think the scoring strategy for FUZZY_OCR needs to be revamped...

- Logan
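To put numbers on the distribution above, here is a small sketch that tallies the logged FUZZY_OCR hits and checks how many would still reach SpamAssassin's default required_score of 5.0 even after a BAYES_00 hit is added. The BAYES_00 value of -2.6 is an assumption (roughly the stock score; your local config may differ), and the helper name `summarize` is hypothetical.

```python
# FUZZY_OCR score distribution from the log above: score -> number of messages
FUZZY_OCR_HITS = {4.0: 21, 5.0: 9, 6.0: 6, 7.0: 4, 8.0: 4, 9.0: 15,
                  10.0: 7, 11.0: 7, 13.0: 1, 14.0: 1, 24.0: 1}

REQUIRED_SCORE = 5.0   # SpamAssassin's default spam threshold
BAYES_00_SCORE = -2.6  # assumption: approximate stock BAYES_00 score; check your local config


def summarize(hits, bayes_offset, threshold):
    """Hypothetical helper: basic statistics for a score -> count histogram."""
    total = sum(hits.values())
    # messages where FUZZY_OCR alone contributed 9 points or more
    nine_plus = sum(n for s, n in hits.items() if s >= 9.0)
    # messages whose FUZZY_OCR score still reaches the threshold
    # after the (negative) BAYES_00 score is added, ignoring all other rules
    still_spam = sum(n for s, n in hits.items() if s + bayes_offset >= threshold)
    mean = sum(s * n for s, n in hits.items()) / total
    return {"total": total, "nine_plus": nine_plus,
            "still_spam": still_spam, "mean": mean}


if __name__ == "__main__":
    stats = summarize(FUZZY_OCR_HITS, BAYES_00_SCORE, REQUIRED_SCORE)
    print(stats["total"], stats["nine_plus"], stats["still_spam"])  # 76 32 36
```

Under these assumptions, 32 of the 76 logged hits scored 9 or more, and 36 of them would cross the threshold on FUZZY_OCR alone even with BAYES_00 pulling the score down.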