On Fri, 8 Sep 2006, Michael Grey wrote:
> We are testing a new configuration using FuzzyOCR, and found it to work very
> well overall...
> However, there have been two occasions in the last 24 hrs where screenshots
> embedded into the emails caused false positives.
> One was an 'account summary' from a cell company, the other was some internal
> marketing info.
> Are there other approaches to getting certain images whitelisted if they
> contain, say, our specific company name?

You could probably hack FuzzyOcr.pm pretty easily.
The basic strategy would be to create another list just like
@words, but with whitelist words instead. You should be able
to duplicate the code where it parses config file options (look
for "focr_word") and put in your own config file option, say
"focr_word_whitelist". Then at the bottom, there is a foreach
loop that iterates through @words and looks for matches.
You can just duplicate that loop and create a separate count
of whitelist words matched. Then modify the way the score is
computed (the "my $score = ..." line), and you're done.
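
A rough standalone sketch of the idea, not the actual FuzzyOcr.pm code. The
names @whitelist_words, count_matches and image_score are made up for
illustration, as is the "focr_word_whitelist" option. It uses plain
case-insensitive substring matching in place of the plugin's fuzzy matching,
and the scoring rule (any whitelist hit zeroes the score) is only one
possible choice:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the lists the plugin would build from its
# config file ("focr_word" and the made-up "focr_word_whitelist").
my @words           = ('viagra', 'stock', 'buy now');
my @whitelist_words = ('example corp');

# Count how many entries of a list occur in the OCR text.
# Simplification: case-insensitive substring match, not fuzzy matching.
sub count_matches {
    my ($text, @list) = @_;
    my $count = 0;
    foreach my $w (@list) {
        $count++ if index(lc $text, lc $w) >= 0;
    }
    return $count;
}

# Assumed scoring rule: any whitelist hit suppresses the score entirely;
# otherwise each matched spam word adds a fixed number of points.
sub image_score {
    my ($text, $points_per_word) = @_;
    my $spam_hits  = count_matches($text, @words);
    my $white_hits = count_matches($text, @whitelist_words);
    my $score = $white_hits > 0 ? 0 : $spam_hits * $points_per_word;
    return $score;
}

print image_score("Example Corp account summary: buy now", 2), "\n";  # 0
print image_score("Hot stock tip, buy now", 2), "\n";                 # 4
```

You may prefer to subtract the whitelist count from the spam count rather
than zeroing the score outright, so that a spammer cannot get a free pass
just by putting your company name in the image.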
- Logan