-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello there,

I have improved the original OcrPlugin (found at
http://wiki.apache.org/spamassassin/OcrPlugin) so that it supports
fuzzy matching. That way, mistakes made by the OCR engine or
intentional obfuscations in the text no longer prevent a match. This
is done by calculating a relative distance between the pattern (a
word from a given word list) and a line in the recognized input. The
plugin also uses dynamic scoring: the more words are matched, the
higher the score (this can be adjusted in the source).
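
To illustrate the idea, here is a minimal sketch in Python. It is not
the plugin's actual code. It assumes the distance is a Levenshtein
edit distance divided by the length of the pattern, and that a line
matches when its closest token is within a threshold. The function
names, the threshold of 0.25, and the scoring values (min_words, base,
per_extra) are all hypothetical and chosen only for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def relative_distance(word: str, line: str) -> float:
    """Smallest edit distance from `word` to any whitespace-separated
    token of `line`, divided by len(word). Assumed definition."""
    word = word.lower()
    tokens = line.lower().split()
    if not word or not tokens:
        return 1.0
    best = min(edit_distance(word, token) for token in tokens)
    return best / len(word)


def count_matches(words, lines, threshold=0.25):
    """Number of list words found (fuzzily) in at least one line.
    The threshold is a hypothetical value."""
    matched = 0
    for word in words:
        if any(relative_distance(word, line) <= threshold for line in lines):
            matched += 1
    return matched


def dynamic_score(matched, min_words=2, base=4.0, per_extra=1.0):
    """More matched words give a higher score. All three parameters
    are hypothetical, not the plugin's defaults."""
    if matched < min_words:
        return 0.0
    return base + (matched - min_words) * per_extra


if __name__ == "__main__":
    ocr_lines = ["cheap v1agra now", "c1alis onl1ne", "hello"]
    word_list = ["viagra", "cialis", "pharmacy"]
    hits = count_matches(word_list, ocr_lines)
    print(hits, dynamic_score(hits))
```

With these assumed values, "v1agra" is one edit away from "viagra"
(relative distance 1/6), so it still counts as a match even though
the OCR output or the spammer replaced a letter.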

You can find a full description and an example on the wiki at:

http://wiki.apache.org/spamassassin/FuzzyOcrPlugin


Ideas for improvement and criticism are always welcome :)


Best regards,


Chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFE18IMJQIKXnJyDxURAm4PAJ9WcLtEDharV99qZrgPGuy0oa6a+QCfcvgz
azeW1/azOeGFnW2qBnvcOUs=
=KZIA
-----END PGP SIGNATURE-----
