-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

I just uploaded FuzzyOcr 2.3b to the download site. If you find bugs or
run into problems, please mail back :)

The major changes are:

- - Added a configurable timeout (maximum runtime) for the plugin, to
  avoid lockups and unwanted delays.

- - The default matching threshold (set in the config file) can now be
  overridden on a per-word basis in the wordlist.

  For example, if the wordlist contains:

    word1
    word2::0
    word3::0.2

  then word1 is matched with the default threshold set in the config
  file, word2 must be an exact match (threshold 0), and word3 is matched
  with a threshold of 0.2.

  This is especially useful for words that trigger false positives very
  often, like "penis", "money" or "news". Note that the tendency to
  produce a FP is not directly connected to the word length: the word
  "buy" produces very few FPs compared to "penis" when both are matched
  with the same threshold. FuzzyOcr.words.sample contains some
  suggestions for word-specific thresholds, which I recommend.

- - The experimental MD5 database has been replaced by a custom hash
  database which is able to match very similar images. Often you get
  the same image twice, or all your customers get the same spam mail.
  But even though the pictures look the same, they are not
  byte-identical, which is why MD5 was useless. The newly introduced
  hash (my own invention) recognizes almost identical images based on
  features that I won't explain here, as that would make things easier
  for spammers :) If a message contains a picture previously registered
  in the database, the original score is read back from the database,
  the message is immediately tagged with this score, and the plugin
  ends.

- - Some non-alpha -> alpha translations are now applied to the gocr
  output to fix common misreadings, such as "i" being read as ";" or
  "a" as "8".

- - There are now 2 scores for broken images. One is used when the
  picture is recognized as broken but giffix was able to correct the
  errors and produce output that can be scanned. The other is used if
  the image is unfixable (that is, either too broken, or
  interlaced/animated and broken). The first is set lower than the
  second (2.5 vs. 5).

- - Various bugfixes.

TODO:

- - Write an external program to manage the database (add, remove and
  verify given pictures).

- - Rewrite the temp file system to do all external program operations
  on files (saves memory).

Another wish: I'd like to create a database to ship with the plugin so
it can be used out of the box, but I do not have many samples here. It
would be nice if you sent me samples of common picture spam you get,
with "[picture sample]" in the subject, to my mail address. I will post
here again once I have enough :).

Thanks to Jorge Valdes, Michael Alan Dorman and UxBoD for finding bugs
and sending improvement suggestions for this version.

Chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFE72jaJQIKXnJyDxURApfeAJ47JcACEeIaYtEA8z6wDdFxGPhrUgCZAZSE
sdWROYeF8IFdbUX0njAdV+o=
=y7XM
-----END PGP SIGNATURE-----
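
To make the per-word threshold syntax from the message concrete (word1,
word2::0, word3::0.2), here is a minimal Python sketch of how such a
wordlist could be read. It is only an illustration and not the plugin's
own parser: the function names parse_wordlist and effective_threshold
and the default value 0.25 are made up for this example, and only the
"word::threshold" form described above is handled.

```python
from typing import Dict, Optional


def parse_wordlist(text: str) -> Dict[str, Optional[float]]:
    """Map each word to its own threshold, or None if it has no override.

    Hypothetical helper: entries are whitespace-separated and an optional
    override follows the word after a "::" separator.
    """
    thresholds: Dict[str, Optional[float]] = {}
    for entry in text.split():
        word, sep, value = entry.partition("::")
        # No "::" present means the word uses the config-file default.
        thresholds[word] = float(value) if sep else None
    return thresholds


def effective_threshold(
    word: str, thresholds: Dict[str, Optional[float]], default: float
) -> float:
    """Return the per-word override if one exists, else the default."""
    override = thresholds.get(word)
    return default if override is None else override


# The example wordlist from the announcement.
sample = """word1
word2::0
word3::0.2"""

parsed = parse_wordlist(sample)

# 0.25 is an assumed default, standing in for the config-file value.
for w in ("word1", "word2", "word3"):
    print(w, effective_threshold(w, parsed, default=0.25))
```

A threshold of 0 for word2 corresponds to the exact-match case in the
message, while word1 falls back to whatever default the config file
sets.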