-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,


I just uploaded FuzzyOcr 2.3b to the download site. If you find bugs
or run into problems, please mail back :)

The major changes are:

- - Added a configurable timeout (maximum runtime) for the plugin, to
avoid any lockups/unwanted delays
- - The default matching threshold (set in the config file) can now be
overridden on a per-word basis in the wordlist

    For example, suppose the wordlist contains:

    word1
    word2::0
    word3::0.2


    Then word1 is matched with the default threshold set in the config
    file, word2 must be an exact match (threshold 0), and word3 is
    matched with a threshold of 0.2.

    This is especially useful for words that trigger false positives
    (FPs) very often, such as "penis", "money" or "news".

    Note that the tendency to produce an FP is not directly connected
    to the word length. The word "buy" produces very few FPs compared
    to "penis" when both are matched with the same threshold.

    The FuzzyOcr.words.sample file contains some word-specific
    thresholds that I recommend.
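
    The override rule above can be sketched as follows. This is only
    an illustration in Python, not the plugin's own code: the default
    value of 0.25, the function names, and the reading of the
    threshold as "allowed edit distance divided by word length" are
    all assumptions. The announcement only states that a threshold of
    0 means an exact match.

```python
# Assumed default; the real value is whatever the config file sets.
DEFAULT_THRESHOLD = 0.25


def parse_wordlist(lines, default=DEFAULT_THRESHOLD):
    """Map each word to its threshold; 'word::0.2' overrides the default."""
    thresholds = {}
    for raw in lines:
        entry = raw.strip()
        if not entry:
            continue  # skip blank lines
        word, sep, value = entry.partition("::")
        thresholds[word.lower()] = float(value) if sep else default
    return thresholds


def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def matches(word, candidate, threshold):
    """Assumed rule: allow up to threshold * len(word) edits."""
    return edit_distance(word, candidate.lower()) <= threshold * len(word)


words = parse_wordlist(["word1", "word2::0", "word3::0.2"])
```

    Under these assumptions, an entry with threshold 0 only accepts
    the word itself, while a longer word with the default threshold
    tolerates one misread character.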

- - The experimental MD5 database has been replaced by a custom hash
database which is able to match very similar images.

    Often you get the same image twice, or all your customers get the
    same spam mail. But even though the pictures look the same, they
    are not identical, which is why MD5 was useless. The newly
    introduced hash (my own invention) is able to recognize almost
    identical images based on features that I won't explain here, as
    that would make things easier for spammers :)
    If a message contains a picture previously registered in the
    database, the original score is read back from the database, the
    message is immediately tagged with that score, and the plugin
    ends.
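
    To illustrate the lookup step only: the sketch below uses a
    generic "average hash" as a stand-in fingerprint. It is NOT the
    hash used by FuzzyOcr, whose features are deliberately not
    described here. The class name, the max_distance value, and the
    input format (a small grid of grayscale values) are all
    hypothetical.

```python
def average_hash(pixels):
    """Stand-in fingerprint: 1 where a pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(int(value > mean) for value in flat)


def hamming(a, b):
    """Number of positions at which two equal-length fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


class ScoreCache:
    """Hypothetical in-memory stand-in for the image database."""

    def __init__(self, max_distance=2):
        self.max_distance = max_distance
        self.entries = []  # list of (fingerprint, score) pairs

    def add(self, pixels, score):
        self.entries.append((average_hash(pixels), score))

    def lookup(self, pixels):
        """Return the stored score of a near-identical image, else None."""
        fingerprint = average_hash(pixels)
        for known, score in self.entries:
            if (len(known) == len(fingerprint)
                    and hamming(known, fingerprint) <= self.max_distance):
                return score
        return None


cache = ScoreCache()
original = [[10, 200, 10, 200],
            [200, 10, 200, 10],
            [10, 200, 10, 200],
            [200, 10, 200, 10]]
cache.add(original, 7.5)
```

    A caller would try lookup() first and run the full OCR scan only
    when it returns None.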

- - Some non-alpha->alpha translations are now applied to the gocr
output to fix common misreadings, such as "i" being read as ";" or "a"
as "8".
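
    A minimal sketch of such a translation step, written in Python for
    illustration. Only the two pairs mentioned above are taken from
    this announcement; the plugin's full table is not listed here, and
    the names OCR_FIXES and normalize_ocr are made up for the example.

```python
# Only these two pairs come from the announcement; the real table may
# contain more entries.
OCR_FIXES = str.maketrans({";": "i", "8": "a"})


def normalize_ocr(text):
    """Replace commonly misread non-alphabetic characters with letters."""
    return text.translate(OCR_FIXES)


cleaned = normalize_ocr("ph8rm8cy p;ll")
```

    Note that a blanket replacement like this also rewrites genuine
    digits and semicolons, so it only makes sense on text that is
    about to be matched against a wordlist.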

- - There are now 2 scores for broken images. One is used when the
picture is recognized as broken but giffix was able to correct the
errors and produce output that can be scanned. The other is used if
the image is unfixable (that is, either too broken, or
interlaced/animated and broken). The first one is set lower than the
second one (2.5 vs. 5).

- - Various bugfixes

TODO:

- - Write an external program to manage the database (add, remove and
verify given pictures).
- - Rewrite the temp file system to do all external program operations
on files (saves memory).


Another wish: I'd like to create a database to ship with the plugin so
that it can be used out of the box, but I do not have many samples
here. It would be nice if you sent samples of the common picture spam
you get to my mail address, with "[picture sample]" in the subject. I
will post here again once I have enough :).


Thanks to Jorge Valdes, Michael Alan Dorman and UxBoD for finding bugs
and sending improvement suggestions for this version.

Chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFE72jaJQIKXnJyDxURApfeAJ47JcACEeIaYtEA8z6wDdFxGPhrUgCZAZSE
sdWROYeF8IFdbUX0njAdV+o=
=y7XM
-----END PGP SIGNATURE-----
