-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Theo Van Dinter wrote:
> On Mon, Aug 14, 2006 at 08:46:51PM +0200, decoder wrote:
>> gocr features a nice parameter called -d. It is able to remove
>> smaller particles before scanning, compare these results:
>
> So my problem with the OCR idea is that it inevitably gets to the
> point where we'd need to programmatically solve the same graphics as
> used in CAPTCHAs, and then I don't think we're really focused on
> addressing the core issue any longer.
>
> It's mostly the same way in non-graphic spams -- catching the text
> may or may not be difficult with all the obfuscation and such that
> goes on. However, catching the fact that there's obfuscation is a
> good indication of spam.
>
> Just a thought.
>
You are absolutely right, this COULD get to a point where scanning for
text in an image becomes pointless. But detecting obfuscation is even
harder in an image than it is in text.

For text, I had the idea earlier to detect obfuscations with
approximate matching against a wordlist and then to score the
obfuscation itself rather than the content. But this can easily lead
to false positives, so one must pay attention to what goes on the
wordlist.
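
A minimal sketch of that idea in Python, assuming a hypothetical
wordlist and a plain Levenshtein distance as the approximate-matching
step. The names WORDLIST, edit_distance and obfuscation_hits are made
up for illustration and are not part of any existing filter:

```python
import re
import string

# Hypothetical wordlist (an assumption for this sketch). A real list
# must be chosen carefully to keep false positives down.
WORDLIST = {"viagra", "mortgage", "pharmacy"}


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]


def obfuscation_hits(text, wordlist=WORDLIST, max_distance=1):
    """Return (token, word) pairs where token is a near miss of a word.

    Exact matches are skipped on purpose: the point is to score the
    obfuscation itself, not the content.
    """
    hits = []
    for raw in re.findall(r"\S+", text.lower()):
        # Drop surrounding punctuation so "viagra." is not a near miss.
        token = raw.strip(string.punctuation)
        if not token or token in wordlist:
            continue
        for word in sorted(wordlist):
            # A length gap larger than max_distance rules out a match.
            if abs(len(token) - len(word)) > max_distance:
                continue
            if edit_distance(token, word) <= max_distance:
                hits.append((token, word))
    return hits


print(obfuscation_hits("Buy v1agra now."))
print(obfuscation_hits("Buy viagra now."))
print(obfuscation_hits("Trip to Niagara", max_distance=2))
```

The last call shows the false-positive risk: "niagara" is only two
edits away from "viagra", so it is flagged once max_distance is raised
to 2, while the default of 1 leaves it alone.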

For images, this is even harder: how would one recognize an attempt
to mislead OCR?


Chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFE4Mh0JQIKXnJyDxURAgHTAJ9gL6EoSaWpcFjBWJVwg6zk+MJoIgCgomov
HWbHnKbbJovLuXwRtOhf2kc=
=vez+
-----END PGP SIGNATURE-----
