-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Theo Van Dinter wrote:
> On Mon, Aug 14, 2006 at 08:46:51PM +0200, decoder wrote:
>> gocr features a nice parameter called -d. It is able to remove
>> smaller particles before scanning, compare these results:
>
> So my problem with the OCR idea is that it inevitably gets to the
> point where we'd need to programmatically solve the same graphics as
> used in CAPTCHAs, and then I don't think we're really focused on
> addressing the core issue any longer.
>
> It's mostly the same way in non-graphic spams -- catching the text
> may or may not be difficult with all the obfuscation and such that
> goes on. However, catching the fact that there's obfuscation is a
> good indication of spam.
>
> Just a thought.
>

You are absolutely right, this COULD get to a point where it becomes
pointless to scan for text in an image. But detecting obfuscation in
an image is even harder than detecting it in text.

For text, I had the idea earlier of using approximate matching to
detect obfuscations and then scoring the obfuscation itself rather
than the content. But this can easily lead to false positives, so one
must be careful about what goes on the wordlist. For images it is
harder still: how would one recognize an attempt to mislead OCR?

Chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFE4Mh0JQIKXnJyDxURAgHTAJ9gL6EoSaWpcFjBWJVwg6zk+MJoIgCgomov
HWbHnKbbJovLuXwRtOhf2kc=
=vez+
-----END PGP SIGNATURE-----
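
P.S. A rough sketch of the approximate-matching idea for text, in
Python, might look like the following. It is only an illustration and
not an existing rule or plugin: the wordlist, the substitution table,
the 0.8 threshold and the function name obfuscation_hits are all
hypothetical, and the similarity measure is simply the standard
library's difflib.SequenceMatcher. Exact matches are skipped on
purpose, so only the obfuscation is flagged and not the content.

```python
import difflib
import re

# Hypothetical wordlist; keep it small and specific to limit false positives.
WORDLIST = ["viagra", "mortgage", "pharmacy"]

# Hypothetical table of character substitutions seen in obfuscated words.
SUBSTITUTIONS = str.maketrans(
    {"1": "i", "!": "i", "0": "o", "@": "a", "$": "s", "3": "e"}
)


def obfuscation_hits(text, threshold=0.8):
    """Return (token, word) pairs where a token nearly matches a listed
    word without being an exact match."""
    hits = []
    for token in re.findall(r"\S+", text.lower()):
        token = token.strip(".,;:?\"'()")
        if not token or token in WORDLIST:
            continue  # an exact match is plain content, not obfuscation
        # Undo common substitutions, then drop separators such as dots.
        normalized = re.sub(r"[^a-z]", "", token.translate(SUBSTITUTIONS))
        for word in WORDLIST:
            ratio = difflib.SequenceMatcher(None, normalized, word).ratio()
            if ratio >= threshold:
                hits.append((token, word))
                break
    return hits
```

With this sketch, "v1agra" and "v.i.a.g.r.a" are both reported as
obfuscations of "viagra", while a plain "viagra" is not. The false
positive problem shows up immediately, though: an innocent plural
such as "mortgages" is also close enough to "mortgage" to be flagged,
so the wordlist and threshold would need careful tuning.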