-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Loren Wilton writes:
> Since a tool can generate the matching pattern and convert it to a re, it
> seems that a tool could in theory generate a matching pattern and convert it
> to something else that might be either more comprehensible or more
> efficient.  Or possibly a tool could be made that would do a direct fuzzy
> match from the unobfuscated word.  (However, I think this last possibility
> would be slower than pre-obfuscating; but possibly it wouldn't be.)
> 
> The problem is that perl doesn't have any syntax to efficiently describe
> this obfuscated match other than an incomprehensible regex.
> 
> Someone could invent such a tool, and it could either be a plugin to SA or a
> part (or addon subroutine) called by perl itself.  In fact I believe that at
> least two fuzzy matching plugins have been added to SA in the last week.
> Whether they are as efficient, or more efficient, than the current horrid
> re's is an interesting question.

Those plugins actually generate the horrid REs internally. ;)

A paper at the spam conference suggested using an edit distance algorithm,
with very good results. The idea is that the edit distance from "cialis" to
"C 1 a l | s" is smaller than the distance from "cialis" to "specialized"
and similar legitimate words.

If I recall correctly, someone submitted an implementation to our Bugzilla
quite a while ago, but I think the false positive (FP) rates were too high.
Given the recent paper's published results, though, there may be good ways
to tweak it to get FPs down to a tolerable rate.
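
For anyone who wants to experiment, here is a minimal sketch of the edit
distance idea in Python. It is not the paper's method and not the
implementation that was submitted earlier; it is plain Levenshtein distance
with unit costs. The names edit_distance, normalize and fuzzy_distance are
made up for illustration, and lowercasing and stripping whitespace before
comparing is my own assumption. Without that step, every padding space in
"C 1 a l | s" counts as one edit.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: unit-cost insert, delete and substitute."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    # Assumption (not from the paper): lowercase and drop all whitespace
    # so that spacing tricks do not inflate the distance.
    return "".join(text.lower().split())


def fuzzy_distance(word: str, candidate: str) -> int:
    # Hypothetical helper: distance from a known word to a normalized token.
    return edit_distance(word, normalize(candidate))


print(fuzzy_distance("cialis", "C 1 a l | s"))  # 2
print(fuzzy_distance("cialis", "specialized"))  # 6
```

A rule built on this would need a cutoff somewhere between those two
numbers, and picking that cutoff is exactly where the FP rate gets decided.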

If anyone wants to have a try, please do ;)

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFCIzn1MJF5cimLx9ARAoOLAKCoLQ4ZU+tPC0KyUM3guiSm0+XZtACfUPZd
io3eGt5cQ877idv3GGvl9QE=
=JVno
-----END PGP SIGNATURE-----
