-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Loren Wilton writes:
> Since a tool can generate the matching pattern and convert it to a re, it
> seems that a tool could in theory generate a matching pattern and convert it
> to something else that might be either more comprehensible or more
> efficient. Or possibly a tool could be made that would do a direct fuzzy
> match from the unobfuscated word. (However, I think this last possibility
> would be slower than pre-obfuscating; but possibly it wouldn't be.)
>
> The problem is that perl doesn't have any syntax to efficiently describe
> this obfuscated match other than an incomprehensible regex.
>
> Someone could invent such a tool, and it could either be a plugin to SA or a
> part (or addon subroutine) called by perl itself. In fact I believe that at
> least two fuzzy matching plugins have been added to SA in the last week.
> Whether they are as efficient, or more efficient, than the current horrid
> re's is an interesting question.

They actually generate the horrid REs internally. ;)

A paper at the spam conference suggested using an Edit Distance algorithm,
with very good results. The idea is that the edit distance from "cialis" to
"C 1 a l | s" is smaller than the distance from "cialis" to "specialized",
and so on.

If I recall correctly, someone submitted an implementation quite a while ago
on our BZ, but I think the FP rates were too high. Given the recent paper's
published results, though, there may be good ways to tweak it to get FPs
down to a tolerable rate.

If anyone wants to have a try, please do ;)

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFCIzn1MJF5cimLx9ARAoOLAKCoLQ4ZU+tPC0KyUM3guiSm0+XZtACfUPZd
io3eGt5cQ877idv3GGvl9QE=
=JVno
-----END PGP SIGNATURE-----
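
For anyone who wants to experiment, here is a rough sketch of the edit-distance idea mentioned above. It is plain Levenshtein distance in Python, not the method from the paper and not the implementation submitted to BZ. The names levenshtein, normalize and fuzzy_distance are made up for illustration, and the normalization step (lowercasing and dropping whitespace before comparing) is my own assumption. Without it, the inserted spaces in "C 1 a l | s" each count as an edit and the obfuscated form scores worse than "specialized".

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def normalize(token: str) -> str:
    # Assumption for this sketch: lowercase and remove all whitespace,
    # so that spacing tricks do not inflate the distance.
    return "".join(token.lower().split())


def fuzzy_distance(word: str, candidate: str) -> int:
    # Hypothetical helper: distance from a clean word to a normalized candidate.
    return levenshtein(word, normalize(candidate))


if __name__ == "__main__":
    print(fuzzy_distance("cialis", "C 1 a l | s"))   # 2
    print(fuzzy_distance("cialis", "specialized"))   # 6
```

Any cutoff on the distance would have to be tuned against real mail, since a short target word will have legitimate words within a small distance of it, which is presumably where the FPs come from.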