Re: Obfuscation (was: Millions and Billions)

Loren Wilton 28 Feb 2005 03:46:43 -0000

> > I just question whether regex's are the right "complicated solution".
> >
> > How does Google or one of the dictionary sites guess the correct spelling
> > for a misspelled word?
>
> Great, why don't you go see if google can guess the correct spelling for
>
> c l @ L i @ s


He has a point.  A complicated regex is, well, complicated, and that can mean
slow.  It is also, practically by definition, incomprehensible to humans, so
it has to be generated by a tool and then never touched or even looked at.

Since a tool can generate the matching pattern and convert it to a regex, it
seems that a tool could in theory generate the same matching pattern and
convert it into something else that is either more comprehensible or more
efficient.  Or a tool could do a fuzzy match directly against the
unobfuscated word.  (I suspect this last option would be slower than
matching pre-obfuscated patterns, but it might not be.)
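To make the "tool generates the regex" idea concrete, here is a minimal
Python sketch.  It is not SA's actual rule generator; the lookalike table
(LOOKALIKES), the filler characters (SEPARATOR) and the helper name
obfuscation_pattern are all hypothetical, made up for illustration.  Each
letter of the plain word becomes a character class of its lookalikes, and
optional filler is allowed between letters:

```python
import re

# Hypothetical lookalike table for illustration only; a real rule set
# would need many more substitutions (accented letters, digits, etc.).
LOOKALIKES = {
    "a": "a@4",
    "c": "c(",
    "e": "e3",
    "i": "il1|!",
    "l": "l1|i",
    "o": "o0",
    "s": "s$5",
}

# Hypothetical filler a spammer might put between letters: whitespace,
# dots, hyphens, underscores and asterisks.
SEPARATOR = r"[\s.\-_*]*"


def obfuscation_pattern(word: str) -> "re.Pattern[str]":
    """Compile a regex matching `word` despite lookalike substitutions
    and filler characters between its letters."""
    classes = []
    for ch in word.lower():
        # Each letter becomes a character class of its lookalikes.
        classes.append("[" + re.escape(LOOKALIKES.get(ch, ch)) + "]")
    body = SEPARATOR.join(classes)
    # Refuse matches that start or end inside a longer word
    # (e.g. "social is" would otherwise contain "cial is").
    return re.compile(r"(?<![a-z])" + body + r"(?![a-z])", re.IGNORECASE)


if __name__ == "__main__":
    rx = obfuscation_pattern("cialis")
    print(rx.pattern)
    for text in ["Buy c i @ l 1 s now", "C.I.A.L.I.S", "social is fine"]:
        print(repr(text), bool(rx.search(text)))
```

Run as a script, this prints the generated pattern and then reports True,
True and False for the three samples.  Even for a six-letter word and a toy
table the pattern is already hard to read, which is exactly the complaint
above.  Note the guards at each end: without them the pattern would also
fire on the innocent "social is".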

The problem is that Perl has no syntax for describing this kind of
obfuscated match efficiently, other than an incomprehensible regex.

Someone could write such a tool, either as a plugin to SA or as a part (or
add-on subroutine) called by Perl itself.  In fact, I believe at least two
fuzzy-matching plugins have been added to SA in the last week.  Whether they
are as efficient as the current horrid regexes, or more so, is an
interesting question.
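For comparison, here is an equally rough sketch of the other approach:
fuzzy matching directly against the unobfuscated word.  It is not meant to
represent either of the new plugins; it is just one textbook technique,
approximate substring matching (edit distance where the match may start and
end anywhere in the text), run over text that has first been folded through
a hypothetical lookalike table.  The names normalize, fuzzy_distance,
contains_fuzzy and LOOKALIKE_TO_LETTER are made up for illustration:

```python
# Hypothetical table folding common lookalike characters back to letters.
# Ambiguous glyphs (e.g. "1" could be "i" or "l") get a single guess here.
LOOKALIKE_TO_LETTER = {
    "@": "a", "4": "a", "(": "c", "3": "e",
    "1": "i", "!": "i", "|": "l", "0": "o",
    "$": "s", "5": "s",
}


def normalize(text: str) -> str:
    """Lowercase, fold lookalikes to letters, and drop everything else."""
    out = []
    for ch in text.lower():
        ch = LOOKALIKE_TO_LETTER.get(ch, ch)
        if ch.isalpha():
            out.append(ch)
    return "".join(out)


def fuzzy_distance(word: str, text: str) -> int:
    """Fewest edits turning `word` into some substring of `text`
    (edit distance with free start and end positions in `text`)."""
    # prev[j]: best cost for the current prefix of `word` ending at text[:j].
    # Row 0 is all zeros because a match may start anywhere in `text`.
    prev = [0] * (len(text) + 1)
    for i in range(1, len(word) + 1):
        cur = [i] + [0] * len(text)
        for j in range(1, len(text) + 1):
            cost = 0 if word[i - 1] == text[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # match or substitute
                         prev[j] + 1,         # word letter missing from text
                         cur[j - 1] + 1)      # extra letter in text
        prev = cur
    # Taking the minimum over the last row lets the match end anywhere.
    return min(prev)


def contains_fuzzy(word: str, text: str, max_edits: int = 2) -> bool:
    """True if `text`, once normalized, holds `word` within `max_edits`."""
    return fuzzy_distance(word.lower(), normalize(text)) <= max_edits


if __name__ == "__main__":
    for sample in ["c l @ L i @ s", "C.1.A.L.1.S", "hello world"]:
        print(repr(sample), normalize(sample), contains_fuzzy("cialis", sample))
```

With this table, "c l @ L i @ s" folds to "clalias", which is within two
edits of "cialis".  The catch is that stripping separators also glues real
words together, so "social is" folds to "socialis" and matches exactly.
Whether a scheme like this beats the regexes on speed and on false
positives is the open question.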

        Loren
