On Wed, 23 Mar 2005, Andrew Gaffney wrote:
> I'm trying to come up with a regex for my IRC bot that detects 1337
> (in order to kick them from the channel).
For those unfamiliar with 'leet, see here:
<http://www.microsoft.com/athome/security/children/kidtalk.mspx>
<http://en.wikipedia.org/wiki/Leetspeak>
<http://www.straightdope.com/columns/030110.html>
> I can't seem to come up with one that will have few false positives
> but also work most of the time. Has anyone done something like this
> before? Does anyone have any suggestions?
I strongly suspect that there is no general solution for this.
The problem is that the set you're trying to match against is completely
unbounded, and the whole point of 'leet is to be unconventional with
rules for spelling, grammar, diction, courtesy, etc.
You could go halfway with code to catch the most common terms -- 1337,
w00t, pr0n, warez, 0\/\/n3d, etc -- but note how dissimilar those are.
* One is all numbers, while another is all letters, so they both look
like normal text.
* You could consider a rule to catch ones with mixed numbers & letters,
but that would catch legit terms like "perl6", "md5", or "mp3".
* One mixes in punctuation, so now you have to deal with anywhere that
  alphanumeric characters are adjacent to symbols. Like, for example,
  everywhere you have a comma, a hyphenated-word, or: a period. Nuts!
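To make those bullets concrete, here's a rough sketch of the "mixed
letters and digits" rule. It's in Python purely for illustration (the
regex should carry over to Perl more or less as-is), and the pattern and
the `mixed_tokens` helper are hypothetical names of my own, not part of
any existing bot:

```python
import re

# Hypothetical broad rule: flag any token containing at least one
# letter AND at least one digit (two lookaheads, then the token itself).
MIXED = re.compile(r"\b(?=[A-Za-z0-9]*[A-Za-z])(?=[A-Za-z0-9]*[0-9])[A-Za-z0-9]+\b")

def mixed_tokens(text):
    """Return every letter+digit token the broad rule would flag."""
    return MIXED.findall(text)

print(mixed_tokens("1337 warez, w00t"))
print(mixed_tokens("is perl6 out? check the md5 of that mp3"))
```

The first call flags only w00t (1337 and warez slip past), while the
second flags all three perfectly legit terms.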
Ultimately, you can't win. If the users can guess what the matching
patterns might be -- and remember, this is IRC, so assume that they'll
talk to each other as they figure things out -- then they can *always*
come up with text that will get around your filters.
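As a quick illustration of how cheap that is, consider a hypothetical
hard-coded rule for just two terms (again a Python sketch; the
`is_flagged` name is made up for the example):

```python
import re

# Hypothetical hard-coded rule: match two well-known terms as whole words.
KNOWN = re.compile(r"\b(?:1337|w00t)\b", re.IGNORECASE)

def is_flagged(msg):
    """True if the message contains one of the hard-coded terms."""
    return KNOWN.search(msg) is not None

for msg in ["1337", "w00t!", "l33t", "1 3 3 7", "w0000t"]:
    print(repr(msg), is_flagged(msg))
```

The first two get caught; the last three, each a trivial respelling,
sail right through.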
The most reasonable approach is probably to set up some hard-coded rules
for the most common terms -- see the URLs above for examples -- plus some
very broad rules to warn (but *not* kick) possible offenders, and on top
of that, have actual human moderators catch whatever slips through.
Anything more aggressive than that and you're going to be buried in a
pile of false positives & false negatives... :-/
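A minimal sketch of that split, again in Python for illustration and
using hypothetical names (`classify`, `KICK`, `WARN`): the hard-coded
terms trigger a kick, the broad mixed-letters-and-digits rule only warns,
and anything it warns about is left for a human to judge.

```python
import re

# Hypothetical hard-coded list of common terms (taken from the examples
# above); a match on any of these is treated as grounds for a kick.
KICK_TERMS = ["1337", "w00t", "pr0n", "warez", r"0\/\/n3d"]
KICK = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in KICK_TERMS) + r")\b",
    re.IGNORECASE,
)

# Hypothetical broad rule: any token mixing letters and digits. This also
# hits legit words like perl6 or mp3, so it only warns.
WARN = re.compile(r"\b(?=[A-Za-z0-9]*[A-Za-z])(?=[A-Za-z0-9]*[0-9])[A-Za-z0-9]+\b")

def classify(msg):
    """Return "kick", "warn" (leave it to a human moderator), or "ok"."""
    if KICK.search(msg):
        return "kick"
    if WARN.search(msg):
        return "warn"
    return "ok"

print(classify("w00t!"))          # kick
print(classify("is perl6 out?"))  # warn
print(classify("hello, all"))     # ok
```

The kick list stays deliberately short so it rarely fires on legit text;
everything fuzzier only produces a warning for a person to look at.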
--
Chris Devers