On Thu, 24 Mar 2005, Randy W. Sims wrote:

> The only problem with that is that a dictionary is required for
> it to work because each "symbol" can have multiple translations.
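Here is a minimal Python sketch of the quoted point. It assumes a small hypothetical symbol table (`LEET`) and a toy word list (`WORDS`). The helper names `candidates` and `decode` are illustrative only and don't come from any existing tool. Each 'leet character expands to every letter it might stand for, and the dictionary keeps only the expansions that are real words:

```python
from itertools import product

# Hypothetical symbol table (an assumption, not a standard mapping):
# some 'leet characters stand for more than one letter, e.g. "1" for "i" or "l".
LEET = {
    "0": ["o"],
    "1": ["i", "l"],
    "3": ["e"],
    "4": ["a"],
    "5": ["s"],
    "7": ["t"],
    "@": ["a"],
}

def candidates(word):
    """Expand every possible per-character translation of a 'leet word."""
    # Characters not in the table translate to themselves.
    options = [LEET.get(ch, [ch]) for ch in word.lower()]
    return ["".join(combo) for combo in product(*options)]

def decode(word, dictionary):
    """Keep only the expansions that appear in the given dictionary."""
    return [c for c in candidates(word) if c in dictionary]

# A tiny stand-in dictionary; a real filter would need a full word list.
WORDS = {"fail", "fall", "leet", "mail", "mall"}

print(candidates("1337"))     # ['ieet', 'leet']
print(decode("1337", WORDS))  # ['leet']
print(decode("m411", WORDS))  # ['mail', 'mall']
```

Even with a dictionary, "m411" still decodes to two real words, so something else (context, or a human) has to choose between them. The number of expansions also multiplies with every ambiguous character, so a real filter would need a much larger word list and some way to rank the hits.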
Not only that -- a 'leet word could have multiple possible meanings. For example, "pwn" ("own") could just be a typo for "pawn". Any attempt to get back from a 'leet term to a real word is going to be extremely prone to both false positives and false negatives.

You could cheat by assuming a list of banned words and a list of suspect words, then trying to find probable correlations between the two sets, but that's logically wrong: you're starting from the conclusion that every word is probably banned, then digging through what you find until you get the answer you wanted. The false-positive rate with such an approach will be huge, but it's about the only approach that has any chance of working at all.

The problem of distinguishing 'leet from conventional English is very similar to the problem of distinguishing spam from "ham" (legitimate) email. There, you can use various approaches that make a decent guesstimate -- Bayesian statistical filters, assorted hard-wired heuristics, a cocktail of both, etc. -- but there's *always* going to be some level of both false negatives (spam or 'leet that gets through) and false positives (good messages that get blocked). This is unavoidable; all you can do is make reasonable attempts to minimize it.

Maybe the IRC bot should be hooked up to SpamAssassin :-)

--
Chris Devers

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>