On Fri, 09 Apr 2010 14:48:22 +1000, Lie Ryan wrote:

> On 04/09/10 12:32, Dotan Cohen wrote:
>>> Regexes do have their uses. It's a case of knowing when they are the
>>> best approach and when they aren't.
>>
>> Agreed. The problems begin when the "when they aren't" is not
>> recognised.
>
> But problems also arises when people are suggesting overly complex
> series of built-in functions for what is better handled by regex.

What defines "overly complex"?

For some reason, people seem to have the idea that pattern matching of
strings must be a single expression, no matter how complicated the
pattern they're trying to match. If we have a complicated task to do in
almost any other field, we don't hesitate to write a function to do it,
or even multiple functions: we break our code up into small,
understandable, testable pieces. We recognise that a five-line function
may very well be less complex than a one-line expression that does the
same thing. But if it's a string pattern matching task, we somehow become
resistant to the idea of writing a function and treat one-line
expressions as "simpler", no matter how convoluted they become.

It's as if we decided that every maths problem had to be solved by a
single expression, no matter how complex, and invented a painfully terse
language unrelated to normal maths syntax for doing so:

    # Calculate the roots of sin**2(3*x-y):
    result = me.compile("{^g.?+*y:h}|\Y^r&(?P:2+)|\w+(x&y)|[?#\s]").solve()

That's not to say that regexes aren't useful, or that they don't have
advantages. They are well-studied from a theoretical basis. You don't
have to re-invent the wheel: the re module provides useful pattern
matching functionality with quite good performance.

One disadvantage is that you have to learn an entire new language, a
language which is painfully terse and obfuscated, with virtually no
support for debugging.

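To make the "five-line function versus one-line expression" comparison concrete, here is a minimal sketch. The task (checking for a dotted-quad IPv4 address) and the names IPV4_RE, is_ipv4_regex and is_ipv4_plain are illustrative assumptions, not anything from the thread, and the code assumes Python 3.7 or later:

```python
import re

# Single-expression version: every octet must be 0-255 with no leading
# zeros, which takes four alternatives per octet.
IPV4_RE = re.compile(
    r"(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
    r"(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}"
)


def is_ipv4_regex(text):
    """Return True if the whole string matches the regex."""
    return IPV4_RE.fullmatch(text) is not None


def is_ipv4_plain(text):
    """Same check written as small, separately readable steps."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Only plain ASCII digits are allowed (this also rejects "").
        if not (part.isascii() and part.isdigit()):
            return False
        # Reject leading zeros such as "01".
        if len(part) > 1 and part[0] == "0":
            return False
        # Each octet must fit in one byte.
        if int(part) > 255:
            return False
    return True
```

Both functions accept and reject the same strings for the cases I have tried; which one counts as "simpler" is exactly the question being argued here.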
Larry Wall has criticised the Perl regex syntax on a number of grounds:

* things which look similar often are very different;
* things which are commonly needed are long and verbose, while things
  which are rarely needed are short;
* too much reliance on too few metacharacters;
* the default is to treat whitespace around tokens as significant,
  instead of defaulting to verbose-mode for readability;
* overuse of parentheses;
* difficulty working with non-ASCII data;
* insufficient abstraction;
* even though regexes are source code in a regular expression language,
  they're treated as mere strings, even in Perl;

and many others.

http://dev.perl.org/perl6/doc/design/apo/A05.html

As programming languages go, regular expressions -- even Perl's regular
expressions on steroids -- are particularly low-level. It's the assembly
language of pattern matching, compared to languages like Prolog, SNOBOL
and Icon. These languages use patterns equivalent in power to Backus-Naur
Form grammars, or context-free grammars, much more powerful and readable
than regular expressions.

But in any case, not all text processing problems are pattern-matching
problems, and even those that are don't necessarily require the 30lb
sledgehammer of regular expressions.

I find it interesting to note that there is such a thing as "regex
culture", as Larry Wall describes it. There seems to be a sort of
programmers' machismo about solving problems via regexes, even when
they're not the right tool for the job, and in the fewest number of
characters possible.

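As a counterpoint to the fewest-characters habit, and to the complaint about whitespace being significant by default, Python's re module has an opt-in verbose mode (the re.VERBOSE flag) that ignores whitespace in the pattern and allows comments. A small sketch, where the date pattern and the names TERSE and VERBOSE are purely illustrative:

```python
import re

# Default mode: whitespace in the pattern is significant, so the
# pattern cannot be spaced out or annotated.
TERSE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# Verbose mode: whitespace is ignored and "#" starts a comment, so the
# same pattern can be laid out and documented piece by piece.
VERBOSE = re.compile(
    r"""
    (?P<year>  \d{4} )  -   # four-digit year
    (?P<month> \d{2} )  -   # two-digit month
    (?P<day>   \d{2} )      # two-digit day
    """,
    re.VERBOSE,
)

match = VERBOSE.fullmatch("2010-04-09")
print(match.group("year"), match.group("month"), match.group("day"))
# prints: 2010 04 09
```

The two patterns match the same strings; the verbose one simply trades characters for readability, but you have to ask for it explicitly.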
I think regexes have a bad reputation because of regex culture, and not
just within Python circles either:

http://echochamber.me/viewtopic.php?f=11&t=57405

For the record, I'm not talking about "Because It's There" regexes like
this 6343-character monster:

http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

or these:

http://mail.pm.org/pipermail/athens-pm/2003-January/000033.html
http://blog.sigfpe.com/2007/02/modular-arithmetic-with-regular.html

The fact that these exist at all is amazing and wonderful. And yes, I
admire the Obfuscated C and Underhanded C contests too :)

-- 
Steven