On 06/03/2011 08:25 AM, Steven D'Aprano wrote:
> On Fri, 03 Jun 2011 05:51:18 -0700, ru...@yahoo.com wrote:
>
>> On 06/02/2011 07:21 AM, Neil Cerutti wrote:
>
>>> Python's str methods, when they're sufficient, are usually more
>>> efficient.
>>
>> Unfortunately, except for the very simplest cases, they are often not
>> sufficient.
>
> Maybe so, but the very simplest cases occur very frequently.

Right, and I stated that.

>> I often find myself changing, for example, a startswith() to
>> a RE when I realize that the input can contain mixed case
>
> Why wouldn't you just normalise the case?

Because some of the text may be case-sensitive.

>[...]
>> or that I have
>> to treat commas as well as spaces as delimiters.
>
> source.replace(",", " ").split(" ")

Uhgg. Create a whole new string just so you can split it on one
rather than two characters? Sorry, but I find

    re.split('[ ,]', source)

states much more clearly exactly what is being done, with no
obfuscation. Obviously this is a simple enough case that the
difference is minor, but when the pattern gets only a little more
complex, the clarity difference becomes greater.

>[...]
> re.split is about four times slower than the simple solution.

If this processing is a bottleneck, by all means use a more complex
hard-coded replacement for a regex. In most cases that won't be
necessary.

>> After doing this a
>> number of times, one starts to use an RE right from the get go unless
>> one is VERY sure that there will be no requirements creep.
>
> YAGNI.

IAHNI. (I actually have needed it.)

> There's no need to use a regex just because you think that you *might*,
> someday, possibly need a regex. That's just silly. If and when
> requirements change, then use a regex. Until then, write the simplest
> code that will solve the problem you have to solve now, not the problem
> you think you might have to solve later.

I would not recommend you use a regex instead of a string method
solely because you might need a regex later. But when you have to
spend 10 minutes writing a half-dozen lines of Python versus 1 minute
writing a regex, your evaluation of the possibility of requirements
changing should factor into your decision.

> [...]
>> In short, although your observations are true to some extent, they
>> are not sufficient to justify the anti-RE attitude often seen here.
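
For reference, here is a minimal sketch of the two approaches argued
over above. The sample strings and variable names are made up for
illustration; only the replace/split and re.split expressions come
from the thread. The "+" variant and the re.IGNORECASE match are my
own assumptions about what "a little more complex" might look like.

```python
import re

# Hypothetical input: commas and spaces both act as delimiters.
source = "alpha, beta gamma,delta"

# String-method version from the thread: build a new string with commas
# turned into spaces, then split on a single space.
by_replace = source.replace(",", " ").split(" ")

# Regex version from the thread: split on either a space or a comma.
by_regex = re.split("[ ,]", source)

# Both yield the same list, including the empty string produced by ", ".
print(by_replace)  # ['alpha', '', 'beta', 'gamma', 'delta']
print(by_regex)    # ['alpha', '', 'beta', 'gamma', 'delta']

# Assumed variation: treat a run of delimiters as one separator.
by_regex_runs = re.split("[ ,]+", source)
print(by_regex_runs)  # ['alpha', 'beta', 'gamma', 'delta']

# Mixed-case prefix check where the rest of the text stays untouched.
line = "Subject: Case-Sensitive Payload"
plain = line.startswith("subject:")                # False: case differs
ci = re.match("subject:", line, re.IGNORECASE)     # matches, line unchanged
print(plain, ci is not None)  # False True
```

Note that the one-character change from "[ ,]" to "[ ,]+" has no
equally small counterpart in the replace/split version, which is the
kind of requirements creep being discussed.
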
>
> I don't think that there's really an *anti* RE attitude here. It's more a
> skeptical, cautious attitude to them, as a reaction to the Perl "when all
> you have is a hammer, everything looks like a nail" love affair with
> regexes.

Yes, as I said, the regex attitude here seems in large part to be a
reaction to their frequent use in Perl. It seems anti- to me in that
I often see cautions about their use but seldom see anyone pointing
out that they are often a better solution than a mass of twisty
little string methods and associated plumbing.

> There are a few problems with regexes:
>
> - they are another language to learn, a very cryptic and terse language;

Chinese is cryptic too, but there are a few billion people who don't
seem to be bothered by that.

> - hence code using many regexes tends to be obfuscated and brittle;

No. With regexes the code is likely to be less brittle than a dozen
or more lines of mixed string functions, indexes, and conditionals.

> - they're over-kill for many simple tasks;
> - and underpowered for complex jobs, and even some simple ones;

Right, like all tools (including Python itself) they are suited best
for a specific range of problems. That range is quite wide.

> - debugging regexes is a nightmare;

Very complex ones, perhaps. "Nightmare" seems an overstatement.

> - they're relatively slow;

So is Python. In both cases, if it is a bottleneck then choosing
another tool is appropriate.

> - and thanks in part to Perl's over-reliance on them, there's a tendency
> among many coders (especially those coming from Perl) to abuse and/or
> misuse regexes; people react to that misuse by treating any use of
> regexes with suspicion.

So you claim. I have seen more postings in here where REs were not
used when they would have simplified the code than I have seen
regexes used when a string method or two would have done the same
thing.

> But they have their role to play as a tool in the programmer's toolbox.

We agree.

> Regarding their syntax, I'd like to point out that even Larry Wall is
> dissatisfied with regex culture in the Perl community:
>
> http://www.perl.com/pub/2002/06/04/apo5.html

You did see the very first sentence in this, right?

  "Editor's Note: this Apocalypse is out of date and remains here for
  historic reasons. See Synopsis 05 for the latest information."

(Note that "Apocalypse" is referring to a series of Perl design
documents and has nothing to do with regexes in particular.)

Synopsis 05 is (AFAICT with a quick scan) a proposal for revising
regex syntax. I didn't see anything about de-emphasizing them in
Perl. (But I have no idea what is going on for Perl 6, so I could be
wrong about that.)

As for the original reference, Wall points out a number of problems
with regexes, mostly details of their syntax. For example, the more
frequently used non-capturing groups require more characters than
the less frequently used capturing groups. Most of these criticisms
seem irrelevant to the question of whether hard-wired string
manipulation code or regexes should be preferred in a Python program.

And for the few criticisms that are relevant, nobody ever said
regexes were perfect. The problems are well known, especially on
this list where we've all been told about them a million times. The
fact that REs are not perfect does not make them not useful. We also
know about Python's problems (slow, the GIL, excessively terse and
poorly organized documentation, etc.) but that hardly makes Python
not useful.

Finally, he is talking about *revising* regex syntax (in part by
replacing some magic character sequences with other "better" ones)
beyond the core CS textbook forms. He was *not*, AFAICT, advocating
using hard-wired string manipulation code in place of regexes. So it
is hardly a condemnation of the concept of regexes; rather, just the
opposite.

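To make the syntax complaint mentioned above concrete, here is a
small sketch using Python's re module. The date string and both
patterns are invented for illustration and do not come from Wall's
document; the only point is that "(?:...)" costs two more characters
than "(...)".

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "2011-06-03"

# Capturing groups: plain parentheses, each group is saved.
capturing = re.match(r"(\d+)-(\d+)-(\d+)", text)
print(capturing.groups())      # ('2011', '06', '03')

# Non-capturing group: "(?:...)" groups for repetition without saving,
# so only the final parenthesized part is captured.
non_capturing = re.match(r"(?:\d+-){2}(\d+)", text)
print(non_capturing.groups())  # ('03',)
```

This is a detail of notation, and it applies equally to Perl and
Python regexes.
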
Perhaps you stopped reading after seeing his "regular expression
culture is a mess" comment without trying to see what he meant by
"culture" or "mess"?