On Fri, 03 Jun 2011 05:51:18 -0700, ru...@yahoo.com wrote:

> On 06/02/2011 07:21 AM, Neil Cerutti wrote:
>> > Python's str methods, when they're sufficient, are usually more
>> > efficient.
>
> Unfortunately, except for the very simplest cases, they are often not
> sufficient.

Maybe so, but the very simplest cases occur very frequently.

> I often find myself changing, for example, a startswith() to
> a RE when I realize that the input can contain mixed case

Why wouldn't you just normalise the case?

    source.lower().startswith(prefix.lower())

Particularly if the two strings are short, this is likely to be much
faster than a regex.

Admittedly, normalising the case in this fashion is not strictly correct.
It works well enough for ASCII text, and probably Latin-1, but for general
Unicode, not so much. But neither will a regex solution. If you need to
support true case normalisation for arbitrary character sets, Python isn't
going to be much help for you. But for the rest of us, a simple
str.lower() or str.upper() might be technically broken but it will do the
job.

> or that I have
> to treat commas as well as spaces as delimiters.

    source.replace(",", " ").split(" ")

[steve@sylar ~]$ python -m timeit -s "source = 'a b c,d,e,f,g h i j k'" "source.replace(',', ' ').split(' ')"
100000 loops, best of 3: 2.69 usec per loop

[steve@sylar ~]$ python -m timeit -s "source = 'a b c,d,e,f,g h i j k'" -s "import re" "re.split(',| ', source)"
100000 loops, best of 3: 11.8 usec per loop

re.split is about four times slower than the simple solution.

> After doing this a
> number of times, one starts to use an RE right from the get go unless
> one is VERY sure that there will be no requirements creep.

YAGNI.

There's no need to use a regex just because you think that you *might*,
someday, possibly need a regex. That's just silly. If and when
requirements change, then use a regex. Until then, write the simplest code
that will solve the problem you have to solve now, not the problem you
think you might have to solve later.
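For anyone who wants to compare the two approaches side by side, here is a
minimal sketch that wraps the str-method one-liners above in functions next
to rough regex equivalents. The function names are made up for
illustration, and the regex versions are just one plausible way to write
them, not necessarily what the earlier poster had in mind.

```python
import re


def starts_with_ignoring_case(source, prefix):
    """Case-insensitive prefix test using only str methods."""
    return source.lower().startswith(prefix.lower())


def split_on_commas_and_spaces(source):
    """Treat each comma or single space as a delimiter, str methods only."""
    return source.replace(",", " ").split(" ")


def starts_with_ignoring_case_re(source, prefix):
    """Regex counterpart; re.escape keeps the prefix literal."""
    return re.match(re.escape(prefix), source, re.IGNORECASE) is not None


def split_on_commas_and_spaces_re(source):
    """Regex counterpart of the replace/split version."""
    return re.split(",| ", source)


if __name__ == "__main__":
    source = "a b c,d,e,f,g h i j k"
    # Both splitters should agree on this sample input.
    print(split_on_commas_and_spaces(source))
    print(split_on_commas_and_spaces(source) == split_on_commas_and_spaces_re(source))
    print(starts_with_ignoring_case("Subject: hello", "SUBJECT"))
```

Both splitters produce empty strings when two delimiters are adjacent (for
example ", "), so neither version collapses runs of delimiters.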
> And to regurgitate the mantra frequently used to defend Python when it
> is criticized for being slow, the real question should be, are REs fast
> enough? The answer almost always is yes.

Well, perhaps so.

[...]
> In short, although your observations are true to some extent, they
> are not sufficient to justify the anti-RE attitude often seen here.

I don't think that there's really an *anti* RE attitude here. It's more a
skeptical, cautious attitude to them, as a reaction to the Perl "when all
you have is a hammer, everything looks like a nail" love affair with
regexes.

There are a few problems with regexes:

- they are another language to learn, a very cryptic and terse language;
- hence code using many regexes tends to be obfuscated and brittle;
- they're over-kill for many simple tasks;
- and underpowered for complex jobs, and even some simple ones;
- debugging regexes is a nightmare;
- they're relatively slow;
- and thanks in part to Perl's over-reliance on them, there's a tendency
  among many coders (especially those coming from Perl) to abuse and/or
  misuse regexes; people react to that misuse by treating any use of
  regexes with suspicion.

But they have their role to play as a tool in the programmer's toolbox.

Regarding their syntax, I'd like to point out that even Larry Wall is
dissatisfied with regex culture in the Perl community:

http://www.perl.com/pub/2002/06/04/apo5.html

-- 
Steven