John Machin wrote:
On Feb 8, 10:15 am, MRAB <goo...@mrabarnett.plus.com> wrote:
John Machin wrote:
On Feb 8, 1:37 am, MRAB <goo...@mrabarnett.plus.com> wrote:
LaundroMat wrote:
Hi,
I'm quite new to regular expressions, and I wonder if anyone here
could help me out.
I'm looking to split strings that ideally look like this: "Update: New
item (Household)" into groups.
This expression works ok: '^(Update:)?(.*)(\(.*\))$' - it returns
("Update", "New item", "(Household)")
Some strings will look like this however: "Update: New item (item)
(Household)". The expression above still does its job, as it returns
("Update", "New item (item)", "(Household)").
Not quite true; it actually returns
    ('Update:', ' New item (item) ', '(Household)')
However ignoring the difference in whitespace, the OP's intention is
clear. Yours returns
    ('Update:', ' New item ', '(item) (Household)')
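A quick check of the OP's pattern reproduces the tuple John quotes, whitespace included, and shows why it fails outright on text with no parentheses. This is a minimal sketch written for Python 3:

```python
import re

# The OP's original pattern: optional "Update:", a greedy middle group, and a
# mandatory trailing parenthesised group anchored at the end of the string.
op_pattern = re.compile(r'^(Update:)?(.*)(\(.*\))$')

for text in ["Update: New item (Household)",
             "Update: New item (item) (Household)"]:
    # The greedy (.*) backtracks only as far as the LAST "(", so "(item)"
    # stays in the second group.
    print(op_pattern.match(text).groups())
# ('Update:', ' New item ', '(Household)')
# ('Update:', ' New item (item) ', '(Household)')

# The third group is mandatory, so text without parentheses does not match.
print(op_pattern.match("Update: new item"))
# None
```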
The OP said it works OK, which I took to mean that the OP was OK with
the extra whitespace, which can be easily stripped off. Close enough!

As I said, the whitespace difference [between what the OP said his
regex did and what it actually does] is not the problem. The problem
is that the OP's "works OK" included (item) in the 2nd group, whereas
yours includes (item) in the 3rd group.

Ugh, right again!

That just shows what happens when I try to post while debugging! :-)

It does not work, however, when there is no text in parentheses (e.g.
"Update: new item"). How can I get the expression to return a tuple
such as ("Update:", "new item", None)?
You need to make the last group optional and also make the middle group
lazy: r'^(Update:)?(.*?)(?:(\(.*\)))?$'.
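Running the suggested pattern (a quick check, assuming Python 3) confirms that the third group is None when there is no parenthesised text, and also shows the behaviour John points out: the lazy middle group stops at the first "(", so "(item)" ends up in the third group.

```python
import re

# MRAB's suggestion: lazy middle group, optional last group.
# The (?:...) wrapper around the capturing group is redundant but harmless.
mrab = re.compile(r'^(Update:)?(.*?)(?:(\(.*\)))?$')

print(mrab.match("Update: new item").groups())
# ('Update:', ' new item', None)
print(mrab.match("Update: New item (item) (Household)").groups())
# ('Update:', ' New item ', '(item) (Household)')
```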
Why do you perpetuate the redundant ^ anchor?
The OP didn't say whether search() or match() was being used. With the ^
it doesn't matter.
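A minimal sketch (Python 3) of the semantics MRAB relies on: without re.MULTILINE, "^" matches only at the start of the string, so an anchored pattern gives the same result from search() and match(). The anchor changes what search() finds, not what match() finds.

```python
import re

anchored = re.compile(r'^frobozz')
unanchored = re.compile(r'frobozz')

text = 'xxx frobozz'

# '^' (no MULTILINE) only matches at position 0, so both calls fail here.
print(anchored.match(text))    # None
print(anchored.search(text))   # None

# Without the anchor, match() still only tries position 0,
# but search() scans forward and finds the later occurrence.
print(unanchored.match(text))           # None
print(unanchored.search(text).group())  # frobozz
```

Whether the two calls also cost the same is a separate question, which is what the timings below address.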

It *does* matter. re.search() is suboptimal; after failing at the
first position, it's not smart enough to give up if the pattern has a
front anchor.

[win32, 2.6.1]
C:\junk>\python26\python -mtimeit -s"import re;rx=re.compile('^frobozz');txt=100*'x'" "assert not rx.match(txt)"
1000000 loops, best of 3: 1.17 usec per loop

C:\junk>\python26\python -mtimeit -s"import re;rx=re.compile('^frobozz');txt=1000*'x'" "assert not rx.match(txt)"
1000000 loops, best of 3: 1.17 usec per loop

C:\junk>\python26\python -mtimeit -s"import re;rx=re.compile('^frobozz');txt=100*'x'" "assert not rx.search(txt)"
100000 loops, best of 3: 4.37 usec per loop

C:\junk>\python26\python -mtimeit -s"import re;rx=re.compile('^frobozz');txt=1000*'x'" "assert not rx.search(txt)"
10000 loops, best of 3: 34.1 usec per loop

Corresponding figures for 3.0 are 1.02, 1.02, 3.99 and 32.9 usec per loop.
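The same four measurements can be taken from inside a script with the `timeit` module. This is a sketch, not the thread's method: `best_usec` is a hypothetical helper, it needs Python 3.6+ (for `Timer(globals=...)` and f-strings), and absolute numbers depend on the machine and interpreter version, so results on a current CPython may not match the 2.6/3.0 figures above. Only the trend (match() flat, search() growing with string length) is of interest.

```python
import re
import timeit

rx = re.compile('^frobozz')

def best_usec(stmt, length, number=10000, repeat=3):
    """Hypothetical helper: best-of-`repeat` time per execution of `stmt`,
    in microseconds, with `txt` bound to a string of `length` 'x' chars."""
    txt = length * 'x'
    timer = timeit.Timer(stmt, globals={'rx': rx, 'txt': txt})
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6

for method in ('match', 'search'):
    for length in (100, 1000):
        usec = best_usec(f'assert not rx.{method}(txt)', length)
        print(f'{method:6} len={length:5}: {usec:.2f} usec per loop')
```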

On my PC the numbers for Python 2.6 are:

C:\Python26>python -mtimeit -s"import re;rx=re.compile('^frobozz');txt=100*'x'" "assert not rx.match(txt)"
1000000 loops, best of 3: 1.02 usec per loop

C:\Python26>python -mtimeit -s"import re;rx=re.compile('^frobozz');txt=1000*'x'" "assert not rx.match(txt)"
1000000 loops, best of 3: 1.04 usec per loop

C:\Python26>python -mtimeit -s"import re;rx=re.compile('^frobozz');txt=100*'x'" "assert not rx.search(txt)"
100000 loops, best of 3: 3.69 usec per loop

C:\Python26>python -mtimeit -s"import re;rx=re.compile('^frobozz');txt=1000*'x'" "assert not rx.search(txt)"
10000 loops, best of 3: 28.6 usec per loop

I'm currently working on the re module and I've fixed that problem:

C:\Python27>python -mtimeit -s"import re;rx=re.compile('^frobozz');txt=100*'x'" "assert not rx.match(txt)"
1000000 loops, best of 3: 1.28 usec per loop

C:\Python27>python -mtimeit -s"import re;rx=re.compile('^frobozz');txt=1000*'x'" "assert not rx.match(txt)"
1000000 loops, best of 3: 1.23 usec per loop

C:\Python27>python -mtimeit -s"import re;rx=re.compile('^frobozz');txt=100*'x'" "assert not rx.search(txt)"
1000000 loops, best of 3: 1.21 usec per loop

C:\Python27>python -mtimeit -s"import re;rx=re.compile('^frobozz');txt=1000*'x'" "assert not rx.search(txt)"
1000000 loops, best of 3: 1.21 usec per loop

Hmm. Needs more tweaking...
--
http://mail.python.org/mailman/listinfo/python-list
