[Tim]
>>> That leaves the happy 5% who write "[^X]*X", which
>>> finally says what they intended from the start.
[Steven]
>> Doesn't that only work if X is literally a single character?
Right. It was an example, not a meta-example. Even for a _single
character_, "match up to the next, but never more or less than that"
is a puzzle for most regexp users.
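
A quick way to see the difference for the single-character case, sketched with Python's re module. The sample string and variable names are made up for illustration; only the patterns matter.

```python
import re

text = "aXbXc"  # hypothetical sample with two X's

# Greedy: runs to the last X.
greedy = re.match(r".*X", text).group()

# Non-greedy: stops at the first X when nothing after it objects...
lazy = re.match(r".*?X", text).group()

# ...but backtracking lets it stretch to a later X when the rest of
# the pattern fails after the first one.
lazy_pushed = re.match(r".*?Xc", text)

# Negated class: it cannot step over an X, so the only X it can reach
# is the first one, and here there is no match at all.
negated = re.match(r"[^X]*Xc", text)
```

Only the last spelling says "up to the next X, never more or less"; the non-greedy one merely prefers the nearest X.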
[Chris]
> Yes, but if X is actually "spam", then you can probably do other
> assertions to guarantee the right match. It gets pretty clunky though.
Assertions aren't needed, but it is nightmarish to get right.
    (|[^s]|s(|[^p]|p(|[^a]|a(|[^m]))))*spam
The "spam" at the end is the only obvious part ;-)
Before then, we match 0 or more instances of
    nothing
    or not 's'
    or 's' followed by
        nothing
        or not 'p'
        or 'p' followed by
            nothing
            or not 'a'
            or 'a' followed by
                nothing
                or not 'm'
"spam" itself can't get through that maze, so backtracking into it
after its first match can't consume the matched "spam" to find a later
one.
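
For comparison, the assertion-based spelling Chris alludes to might look like the sketch below. It is not from the thread: the pattern, the sample string, and the names are assumptions for illustration. One way to test the "can't consume the matched spam" property is to ask for a full match on a string holding two "spam"s, which must fail.

```python
import re

# Each character is consumed only if "spam" does not start at it
# (negative lookahead), so the loop can never run past the first "spam".
up_to_first_spam = re.compile(r"(?:(?!spam).)*spam", re.DOTALL)

text = "xsspamyyspam"  # hypothetical sample with two "spam"s

# The match ends at the leftmost "spam", even with a stray 's' before it.
first = up_to_first_spam.match(text).group()

# Forcing the match to reach the end of the string gives backtracking
# every chance to find the later "spam"; it cannot, so this is None.
stretched = up_to_first_spam.fullmatch(text)
```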
In SNOBOL, as I recall, it could be spelled
    ARB "spam" FENCE
Those are all pattern objects, and infix whitespace is a binary
pattern catenation operator.
ARB is a builtin pattern that matches the empty string at first, and
extends what it matches by one character each time it's backtracked
into.
"spam" matches the obvious string.
Then FENCE is a builtin pattern that matches an empty string, but acts
as a backtracking barrier: if the overall match attempt fails,
backtracking will not move "to the left" of FENCE. So, here, ARB will
not get a chance to consume more characters after the leftmost "spam"
is found.
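
A rough Python analogue of that ARB/FENCE behavior, as a sketch only. It assumes that a lookahead is never backtracked into once it has succeeded, which is how Python's re behaves, so capturing inside the lookahead and replaying the capture acts as a fence. Python 3.11 and later also accept the atomic group (?>...) directly. The sample string and names are made up.

```python
import re
import sys

text = "aspambspam"  # hypothetical sample with two "spam"s

# No fence: .*? starts empty and grows one character per backtrack,
# much like ARB, so a failure later on lets it pass the first "spam".
unfenced = re.compile(r".*?spam", re.DOTALL)

# Emulated fence: the lookahead finds the leftmost "spam" and captures
# it; \1 then consumes exactly that text, with nothing left to retry.
fenced = re.compile(r"(?=(.*?spam))\1", re.DOTALL)

first = fenced.match(text).group()      # stops at the leftmost "spam"
stretched = unfenced.fullmatch(text)    # backtracking reaches the later one
blocked = fenced.fullmatch(text)        # None: nothing left of the fence moves

# Direct spelling on Python 3.11+: an atomic group.
if sys.version_info >= (3, 11):
    atomic_blocked = re.compile(r"(?>.*?spam)", re.DOTALL).fullmatch(text)
else:
    atomic_blocked = None  # not supported before 3.11, nothing to check
```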
_______________________________________________
Python-ideas mailing list -- [email protected]
Message archived at
https://mail.python.org/archives/list/[email protected]/message/N6OIYFVNNOJUCUKOM2WJKVCKGMLH5IIQ/