On Mon, Feb 14, 2022 at 05:13:38PM -0600, Tim Peters wrote:
> An interesting lesson nobody wants to learn: the original major
> string-processing language, SNOBOL, had powerful pattern matching but
> no regexps. Griswold's more modern successor language, Icon, found no
> reason to change that.
I've been interested in SNOBOL string scanning for a long time, but I
know very little about it.
How does it differ from regexes, and why have programming languages
pretty much standardised on regexes rather than other forms of string
matching?
> Naive regexps are both clumsy and prone to bad
> timing in many tasks that "should be" very easy to express. For
> example, "now match up to the next occurrence of 'X'". In SNOBOL and
> Icon, that's trivial. 75% of regexp users will write ".*X", with scant
> understanding that it may match waaaay more than they intended.
Indeed, I've been bitten by that many times :-)
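
A minimal sketch of that failure mode, using a made-up sample string (it is not from the original message): the greedy ".*" runs to the end of the text and backtracks only as far as the last X, not the first.

```python
import re

# Hypothetical sample text containing two occurrences of 'X'.
text = "aaXbbXcc"

# Greedy '.*' consumes as much as it can, then backtracks just far enough
# for 'X' to match, so the match ends at the LAST 'X' in the text.
greedy = re.match(r".*X", text)
print(greedy.group())   # aaXbbX
```
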
> Another 20% will write ".*?X", with scant understanding that it may
> extend beyond _just_ "the next" X in some cases.
But this surprises me. Do you have an example?
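
One case that seems to fit, offered as a sketch and not necessarily the example Tim had in mind: when more pattern follows the X, the lazy ".*?" is free to grow past the first X until the rest of the pattern succeeds. The sample string and patterns below are made up for illustration.

```python
import re

# Hypothetical sample text containing two occurrences of 'X'.
text = "aXbXc"

# On its own, '.*?X' stops at the first 'X'.
first = re.match(r".*?X", text)
print(first.group())      # aX

# With 'c' required after the 'X', the engine backtracks: '.*?' expands
# past the first 'X' until 'Xc' can match at the second one.
extended = re.match(r".*?Xc", text)
print(extended.group())   # aXbXc
```
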
> That leaves the happy
> 5% who write "[^X]*X", which finally says what they intended from the
> start.
Doesn't that only work if X is literally a single character? With a
multi-character X, the class [^spam] excludes each of the letters s, p,
a and m rather than the word "spam", so the match starts too late:

>>> import re
>>> string = "This is some spam and extra spam."
>>> re.search('[^spam]*spam', string)
<re.Match object; span=(11, 17), match='e spam'>

Whereas this seems to do what I expected:

>>> re.search('.*?spam', string)
<re.Match object; span=(0, 17), match='This is some spam'>
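
For a multi-character target, here are two ways to say "up to the next occurrence" directly. This is a sketch that assumes the target is a fixed literal: plain str.find, or a negative lookahead that forbids the target from starting anywhere inside the skipped text. The helper name up_to_next is made up for illustration.

```python
import re

string = "This is some spam and extra spam."

def up_to_next(text, target, start=0):
    """Return text[start:] up to and including the next `target`, or None."""
    i = text.find(target, start)
    if i == -1:
        return None
    return text[start:i + len(target)]

print(up_to_next(string, "spam"))   # This is some spam

# Regex version: consume one character at a time, but only while
# 'spam' does not begin at the current position.
m = re.match(r"(?:(?!spam).)*spam", string)
print(m.group())                    # This is some spam
```

Unlike ".*?spam", the lookahead form should not be able to grow past the first occurrence when more pattern follows, because the skipped text can never contain the start of "spam".
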
--
Steve
_______________________________________________
Python-ideas mailing list -- [email protected]
Message archived at
https://mail.python.org/archives/list/[email protected]/message/XDTMX2JUSGOBT4KNRSAGJT3BBPDY645Q/