Cameron,

The topic is now Regular Expressions and the sin tax. This is not exclusively a Python issue, as everybody and even their grandmother uses them in various forms.

I remember early versions of RE were fairly simple. It was a terse mini-language that allowed fairly complex things to be done yet stayed readable. You now encounter versions that make people struggle, as countless extensions have been sloppily grafted on. Who ordered the multiple meanings that "?" now has, as an example? (It can make the preceding item optional, make a quantifier non-greedy as in "*?", or introduce an extension group as in "(?:...)".) Many implementations have extended the terse notation in ways that make it both more and less legible.

Unicode made lots of older RE features less useful, as the definitions of things like what counts as whitespace, or what a word boundary or a word's contents might be, became so different that new constructs were added to hold them. But if you are operating mainly on ASCII text, the base functionality is still in there and can be used fairly easily.

Consider it a bit like other mini-languages such as the print() variants that kept adding functionality by packing lots of info tersely, so you specify that you want a floating point number with so many digits and so on, and by the way, right justified in a wider field, and if it is negative, do this. Great if you can still remember how to read it.

I was reading a Python book recently which kept using a suffix of !r and I finally looked it up. It asks the formatting machinery (str.format() or an f-string) to use repr(), and so the object's __repr__(), to get the representation of the object. Then I find out this is not really needed any more, as an f-string allows you to use something like {repr(val)}, so val!r is not the only, and confusing, way.

These mini-languages each require you to learn their own rules and quirks, and when you do, they can be powerful and intuitive, at least for the features you memorized and maybe use regularly.

Now RE knowledge is the same, and it ports moderately well between languages except when it doesn't. As has been noted, the people at Perl relied on it a lot and kept changing and extending it. Some tools let you specify whether you want Perl style or another style (grep's -E and -P options, for instance); as far as I know, Python's own re module has a single, largely Perl-like dialect.

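As an aside on the print-style mini-language and the !r suffix mentioned above, here is a minimal sketch. The variable names and sample values are made up for illustration, and it assumes Python 3.6 or later for f-strings.

```python
# Hypothetical sample values, chosen only for illustration.
value = -3.14159
val = "it's"

# Format-spec mini-language: '>' right-justifies, 10 is the field width,
# and '.2f' asks for fixed-point notation with two digits after the point.
right_justified = f"{value:>10.2f}"

# A leading '+' in the spec forces a sign on positive numbers as well.
signed = f"{2.5:+.1f}"

# Three spellings that all end up calling repr() on the value.
with_conversion = f"{val!r}"
with_call = f"{repr(val)}"
with_format = "{!r}".format(val)

print(repr(right_justified))
print(signed)
print(with_conversion, with_call, with_format)
```

The !r form and the explicit repr() call should produce the same text, so which one to use is mostly a matter of which you find easier to read.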
But hiding your head in the sand is not always going to work for long. No, you do not need to use RE for simple cases. Mind you, that is when it is easiest to use it reliably.

I read some books related to XML where much of the work had been done in non-UNIX land years ago, and they often had other ways of doing things in their endless series of methods for validating a schema, or declaring it so that data is forced to match the declared objectives: what type(s) each item can be, whether some fields must exist inside others or in a particular order, whether you can have only three of them, and seemingly endless other such things. And then, suddenly, someone has the idea to let you specify many things using regular expressions, and the oppressiveness (for me) lifts: many things can now be done trivially, including some that were not doable before. I had a similar experience in my SQL reading, where adding the ability to do some pattern matching using a form of RE made life simpler.

The fact is that complex pattern matching IS complex, and any tool that lets you express it so fluidly will itself be complex. So, as some have mentioned, find a resource that helps you build a regular expression, perhaps through menus, or one that verifies whether one you created makes any sense, or lets you enter test data and shows you how it is matching or what to change to make it match differently. The multi-line version of RE (the verbose form, re.VERBOSE in Python) may also be helpful, as may breaking up a bigger expression into several smaller ones that your program uses in multiple phases.

Python recently added new functionality called Structural Pattern Matching (in version 3.10). You use a match statement with various cases that match patterns and, if matched, execute some action.

Here is one tutorial if needed:

https://peps.python.org/pep-0636/

The point is that although not at all the same as a RE, we again have a bit of a mini-language that can be used fairly concisely to investigate a problem domain fairly quickly and efficiently and do things. It is an overlapping but different form of pattern matching. And, in languages that have long had similar ideas and constructs, people often cut back on using other constructs like an IF statement, and just used something like this!

And consider this example as being vaguely like a bit of regular expression:

    match command.split():
        case ["go", ("north" | "south" | "east" | "west")]:
            current_room = current_room.neighbor(...)

Like it or not, our future in programming is likely to include more and more such aids along with headaches.

Avi

-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail....@python.org> On Behalf Of Grant Edwards
Sent: Wednesday, March 1, 2023 12:04 PM
To: python-list@python.org
Subject: Re: How to escape strings for re.finditer?

On 2023-02-28, Cameron Simpson <c...@cskk.id.au> wrote:

> Regexps are:
> - cryptic and error prone (you can make them more readable, but the
>   notation is deliberately both terse and powerful, which means that
>   small changes can have large effects in behaviour); the "error prone"
>   part does not mean that a regexp is unreliable, but that writing one
>   which is _correct_ for your task can be difficult,

The nasty thing is that writing one that _appears_ to be correct for your task is often fairly easy. It will work as you expect for the test cases you throw at it, but then fail in confusing ways when released into the "real world".

If you're lucky, it fails frequently and obviously enough that you notice it right away. If you're not lucky, it will fail infrequently and subtly for many years to come.

My rule: never use an RE if you can use the normal string methods (even if it takes a few lines of code using them to replace a single RE).

--
Grant
--
https://mail.python.org/mailman/listinfo/python-list