Re: Correct syntax for pathological re.search()

2024-10-12 Thread Peter J. Holzer via Python-list
On 2024-10-11 17:13:07 -0400, AVI GROSS via Python-list wrote:
> Is there some utility function out there that can be called to show what the
> regular expression you typed in will look like by the time it is ready to be
> used?

I assume that by "ready to be used" you mean the compiled form?

No, there doesn't seem to be a way to dump that. You can

p = re.compile("\\\\sout{")
print(p.pattern)

but that just prints the input string, which you could do without
compiling it first.

But - without having looked at the implementation - it's far from clear
that the compiled form would be useful to the user. It's probably some
kind of state machine, and a large table of state transitions isn't very
readable.
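
A hedged aside on dumping the compiled form: the re module does accept an
re.DEBUG flag, which prints a dump of the parsed pattern at compile time.
The exact output format differs between Python versions, so the sketch
below only captures it as text. The helper name dump_regex is made up for
illustration and is not part of the re module.

```python
import contextlib
import io
import re


def dump_regex(pattern: str) -> str:
    """Return whatever re.DEBUG prints while compiling `pattern`.

    dump_regex is a hypothetical helper, not part of the re module.
    """
    buffer = io.StringIO()
    # re.DEBUG writes its dump to stdout, so redirect stdout into a buffer.
    with contextlib.redirect_stdout(buffer):
        re.compile(pattern, re.DEBUG)
    return buffer.getvalue()


if __name__ == "__main__":
    # r"\\sout{" is the regex for a literal backslash followed by "sout{".
    print(dump_regex(r"\\sout{"))
```

The dump lists one parsed element per line, which is usually enough to see
whether a backslash ended up as a literal character or as part of an escape
such as \s.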

There are a number of websites which visualize regular expressions.
Those are probably better for debugging a regular expression than
anything the re module could reasonably produce (although with the
caveat that such a web site would use a different implementation and
therefore might produce different results).

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Correct syntax for pathological re.search()

2024-10-12 Thread AVI GROSS via Python-list
Peter,

Matthew understood what I was hinting at in one way and you in another.

The question asked how to add some power of two backslashes or make other
changes, so the RE functionality sees what you want. The goal is to see what
happens when one or more intermediate evaluations may change the string.

So, a simple print may suffice as a parallel way to force the same
evaluations. 
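
A minimal sketch of that idea, with variable names chosen only for
illustration: printing a string literal applies Python's own escape
processing, so the output is exactly the text the re module receives.

```python
# Three spellings of a pattern in Python source code.
plain_two = "\\sout{"      # non-raw: the doubled backslash becomes one
plain_four = "\\\\sout{"   # non-raw: four backslashes become two
raw_two = r"\\sout{"       # raw string: both backslashes are kept

for name, text in [("plain_two", plain_two),
                   ("plain_four", plain_four),
                   ("raw_two", raw_two)]:
    # print() shows the characters the regex engine will actually see.
    print(f"{name:10} -> {text}  ({text.count(chr(92))} backslashes)")
```

The second and third spellings produce the same string, which is why they
behave identically when passed to re.search().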

Thomas made his point. And, I am starting to feel like I need to change my
name to something like Luke since this discussion must be gospel.

FYI, I was not planning on posting at all. Time to detach.




Re: Correct syntax for pathological re.search()

2024-10-12 Thread Thomas Passin via Python-list

On 10/11/2024 8:37 PM, MRAB via Python-list wrote:
> On 2024-10-11 22:13, AVI GROSS via Python-list wrote:
>> Is there some utility function out there that can be called to show what the
>> regular expression you typed in will look like by the time it is ready to be
>> used?
>>
>> Obviously, life is not that simple as it can go through multiple layers with
>> each dealing with a layer of backslashes.
>>
>> But for simple cases, ...
>
> Yes. It's called 'print'. :-)


There is a section in the Python docs about this backslash subject. It's
titled "The Backslash Plague", in

https://docs.python.org/3/howto/regex.html

You can also inspect the compiled expression to see what string it
received after all the escaping:

import re

re_string = '\\w+\\\\sub'
re_pattern = re.compile(re_string)

# Should look as if we had used r'\w+\\sub'
print(re_pattern.pattern)

\w+\\sub



-Original Message-
From: Python-list On Behalf Of Gilmeh Serda via Python-list
Sent: Friday, October 11, 2024 10:44 AM
To: python-list@python.org
Subject: Re: Correct syntax for pathological re.search()

On Mon, 7 Oct 2024 08:35:32 -0500, Michael F. Stemper wrote:

> I'm trying to discard lines that include the string "\sout{" (which is
> TeX, for those who are curious). I have tried:
>    if not re.search("\sout{", line):
>    if not re.search("\sout\{", line):
>    if not re.search("\\sout{", line):
>    if not re.search("\\sout\{", line):
>
> But the lines with that string keep coming through. What is the right
> syntax to properly escape the backslash and the left curly bracket?


$ python
Python 3.12.6 (main, Sep  8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> s = r"testing \sout{WHADDEVVA}"
>>> re.search(r"\\sout{", s)



You want a literal backslash, hence, you need to escape everything.

It is not enough to escape the "\s" as "\\s", because that only takes care
of Python's demands for escaping "\". You also need to escape the "\" for
the RegEx as well, or it will read it as "\s", which is the RegEx for a
whitespace character, and therefore your search doesn't match, because it
reads it as if you wanted to search for " out{".

Therefore, you need to escape it either as per my example, or by using
four "\" and no "r" in front of the first quote, which also works:

re.search("\\\\sout{", s)



You don't need to escape the curly braces. We call them "seagull wings"
where I live.
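
The explanation above can be checked with a short sketch. The labels and
variable names are illustrative only; re.escape() and the plain "in"
operator are shown as alternatives that avoid hand-counting backslashes.

```python
import re

line = r"testing \sout{WHADDEVVA}"   # contains one literal backslash

candidates = {
    # Only one backslash reaches re, so the regex is \s (whitespace) + "out{".
    "non-raw, two": "\\sout{",
    # Two backslashes reach re: a literal backslash followed by "sout{".
    "raw, two": r"\\sout{",
    # Same pattern as above, spelled without the r prefix.
    "non-raw, four": "\\\\sout{",
    # Let re.escape() build the pattern from the literal text.
    "re.escape": re.escape(r"\sout{"),
}

results = {label: bool(re.search(pattern, line))
           for label, pattern in candidates.items()}

for label, matched in results.items():
    print(f"{label:14} {matched}")

# For a fixed substring, no regular expression is needed at all.
has_sout = r"\sout{" in line
```

Only the first spelling fails to match, because the regex engine sees \s
rather than an escaped backslash.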







Re: Correct syntax for pathological re.search()

2024-10-12 Thread Thomas Passin via Python-list

On 10/12/2024 6:59 AM, Peter J. Holzer via Python-list wrote:
> On 2024-10-11 17:13:07 -0400, AVI GROSS via Python-list wrote:
>> Is there some utility function out there that can be called to show what the
>> regular expression you typed in will look like by the time it is ready to be
>> used?
>
> I assume that by "ready to be used" you mean the compiled form?
>
> No, there doesn't seem to be a way to dump that. You can
>
>  p = re.compile("\\\\sout{")
>  print(p.pattern)
>
> but that just prints the input string, which you could do without
> compiling it first.


It prints the escaped version, so you can see if you escaped the string 
as you intended. In this case, the print will display '\\sout{'.  That's 
worth something.
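
A small sketch of the difference, assuming the pattern was written with
four backslashes in a non-raw string: .pattern holds the same string that
was passed in, and print() shows it with Python's escaping already applied,
while repr() shows it escaped again the way source code would need it.

```python
import re

source = "\\\\sout{"          # four backslashes in source, two in the string
p = re.compile(source)

# .pattern is simply the string that was passed to re.compile().
same_as_input = p.pattern == source

print(p.pattern)        # str form: the text the regex engine parses
print(repr(p.pattern))  # repr form: escaped again, as in Python source
```

Comparing the two printed lines makes it easy to tell how many backslashes
actually reached the regex engine.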




> But - without having looked at the implementation - it's far from clear
> that the compiled form would be useful to the user. It's probably some
> kind of state machine, and a large table of state transitions isn't very
> readable.
>
> There are a number of websites which visualize regular expressions.
> Those are probably better for debugging a regular expression than
> anything the re module could reasonably produce (although with the
> caveat that such a web site would use a different implementation and
> therefore might produce different results).
>
>  hp



