Re: Correct syntax for pathological re.search()
On 2024-10-11 17:13:07 -0400, AVI GROSS via Python-list wrote:
> Is there some utility function out there that can be called to show what the
> regular expression you typed in will look like by the time it is ready to be
> used?

I assume that by "ready to be used" you mean the compiled form?

No, there doesn't seem to be a way to dump that. You can

    p = re.compile("\\\\sout{")
    print(p.pattern)

but that just prints the input string, which you could do without
compiling it first.

But - without having looked at the implementation - it's far from clear
that the compiled form would be useful to the user. It's probably some
kind of state machine, and a large table of state transitions isn't very
readable.

There are a number of websites which visualize regular expressions.
Those are probably better for debugging a regular expression than
anything the re module could reasonably produce (although with the
caveat that such a web site would use a different implementation and
therefore might produce different results).

        hp

--
Peter J. Holzer | h...@hjp.at | http://www.hjp.at/
"Story must make more sense than reality." -- Charles Stross, "Creative writing challenge!"

-- https://mail.python.org/mailman/listinfo/python-list
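A side note on dumping the compiled form: the re module does document a re.DEBUG flag that prints debug information while a pattern is compiled. The sketch below captures that output so the two spellings can be compared. The helper name dump_parsed is made up for this illustration, and the exact layout of the debug text is an implementation detail that may differ between Python versions.

```python
import contextlib
import io
import re


def dump_parsed(pattern: str) -> str:
    """Return whatever re.DEBUG prints while compiling `pattern`.

    Hypothetical helper for illustration; it only wraps re.compile().
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        re.compile(pattern, re.DEBUG)
    return buffer.getvalue()


# Four backslashes in a normal string literal: the engine receives \\sout{
# and parses a literal backslash (character code 92) followed by "sout{".
escaped_dump = dump_parsed("\\\\sout{")
print(escaped_dump)

# One backslash in a raw string: the engine receives \sout{
# and parses \s as the whitespace class instead.
whitespace_dump = dump_parsed(r"\sout{")
print(whitespace_dump)
```

Reading the first dump top to bottom shows one entry per literal character, which is usually enough to confirm that a backslash survived both layers of escaping.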
RE: Correct syntax for pathological re.search()
Peter,

Matthew understood what I was hinting at in one way and you in another.

The question asked how to add some power of two backslashes, or make other changes, so that the RE functionality sees what you want. The goal is to see what happens when one or more intermediate evaluations may change the string. So a simple print may suffice as a parallel way to force the same evaluations.

Thomas made his point.

And I am starting to feel like I need to change my name to something like Luke, since this discussion must be gospel.

FYI, I was not planning on posting at all. Time to detach.
Re: Correct syntax for pathological re.search()
On 10/11/2024 8:37 PM, MRAB via Python-list wrote:
> On 2024-10-11 22:13, AVI GROSS via Python-list wrote:
>> Is there some utility function out there that can be called to show what
>> the regular expression you typed in will look like by the time it is
>> ready to be used?
>>
>> Obviously, life is not that simple as it can go through multiple layers
>> with each dealing with a layer of backslashes.
>>
>> But for simple cases, ...
>
> Yes. It's called 'print'. :-)

There is a section in the Python docs about this backslash subject. It's titled "The Backslash Plague" in

https://docs.python.org/3/howto/regex.html

You can also inspect the compiled expression to see what string it received after all the escaping:

    import re

    re_string = '\\w+\\\\sub'
    re_pattern = re.compile(re_string)

    # Should look as if we had used r'\w+\\sub'
    print(re_pattern.pattern)

    \w+\\sub

>> -----Original Message-----
>> From: Python-list On Behalf Of Gilmeh Serda via Python-list
>> Sent: Friday, October 11, 2024 10:44 AM
>> To: python-list@python.org
>> Subject: Re: Correct syntax for pathological re.search()
>>
>> On Mon, 7 Oct 2024 08:35:32 -0500, Michael F. Stemper wrote:
>>
>>> I'm trying to discard lines that include the string "\sout{" (which is
>>> TeX, for those who are curious). I have tried:
>>>   if not re.search("\sout{", line):
>>>   if not re.search("\sout\{", line):
>>>   if not re.search("\\sout{", line):
>>>   if not re.search("\\sout\{", line):
>>>
>>> But the lines with that string keep coming through. What is the right
>>> syntax to properly escape the backslash and the left curly bracket?
>>
>> $ python
>> Python 3.12.6 (main, Sep 8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
>> Type "help", "copyright", "credits" or "license" for more information.
>> import re
>> s = r"testing \sout{WHADDEVVA}"
>> re.search(r"\\sout{", s)
>>
>> You want a literal backslash, hence, you need to escape everything.
>>
>> It is not enough to escape the "\s" as "\\s", because that only takes
>> care of Python's demands for escaping "\". You also need to escape the
>> "\" for the RegEx as well, or it will read it like it means "\s", which
>> is the RegEx for a whitespace character, and therefore your search
>> doesn't match, because it reads it like you want to search for " out{".
>>
>> Therefore, you need to escape it either as per my example, or by using
>> four "\" and no "r" in front of the first quote, which also works:
>>
>> re.search("\\\\sout{", s)
>>
>> You don't need to escape the curly braces. We call them "seagull wings"
>> where I live.
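To make the two layers of escaping concrete, here is a minimal sketch that prints what the regex engine actually receives for three spellings of the pattern and whether each one matches. The sample line is the one from the session quoted above; the labels and variable names are just for this illustration.

```python
import re

# Sample line taken from the quoted session.
line = r"testing \sout{WHADDEVVA}"

candidates = {
    "one backslash, raw": r"\sout{",
    "two backslashes, raw": r"\\sout{",
    "four backslashes, not raw": "\\\\sout{",
}

results = {}
for label, pattern in candidates.items():
    # print() shows the characters left after Python's own string
    # escaping, i.e. exactly what the regex engine is given.
    print(f"{label}: engine sees {pattern}")
    results[label] = re.search(pattern, line) is not None
    print(f"  matches: {results[label]}")
```

The second and third spellings are the same string by the time re.search() sees them, which is why both work. The first one reaches the engine as \s followed by "out{", so it looks for whitespace where the backslash is.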
Re: Correct syntax for pathological re.search()
On 10/12/2024 6:59 AM, Peter J. Holzer via Python-list wrote:
> On 2024-10-11 17:13:07 -0400, AVI GROSS via Python-list wrote:
>> Is there some utility function out there that can be called to show what
>> the regular expression you typed in will look like by the time it is
>> ready to be used?
>
> I assume that by "ready to be used" you mean the compiled form?
>
> No, there doesn't seem to be a way to dump that. You can
>
>     p = re.compile("\\\\sout{")
>     print(p.pattern)
>
> but that just prints the input string, which you could do without
> compiling it first.

It prints the escaped version, so you can see if you escaped the string as you intended. In this case, the print will display '\\sout{'. That's worth something.

> But - without having looked at the implementation - it's far from clear
> that the compiled form would be useful to the user. It's probably some
> kind of state machine, and a large table of state transitions isn't very
> readable.
>
> There are a number of websites which visualize regular expressions.
> Those are probably better for debugging a regular expression than
> anything the re module could reasonably produce (although with the
> caveat that such a web site would use a different implementation and
> therefore might produce different results).
>
>         hp
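Along the same lines, when the thing being searched for is a fixed piece of text, one option is to let re.escape() add the regex-level escaping and then print the result to check it. This is only a sketch: the two sample lines are invented for the example, and only the "\sout{" text comes from the thread.

```python
import re

# The text to find, written as it appears in the file.
literal = r"\sout{"

# re.escape() adds the regex-level backslashes for us.
pattern = re.escape(literal)
print(pattern)  # the string the regex engine receives

compiled = re.compile(pattern)
# .pattern is just the string that was passed to re.compile().
print(compiled.pattern == pattern)

# Made-up sample lines; discard the ones containing the literal text.
lines = ["keep this line", r"drop \sout{this} line"]
kept = [ln for ln in lines if not compiled.search(ln)]
print(kept)
```

For a plain substring like this, a test such as `literal in ln` would avoid the regex layer altogether; re.escape() is mainly useful when the literal has to be embedded in a larger pattern.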