RE: How to escape strings for re.finditer?

avi.e.gross Mon, 27 Feb 2023 16:37:50 -0800

Just FYI, Jen, there are times a sledgehammer works but perhaps is not the only 
way. These days people worry less about efficiency and more about programmer 
time and education and that can be fine.

But if you looked at the methods available in strings or in some other 
modules, you would see that your situation is quite common. Some may use 
another RE front end called finditer().
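
For instance, here is a minimal sketch of two such approaches, using the 
strings from your example. The helper name find_offsets is invented here for 
illustration. The first uses the optional start argument of str.find, the 
second uses re.finditer with only the pattern escaped, never the text being 
searched:

```python
import re

def find_offsets(text, target):
    """Return the start offset of each non-overlapping occurrence of target."""
    if not target:
        return []  # an empty target would otherwise loop forever
    offsets = []
    start = text.find(target)
    while start != -1:
        offsets.append(start)
        # Resume the search just past the match that was found.
        start = text.find(target, start + len(target))
    return offsets

a = "X - abc_degree + 1 + qq + abc_degree + 1"
b = "abc_degree + 1"

print(find_offsets(a, b))                                   # [4, 26]
print([m.start() for m in re.finditer(re.escape(b), a)])    # [4, 26]
```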

I am NOT suggesting you do what I say next, but imagine writing a loop that 
takes a substring of the text you are searching in, of the same length as your 
search string. Near the end, it stops as there is too little left.

You can now simply test your searched for string against that substring for 
equality and it tends to return rapidly when they are not equal early on.

Your loop would return whatever data structure or results you want such as that 
it matched it three times at offsets a, b and c.

But do you allow overlaps? If not, your loop needs to skip len(search_str) 
after a match.
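
Purely as an illustration of that loop, and not a recommendation, a rough 
sketch might look like the following. The function name find_all and its 
allow_overlaps flag are made up here:

```python
def find_all(text, target, allow_overlaps=False):
    """Return start offsets of target in text by comparing slices."""
    if not target:
        return []
    offsets = []
    i = 0
    last_start = len(text) - len(target)  # past this, too little text is left
    while i <= last_start:
        if text[i:i + len(target)] == target:
            offsets.append(i)
            # Skip the whole match unless overlapping matches are wanted.
            i += 1 if allow_overlaps else len(target)
        else:
            i += 1
    return offsets

print(find_all("X - abc_degree + 1 + qq + abc_degree + 1", "abc_degree + 1"))
# [4, 26]
print(find_all("aaaa", "aa"))                       # [0, 2]
print(find_all("aaaa", "aa", allow_overlaps=True))  # [0, 1, 2]
```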

What you may want to consider is another form of pre-processing. Do you care if 
"abc_degree + 1" has missing or added spaces at the start or end or anywhere in 
the middle as in " abc_degree +1"?

Do you care if stuff is a different case like "Abc_Degree + 1"?

Some such searches can be done if both the pattern and searched string are 
first converted to a canonical format that maps to the same output. But that 
complicates things a bit and you may need to display what you match 
differently.
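
One possible canonical form, sketched under the assumption that ALL 
whitespace and all case differences should be ignored (the function name 
canonical is hypothetical, and your rules may well be stricter):

```python
def canonical(s):
    """Drop all whitespace and fold case so equivalent spellings compare equal."""
    return "".join(s.split()).casefold()

pattern = "abc_degree + 1"
print(canonical(" Abc_Degree +1") == canonical(pattern))              # True
print(canonical(pattern) in canonical("X - Abc_Degree +1 + qq"))      # True
```

Note that any offsets found this way refer to the canonical text, not the 
original, which is part of the complication mentioned above.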

And are you also willing to match this: "myabc_degree + 1"?

When using a crafted RE there is a way to ask for a word boundary so abc will 
only be matched if what comes before it is a non-word character such as a 
space, or the start of the string, and not "my".
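
A small sketch of that idea, assuming the search string begins and ends with 
word characters (letters, digits or underscore), since \b only behaves as 
described in that case:

```python
import re

target = "abc_degree + 1"
# Escape the literal text, then require a word boundary on both sides.
pattern = re.compile(r"\b" + re.escape(target) + r"\b")

text = "myabc_degree + 1 and abc_degree + 1"
spans = [m.span() for m in pattern.finditer(text)]
print(spans)  # [(21, 35)]  the copy glued to "my" is not matched
```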

So this may be a case where you can solve an easy version with the chance it 
can be fooled or overengineer it. If you are allowing the user to type in what 
to search for, as many programs, including editors, do, you will often find 
such false positives unless the user knows RE syntax and applies it and you do 
not 
escape it. I have experienced havoc when doing a careless global replace that 
matched more than I expected, including making changes in comments or constant 
strings rather than just the name of a function. Adding a paren is helpful as 
is not replacing them all but one at a time and skipping any that are not 
wanted.

Good luck.

-----Original Message-----
From: Python-list <[email protected]> On 
Behalf Of Jen Kris via Python-list
Sent: Monday, February 27, 2023 7:14 PM
To: Cameron Simpson <[email protected]>
Cc: Python List <[email protected]>
Subject: Re: How to escape strings for re.finditer?

I went to the re module because the specified string may appear more than once 
in the string (in the code I'm writing).  For example:  

a = "X - abc_degree + 1 + qq + abc_degree + 1"
b = "abc_degree + 1"
q = a.find(b)

print(q)
4

So it correctly finds the start of the first instance, but not the second one.  
The re code finds both instances.  If I knew that the substring occurred only 
once then the str.find would be best.  

I changed my re code after MRAB's comment, and it now works.  

Thanks much.  

Jen

Feb 27, 2023, 15:56 by [email protected]:

> On 28Feb2023 00:11, Jen Kris <[email protected]> wrote:
>
>> When matching a string against a longer string, where both strings 
>> have spaces in them, we need to escape the spaces.
>>
>> This works (no spaces):
>>
>> import re
>> example = 'abcdefabcdefabcdefg'
>> find_string = "abc"
>> for match in re.finditer(find_string, example):
>>     print(match.start(), match.end())
>>
>> That gives me the start and end character positions, which is what I 
>> want.
>>
>> However, this does not work:
>>
>> import re
>> example = re.escape('X - cty_degrees + 1 + qq')
>> find_string = re.escape('cty_degrees + 1')
>> for match in re.finditer(find_string, example):
>>     print(match.start(), match.end())
>>
>> I’ve tried several other attempts based on my research, but still no 
>> match.
>>
>
> You need to print those strings out. You're escaping the _example_ string, 
> which would make it:
>
>  X - cty_degrees \+ 1 \+ qq
>
> because `+` is a special character in regexps and so `re.escape` escapes it. 
> But you don't want to mangle the string you're searching! After all, the text 
> above does not contain the string `cty_degrees + 1`.
>
> My secondary question is: if you're escaping the thing you're searching 
> _for_, then you're effectively searching for a _fixed_ string, not a 
> pattern/regexp. So why on earth are you using regexps to do your searching?
>
> The `str` type has a `find(substring)` function. Just use that! It'll be 
> faster and the code simpler!
>
> Cheers,
> Cameron Simpson <[email protected]>
> --
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list
