RE: How to escape strings for re.finditer?

avi.e.gross Mon, 27 Feb 2023 17:59:01 -0800

Jen,

What you just described is why that tool is not the right tool for the job, 
albeit it may help you confirm if whatever method you choose does work 
correctly and finds the same number of matches.


Sometimes you simply do some searching and roll your own.

Consider this code using a sort of list comprehension feature:

>>> short = "hello world"
>>> longer =  "hello world is how many programs start for novices but some use 
>>> hello world! to show how happy they are to say hello world"

>>> short in longer
True
>>> howLong = len(short)

>>> res = [(offset, offset + howLong)  for offset  in range(len(longer)) if 
>>> longer.startswith(short, offset)]
>>> res
[(0, 11), (64, 75), (111, 122)]
>>> len(res)
3

I could do a bit more but it seems to work. Did I get the offsets right? 
Checking:

>>> print( [ longer[res[index][0]:res[index][1]] for index in range(len(res))])
['hello world', 'hello world', 'hello world']

Seems to work but thrown together quickly so can likely be done much nicer.

But as noted, the above has flaws such as matching overlaps like:

>>> short = "good good"
>>> longer = "A good good good but not douple plus good good good goody"
>>> howLong = len(short)
>>> res = [(offset, offset + howLong)  for offset  in range(len(longer)) if 
>>> longer.startswith(short, offset)]
>>> res
[(2, 11), (7, 16), (37, 46), (42, 51), (47, 56)]

It matched five times as sometimes we had three of four good in a row. Some 
other method might match only three.

What some might do can get long and you clearly want one answer and not 
tutorials. For example, people can make a loop that finds a match and either 
sabotages the area by replacing or deleting it, or keeps track and searched 
again on a substring offset from the beginning. 

When you do not find a tool, consider making one. You can take (better) code 
than I show above and make it info a function and now you have a tool. Even 
better, you can make it return whatever you want.

-----Original Message-----
From: Python-list <[email protected]> On 
Behalf Of Jen Kris via Python-list
Sent: Monday, February 27, 2023 7:40 PM
To: Bob van der Poel <[email protected]>
Cc: Python List <[email protected]>
Subject: Re: How to escape strings for re.finditer?


string.count() only tells me there are N instances of the string; it does not 
say where they begin and end, as does re.finditer.  

Feb 27, 2023, 16:20 by [email protected]:

> Would string.count() work for you then?
>
> On Mon, Feb 27, 2023 at 5:16 PM Jen Kris via Python-list <> 
> [email protected]> > wrote:
>
>>
>> I went to the re module because the specified string may appear more 
>> than once in the string (in the code I'm writing).  For example:
>>  
>>  a = "X - abc_degree + 1 + qq + abc_degree + 1"
>>   b = "abc_degree + 1"
>>   q = a.find(b)
>>  
>>  print(q)
>>  4
>>  
>>  So it correctly finds the start of the first instance, but not the 
>> second one.  The re code finds both instances.  If I knew that the substring 
>> occurred only once then the str.find would be best.
>>  
>>  I changed my re code after MRAB's comment, it now works.
>>  
>>  Thanks much.
>>  
>>  Jen
>>  
>>  
>>  Feb 27, 2023, 15:56 by >> [email protected]>> :
>>  
>>  > On 28Feb2023 00:11, Jen Kris <>> [email protected]>> > wrote:
>>  >
>>  >> When matching a string against a longer string, where both 
>> strings have spaces in them, we need to escape the spaces.  >>  >> 
>> This works (no spaces):
>>  >>
>>  >> import re
>>  >> example = 'abcdefabcdefabcdefg'
>>  >> find_string = "abc"
>>  >> for match in re.finditer(find_string, example):
>>  >>     print(match.start(), match.end())  >>  >> That gives me the 
>> start and end character positions, which is what I want.
>>  >>
>>  >> However, this does not work:
>>  >>
>>  >> import re
>>  >> example = re.escape('X - cty_degrees + 1 + qq')  >> find_string = 
>> re.escape('cty_degrees + 1')  >> for match in 
>> re.finditer(find_string, example):
>>  >>     print(match.start(), match.end())  >>  >> I’ve tried several 
>> other attempts based on my reseearch, but still no match.
>>  >>
>>  >
>>  > You need to print those strings out. You're escaping the _example_ 
>> string, which would make it:
>>  >
>>  >  X - cty_degrees \+ 1 \+ qq
>>  >
>>  > because `+` is a special character in regexps and so `re.escape` escapes 
>> it. But you don't want to mangle the string you're searching! After all, the 
>> text above does not contain the string `cty_degrees + 1`.
>>  >
>>  > My secondary question is: if you're escaping the thing you're searching 
>> _for_, then you're effectively searching for a _fixed_ string, not a 
>> pattern/regexp. So why on earth are you using regexps to do your searching?
>>  >
>>  > The `str` type has a `find(substring)` function. Just use that! It'll be 
>> faster and the code simpler!
>>  >
>>  > Cheers,
>>  > Cameron Simpson <>> [email protected]>> >  > --  > >> 
>> https://mail.python.org/mailman/listinfo/python-list
>>  >
>>  
>>  --
>>  >> https://mail.python.org/mailman/listinfo/python-list
>>
>
>
> --
> **** Listen to my CD at > http://www.mellowood.ca/music/cedars>  **** 
> Bob van der Poel ** Wynndel, British Columbia, CANADA **
> EMAIL: > [email protected]
> WWW:   > http://www.mellowood.ca
>

--
https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list

RE: How to escape strings for re.finditer?

Reply via email to