I think by now we have given the OP everything needed, but it strikes me that Dave's answer could be a tad faster as a while loop if you are searching a larger corpus, such as an entire ebook or all books, as you can on books.google.com.
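To make that concrete, here is a rough sketch of the while loop I have in mind, written as a generator. The function name find_non_overlapping and its start parameter are my own inventions for illustration and not anything from Dave's code, and I am assuming ordinary Python str objects and a non-empty key. It uses str.startswith with a start position so that no slice of the text is copied on each test.

```python
from itertools import islice


def find_non_overlapping(text, key, start=0):
    """Yield (start, end) pairs for non-overlapping occurrences of key.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if not key:
        return  # an empty key would never advance; treat it as no matches
    pos = start
    last = len(text) - len(key)  # last position where a match could begin
    while pos <= last:
        if text.startswith(key, pos):
            yield pos, pos + len(key)
            pos += len(key)  # success: skip past the whole match
        else:
            pos += 1  # failure: try the next position


example = 'X - abc_degree + 1 + qq + abc_degree + 1'
KEY = 'abc_degree + 1'

for begin, end in find_non_overlapping(example, KEY):
    print(begin, end)
# prints:
# 4 18
# 26 40

# Only the first match, then resume from where it ended.
first = list(islice(find_non_overlapping(example, KEY), 1))
rest = list(find_non_overlapping(example, KEY, start=first[-1][1]))
```

Because it is a generator, asking for the first N matches is just a matter of islice, and the end of the last match can be fed back in as start to pick up where you left off.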
I think I mentioned earlier that some assumptions need to apply. The text needs to be in something like an ASCII encoding, or be seen as code points rather than bytes. We assume that after a match the search should move forward by the length of the match. And, clearly, there cannot be a match too close to the end.

So a while loop would begin with a variable set to zero to mark the current location of the search. The condition for repeating the loop is that this variable is less than or equal to len(searched_text) - len(key). In the loop, each comparison is done the same way David does it, or anything similar enough, but the twist is that a failure increments the variable by 1 while a success increments it by len(key).

Will this make much difference? It might, as the simpler algorithm tests every position, including positions inside a match it has already found, so it can count overlapping matches and wastes some time hunting where perhaps it shouldn't.

And, of course, if you made something like this into a search function, you could easily add features such as returning only the first N matches, or the next N, simply by making it a generator.

So, tying this into an earlier discussion, do you want the LAST match info visible when the while loop has completed? If it were available, it would open up possibilities for running the loop again, starting from where you left off.

-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail....@python.org> On Behalf Of Thomas Passin
Sent: Monday, February 27, 2023 9:44 PM
To: python-list@python.org
Subject: Re: How to escape strings for re.finditer?

On 2/27/2023 9:16 PM, avi.e.gr...@gmail.com wrote:
> And, just for fun, since there is nothing wrong with your code, this minor change is terser:
>
> >>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> >>> for match in re.finditer(re.escape('abc_degree + 1'), example):
> ...     print(match.start(), match.end())
> ...
> ...
> 4 18
> 26 40

Just for more fun :) -

Without knowing how general your expressions will be, I think the following version is very readable, certainly more readable than regexes:

    example = 'X - abc_degree + 1 + qq + abc_degree + 1'
    KEY = 'abc_degree + 1'

    for i in range(len(example)):
        if example[i:].startswith(KEY):
            print(i, i + len(KEY))
    # prints:
    # 4 18
    # 26 40

If you may have variable numbers of spaces around the symbols, OTOH, the whole situation changes and then regexes would almost certainly be the best approach. But the regular expression strings would become harder to read.
--
https://mail.python.org/mailman/listinfo/python-list
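
P.S. On Thomas's last point about variable numbers of spaces, here is one possible sketch of how the regex could be built without writing it by hand. The helper name flexible_pattern is hypothetical, and I am assuming the key's tokens are separated by whitespace and that any run of whitespace, including none, is acceptable between them. Each token is escaped with re.escape and the pieces are joined with \s*.

```python
import re


def flexible_pattern(key):
    """Compile a pattern matching key's tokens with optional whitespace between.

    Hypothetical helper: assumes tokens in key are whitespace-separated.
    """
    tokens = key.split()
    return re.compile(r'\s*'.join(re.escape(tok) for tok in tokens))


example = 'X - abc_degree+1 + qq + abc_degree  +  1'
pattern = flexible_pattern('abc_degree + 1')

for match in pattern.finditer(example):
    print(match.start(), match.end())
# prints:
# 4 16
# 26 - 2 is wrong to guess by eye, so check: the second match is 24 40
```

The generated pattern here is abc_degree\s*\+\s*1, which is still fairly readable, though for longer keys it would get ugly quickly, as Thomas says.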