I wrote my previous message before reading this. Thank you for the test you ran -- it answers the question of performance. You show that re.finditer is 30x faster, so that certainly recommends that over a simple loop, which introduces looping overhead.
Feb 28, 2023, 05:44 by li...@tompassin.net: > On 2/28/2023 4:33 AM, Roel Schroeven wrote: > >> Op 28/02/2023 om 3:44 schreef Thomas Passin: >> >>> On 2/27/2023 9:16 PM, avi.e.gr...@gmail.com wrote: >>> >>>> And, just for fun, since there is nothing wrong with your code, this minor >>>> change is terser: >>>> >>>>>>> example = 'X - abc_degree + 1 + qq + abc_degree + 1' >>>>>>> for match in re.finditer(re.escape('abc_degree + 1') , example): >>>>>>> >>>> ... print(match.start(), match.end()) >>>> ... >>>> ... >>>> 4 18 >>>> 26 40 >>>> >>> >>> Just for more fun :) - >>> >>> Without knowing how general your expressions will be, I think the following >>> version is very readable, certainly more readable than regexes: >>> >>> example = 'X - abc_degree + 1 + qq + abc_degree + 1' >>> KEY = 'abc_degree + 1' >>> >>> for i in range(len(example)): >>> if example[i:].startswith(KEY): >>> print(i, i + len(KEY)) >>> # prints: >>> 4 18 >>> 26 40 >>> >> I think it's often a good idea to use a standard library function instead of >> rolling your own. The issue becomes less clear-cut when the standard library >> doesn't do exactly what you need (as here, where re.finditer() uses regular >> expressions while the use case only uses simple search strings). Ideally >> there would be a str.finditer() method we could use, but in the absence of >> that I think we still need to consider using the almost-but-not-quite >> fitting re.finditer(). >> >> Two reasons: >> >> (1) I think it's clearer: the name tells us what it does (though of course >> we could solve this in a hand-written version by wrapping it in a suitably >> named function). >> >> (2) Searching for a string in another string, in a performant way, is not as >> simple as it first appears. Your version works correctly, but slowly. In >> some situations it doesn't matter, but in other cases it will. For better >> performance, string searching algorithms jump ahead either when they found a >> match or when they know for sure there isn't a match for some time (see e.g. >> the Boyer–Moore string-search algorithm). You could write such a more >> efficient algorithm, but then it becomes more complex and more error-prone. >> Using a well-tested existing function becomes quite attractive. >> > > Sure, it all depends on what the real task will be. That's why I wrote > "Without knowing how general your expressions will be". For the example > string, it's unlikely that speed will be a factor, but who knows what target > strings and keys will turn up in the future? > >> To illustrate the difference performance, I did a simple test (using the >> paragraph above is test text): >> >> import re >> import timeit >> >> def using_re_finditer(key, text): >> matches = [] >> for match in re.finditer(re.escape(key), text): >> matches.append((match.start(), match.end())) >> return matches >> >> >> def using_simple_loop(key, text): >> matches = [] >> for i in range(len(text)): >> if text[i:].startswith(key): >> matches.append((i, i + len(key))) >> return matches >> >> >> CORPUS = """Searching for a string in another string, in a performant >> way, is >> not as simple as it first appears. Your version works correctly, but >> slowly. >> In some situations it doesn't matter, but in other cases it will. For >> better >> performance, string searching algorithms jump ahead either when they >> found a >> match or when they know for sure there isn't a match for some time (see >> e.g. >> the Boyer–Moore string-search algorithm). You could write such a more >> efficient algorithm, but then it becomes more complex and more >> error-prone. >> Using a well-tested existing function becomes quite attractive.""" >> KEY = 'in' >> print('using_simple_loop:', timeit.repeat(stmt='using_simple_loop(KEY, >> CORPUS)', globals=globals(), number=1000)) >> print('using_re_finditer:', timeit.repeat(stmt='using_re_finditer(KEY, >> CORPUS)', globals=globals(), number=1000)) >> >> This does 5 runs of 1000 repetitions each, and reports the time in seconds >> for each of those runs. >> Result on my machine: >> >> using_simple_loop: [0.13952950000020792, 0.13063130000000456, >> 0.12803450000001249, 0.13186180000002423, 0.13084610000032626] >> using_re_finditer: [0.003861400000005233, 0.004061900000124297, >> 0.003478999999970256, 0.003413100000216218, 0.0037320000001273] >> >> We find that in this test re.finditer() is more than 30 times faster >> (despite the overhead of regular expressions. >> >> While speed isn't everything in programming, with such a large difference in >> performance and (to me) no real disadvantages of using re.finditer(), I >> would prefer re.finditer() over writing my own. >> > > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list