Ma Lin <malin...@163.com> added the comment:

> Could you please create and run some microbenchmarks to measure
> possible performance penalty of additional MARK_PUSHes? I am
> especially interested in worst cases.
Besides the worst case, I prepared two solutions.

Solution_A (PR12288): Fixes the bugs; I can find a test case for every change. I feel JUMP_MIN_UNTIL_3 should MARK_PUSH() as well, but I really can't find a test case to prove this feeling. Commit 12 is a safety check: if JUMP_MIN_UNTIL_3 or other JUMPs should be protected, maybe a bug report will come from the millions of users.

Solution_B (PR12289): Based on Solution_A, also protects JUMP_MIN_UNTIL_3.

Worst (PR12290): Based on Solution_B, protects everywhere; this should have the worst performance.

Legend of this table:
* No: no protection
* L : LASTMARK_SAVE()
* Mu: MARK_PUSH() unconditionally
* Mr: MARK_PUSH() if inside a repeat
An arrow (->) marks a change from the unpatched code.

                     unpatched  Solution_A  Solution_B  Worst
JUMP_MAX_UNTIL_1     No         No          No          -> L/Mu
JUMP_MAX_UNTIL_2     L/Mu       L/Mu        L/Mu        L/Mu
JUMP_MAX_UNTIL_3     No         No          No          -> L/Mu
JUMP_MIN_UNTIL_1     No         No          No          -> L/Mu
JUMP_MIN_UNTIL_2     L          -> L/Mr     L/Mr        -> L/Mu
JUMP_MIN_UNTIL_3     No         No          -> L/Mu     -> L/Mu
JUMP_REPEAT_ONE_1    L          -> L/Mr     L/Mr        -> L/Mu
JUMP_REPEAT_ONE_2    L          -> L/Mr     L/Mr        -> L/Mu
JUMP_MIN_REPEAT_ONE  L          -> L/Mr     L/Mr        -> L/Mu
JUMP_BRANCH          L/Mr       L/Mr        L/Mr        -> L/Mu
JUMP_ASSERT          No         No          No          -> L/Mu
JUMP_ASSERT_NOT      No         -> L/Mr     L/Mr        -> L/Mu

🔶 I made a benchmark tool for Python RE engines:
https://github.com/animalize/re_benchmarks

re100mb.py was extracted from a real case; it processes 100 MB of real data in 16 rounds (with 16 patterns). This table picks the best of 4 results from each round:

                 unpatched     A        B      Worst   regex 2019.3.9
test 01 best:      0.647     0.629    0.625    0.635    0.102
test 02 best:      3.980     4.046    4.052    4.005    4.530
test 03 best:      0.730     0.708    0.709    0.706    0.433
test 04 best:      0.763     0.743    0.740    0.736    0.799
test 05 best:      2.566     2.693    2.692    2.654    0.981
test 06 best:      0.883     0.865    0.859    0.874    0.872
test 07 best:      2.847     2.773    2.797    2.890    1.202
test 08 best:      3.644     3.664    3.740    3.699    1.139
test 09 best:      0.344     0.348    0.343    0.345    0.378
test 10 best:      0.341     0.347    0.343    0.344    0.407
test 11 best:      4.490     4.655    4.520    4.440    1.340
test 12 best:      0.264     0.263    0.262    0.264    0.271
test 13 best:      0.230     0.230    0.231    0.233    0.281
test 14 best:      0.932     0.925    0.919    0.943    0.334
test 15 best:      1.837     1.815    1.804    1.866    0.683
test 16 best:      1.691     1.708    1.676    2.042    3.805
--------------------------------------------------------------
sum of above:     26.189    26.412   26.312   26.676   17.557

There seem to be no significant changes.
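For reference, timings of this best-of-N style can be reproduced with timeit. A minimal sketch, assuming hypothetical patterns (the actual patterns live in the tool's t.py and may differ):

    import re
    import timeit

    def bench(pattern, string, number=100_000):
        # Best of 4 rounds, reported as time per match() call.
        p = re.compile(pattern)
        best = min(timeit.repeat(lambda: p.match(string),
                                 number=number, repeat=4))
        print('%-20s %10.1f nsec' % (pattern, best / number * 1e9))

    # Capturing groups force the engine to save/restore MARKs while
    # backtracking, which exercises the opcodes in the table above.
    bench(r'(?:(ab))*', 'ab' * 3)           # MAX_UNTIL, a few repeats
    bench(r'(?:(ab))*?c', 'ab' * 3 + 'c')   # MIN_UNTIL
    bench(r'(a|ab)*b', 'a' * 3 + 'b')       # BRANCH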
🔶 Below are some micro benchmarks (run t.py with the benchmark tool testit.py):

SRE_OP_MAX_UNTIL, a few repeats
  unpatched: 744.85 nsec
  SolutionA: 755.86 nsec
  SolutionB: 745.14 nsec
  Worst:       843.56 nsec
  regex:         2.45 usec

SRE_OP_MAX_UNTIL, many repeats
  unpatched: 393.24 msec
  SolutionA: 321.16 msec
  SolutionB: 323.24 msec
  Worst:     498.48 msec
  regex:     240.73 msec
------------------
SRE_OP_MIN_UNTIL, a few repeats
  unpatched: 702.75 nsec
  SolutionA: 701.90 nsec
  SolutionB: 730.81 nsec
  Worst:     873.67 nsec
  regex:       1.84 usec

SRE_OP_MIN_UNTIL, many repeats
  unpatched: 210.60 msec
  SolutionA: 174.20 msec
  SolutionB: 323.93 msec
  Worst:     491.73 msec
  regex:     195.11 msec
------------------
SRE_OP_REPEAT_ONE, short string
  unpatched: 466.56 nsec
  SolutionA: 468.85 nsec
  SolutionB: 466.04 nsec
  Worst:     463.86 nsec
  regex:       1.13 usec

SRE_OP_REPEAT_ONE, long string
  unpatched: 2.19 msec
  SolutionA: 2.19 msec
  SolutionB: 2.19 msec
  Worst:     2.19 msec
  regex:     3.23 msec
------------------
SRE_OP_MIN_REPEAT_ONE, short string
  unpatched: 563.76 nsec
  SolutionA: 566.68 nsec
  SolutionB: 561.92 nsec
  Worst:     601.69 nsec
  regex:       1.12 usec

SRE_OP_MIN_REPEAT_ONE, long string
  unpatched: 11.16 msec
  SolutionA: 11.27 msec
  SolutionB: 10.55 msec
  Worst:     15.97 msec
  regex:      7.24 msec
------------------
SRE_OP_BRANCH
  unpatched: 419.34 nsec
  SolutionA: 422.07 nsec
  SolutionB: 418.25 nsec
  Worst:     423.56 nsec
  regex:       1.34 usec
------------------
SRE_OP_ASSERT
  unpatched: 320.58 nsec
  SolutionA: 326.29 nsec
  SolutionB: 320.81 nsec
  Worst:     333.22 nsec
  regex:       1.14 usec

SRE_OP_ASSERT_NOT
  unpatched: 440.18 nsec
  SolutionA: 440.93 nsec
  SolutionB: 434.44 nsec
  Worst:     446.33 nsec
  regex:       1.00 usec

🔶 Reduce the size of the match_context struct

In stack-consuming scenes, Solution_A and Solution_B are better than unpatched. This is because commits 10 and 11 reduced the size of the match_context struct; the engine uses a stack of these structs to simulate recursive calls.

On a 32-bit platform, sizeof(match_context): 36 bytes -> 32 bytes.
On a 64-bit platform, sizeof(match_context): 72 bytes -> 56 bytes.

It brings:
- deeper recursive calls
- less memory consumption
- fewer memory reallocations

If the stack size is limited to 1 GB, the max value of n is:

re.match(r'(ab)*', n * 'ab')    # needs to save MARKs
  72 bytes: n = 11,184,809
  64 bytes: n = 12,201,610
  56 bytes: n = 13,421,771

re.match(r'(?:ab)*', n * 'ab')  # no need to save MARKs
  72 bytes: n = 13,421,770
  64 bytes: n = 14,913,079
  56 bytes: n = 16,777,214

🔶 Future optimizations

> If the penalty is significant, it will be a goal of future
> optimizations.

Maybe these two questions can be answered when compiling the pattern:
- whether to protect or not
- which MARKs should be protected

Then sre doesn't need to MARK_PUSH() all available MARKs to the stack.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35859>
_______________________________________