Branch: refs/heads/yves/curlyx_curlym
Home: https://github.com/Perl/perl5
Commit: dd09cdb57d10f904b31baa5499a2048198a1b58b
https://github.com/Perl/perl5/commit/dd09cdb57d10f904b31baa5499a2048198a1b58b
Author: Yves Orton <demer...@gmail.com>
Date: 2023-03-29 (Wed, 29 Mar 2023)
Changed paths:
M pod/perldelta.pod
M pp_ctl.c
M regexec.c
M regexp.h
M t/re/pat.t
M t/re/pat_rt_report.t
M t/re/re_tests

Log Message:
-----------
regcomp.c - Resolve issues clearing buffers in CURLYX (MAJOR-CHANGE)

CURLYX doesn't reset capture buffers properly. Multiple buffers can be defined at once with values from different iterations of the loop, which makes no sense. For example, after

    "foobarfoo" =~ /((foo)|(bar))+/

matches, either $1 should equal $2 and $3 should be undefined, or $1 should equal $3 and $2 should be undefined. Prior to this patch that was not the case.

The solution this patch uses is to introduce a form of "layered transactional storage" for paren data. The existing start/end pair for each capture buffer is extended with a start_new/end_new pair. Most code that checks whether a given capture buffer is defined now looks at start_new/end_new first and, if either is -1, falls back to whatever is in start/end.

When a capture buffer is CLOSEd, the data is written into the start_new/end_new pair instead of the start/end pair. When a CURLYX loop is executing and has matched something (at least one "A" in /A*B/, meaning we are actually in WHILEM), it "commits" the start_new/end_new data by writing it into start/end. When we begin a new iteration of the loop, we clear the start_new/end_new pairs of the buffers contained by the loop by setting them to -1. If the loop fails, we roll back as we used to; if it succeeds, we continue. When we hit an END block, we commit everything.

Consider the example above. We start off with everything set to -1. Each buffer is shown below as (start,end):(start_new,end_new).

    $1 = (-1,-1):(-1,-1)
    $2 = (-1,-1):(-1,-1)
    $3 = (-1,-1):(-1,-1)

In the first iteration we match "foo" and end up with this:

    $1 = (-1,-1):( 0, 3)
    $2 = (-1,-1):( 0, 3)
    $3 = (-1,-1):(-1,-1)

We commit the results of $2 and $3, and then clear the new data at the beginning of the next iteration:

    $1 = (-1,-1):( 0, 3)
    $2 = ( 0, 3):(-1,-1)
    $3 = (-1,-1):(-1,-1)

We then match "bar":

    $1 = (-1,-1):( 0, 3)
    $2 = ( 0, 3):(-1,-1)
    $3 = (-1,-1):( 3, 6)

and then commit the result and clear the new data:

    $1 = (-1,-1):( 0, 3)
    $2 = (-1,-1):(-1,-1)
    $3 = ( 3, 6):(-1,-1)

and then we match "foo" again:

    $1 = (-1,-1):( 0, 3)
    $2 = (-1,-1):( 6, 9)
    $3 = ( 3, 6):(-1,-1)

And we then commit. We do a regcppush here as normal.

    $1 = (-1,-1):( 0, 3)
    $2 = ( 6, 9):( 6, 9)
    $3 = (-1,-1):(-1,-1)

We then clear the new data again, but since the next iteration fails to match, regcppop restores the buffers to the layout above. When we finally hit the END block, we do a commit on all buffers, including the 0th (for the full match).

Fixes GH Issue #18865, and adds tests for it and for related cases.