On Wed, Mar 20, 2019 at 4:35 PM C. Scott Ananian <canan...@wikimedia.org>
wrote:

> On Tue, Mar 19, 2019 at 10:58 AM Nikita Popov <nikita....@gmail.com>
> wrote:
>
>> After thinking about this some more, while this may be a minor
>> performance improvement, it still does more work than necessary. In
>> particular the use of OFFSET_CAPTURE (which would be pretty much required
>> here) needs one new two-element array for each subpattern. If the captured
>> strings are short, this is where the main cost is going to be.
>>
>
> The primary use of this feature is when the captured strings are *long*,
> as that's when we most want to avoid copying a substring.
>
>
>> I'm wondering if we shouldn't consider a new object oriented API for PCRE
>> which can return a match object where subpattern positions and contents can
>> be queried via method calls, so you only pay for the parts that you do
>> access.
>>
>
> Seems like this is letting the perfect be the enemy of the good.  The
> LENGTH_CAPTURE significantly reduces allocation for long match strings, and
> it allocates the same two-element arrays that OFFSET_CAPTURE would -- it
> just stores an integer where there would otherwise be an expensive
> substring.  Furthermore, since the array structure is left mostly alone, it
> would not be too hard to support earlier PHP versions, with something like:
>
> $hasLengthCapture = defined('PREG_LENGTH_CAPTURE') ? PREG_LENGTH_CAPTURE : 0;
> $r = preg_match($pat, $sub, $m, PREG_OFFSET_CAPTURE | $hasLengthCapture);
> $matchOneLength = $hasLengthCapture ? $m[1][0] : strlen($m[1][0]);
> $matchOneOffset = $m[1][1];
>
> If you introduce a whole new OO accessor object, it starts becoming very
> hard to write backward-compatible code.
>  --scott
>

Fair enough. I've created https://github.com/php/php-src/pull/3971 to
implement this feature. It would be good to have some confirmation that
this is really a significant performance improvement before we land it
though.
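
One rough way to get that confirmation would be a micro-benchmark along
these lines. This is only a sketch: PREG_LENGTH_CAPTURE is the proposed
flag and exists only on a build with the patch applied, the timeMatches()
helper, the pattern, and the subject size are made up for illustration,
and on an unpatched build both runs measure the same thing.

```php
<?php
// Proposed flag; falls back to 0 (no effect) on builds without the patch.
$lengthFlag = defined('PREG_LENGTH_CAPTURE') ? PREG_LENGTH_CAPTURE : 0;

// Hypothetical workload: a single long capture group (about 100 KB).
$subject = '<' . str_repeat('x', 100000) . '>';
$pattern = '/<(x+)>/';

// Hypothetical helper: run the same match repeatedly, return elapsed milliseconds.
function timeMatches(string $pattern, string $subject, int $flags, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        preg_match($pattern, $subject, $m, $flags);
    }
    return (microtime(true) - $start) * 1000.0;
}

$iterations = 100;

// Baseline: each match copies the captured substring into $m.
$baseline = timeMatches($pattern, $subject, PREG_OFFSET_CAPTURE, $iterations);

// Candidate: with the patch, an integer length is stored instead of the substring.
$candidate = timeMatches($pattern, $subject, PREG_OFFSET_CAPTURE | $lengthFlag, $iterations);

printf("OFFSET_CAPTURE only:      %.3f ms\n", $baseline);
printf("with LENGTH_CAPTURE flag: %.3f ms%s\n", $candidate,
    $lengthFlag ? '' : ' (flag unavailable, same as baseline)');
```

Varying the repeat count in $subject should show whether the gain only
appears for long captures, as expected, or also for short ones.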

Nikita
