Hi Anatol,

Anatol Belski wrote:

> Hi Christoph,
> 
>> -----Original Message-----
>> From: Christoph Becker [mailto:cmbecke...@gmx.de]
>> Sent: Saturday, July 25, 2015 12:09 AM
>> To: Anatol Belski <anatol....@belski.net>; 'Pierre Joye'
>> <pierre....@gmail.com>
>> Cc: 'PHP internals' <internals@lists.php.net>
>> Subject: Re: [PHP-DEV] PCRE JIT stack size limit
>>
>> Now please consider the following simple expression:
>>
>>   preg_match('/^(foo)+$/', str_repeat('foo', $n))
>>
>> This will fail (i.e. yield FALSE) independently of pcre.jit for large enough 
>> $n.
>> However, a user can change pcre.recursion_limit what will affect the $n limit
>> (the expression will fail for smaller or larger $n), if pcre.jit=0.  If 
>> pcre.jit=1 the
>> user can't influence this boundary in any way, currently.
>>
>> And maybe even worse, with pcre.jit=0 the boundary is 50,000, but with
>> pcre.jit=1 it is only 1,366.  Of course, one can argue that this is a 
>> contrived
>> example, and that such usage is crazy, but why do we have a default
>> pcre.recursion_limit of 100,000 then?  A recursion_limit of
>> 2,734 would be sufficient to have a boundary of $n == 1,366.
>
> The 100000 is an empirical value by my guess. It regards to the default stack 
> sizes on different platforms and to an average pattern. There is no 
> prediction that PCRE will not exhaust the stack available to the binary.

ACK.  However, the user is able to tune this value according to the
environment.  And yes, I'm aware that pcre.recursion_limit may be
necessary to prevent a stack overflow, what can't happen with JIT
compilation (if the JIT stack is exhausted, pcre_exec() returns
gracefully), but nonetheless giving users some control over the size of
the JIT stack seems to be appropriate (at least in the long run, aka.
PHP 7.1).

>> All in all, as this example already suggests, classic execution of matching 
>> is done
>> by recursive calls (using normal stack frames), while JIT execution of 
>> matching is
>> iterative, using a special JIT stack.[1]  I don't think it is justified to 
>> give users a
>> setting to adjust for the former, but not for the latter (except to disable 
>> JIT,
>> albeit JIT might bring quite some boost especially for such cases).
>
> JIT without custom stack will use "32k of machine stack", by the doc. We 
> currently don't really give users a choice to choose between iterative and 
> recursive PCRE execution, it's a compile time decision. 

Well, one can enforce non JIT execution by setting pcre.jit=0.
pcre.jit=1 indeed may not enforce JIT execution, depending on the
platform[1] and the libpcre version.

> And this is again because an iterative execution will be safer, but will 
> affect an average case with unnecessary thrift. In this JIT case, one could 
> give this choice, you're right, but we should evaluate it carefully. Fe what 
> happens to the PHP pattern cache if JIT stack is exhausted?

That is unrelated.  It is possible to use a single custom PCRE JIT stack
for all patterns (actually, pcre_exec() calls).  Only if multi-threading
comes into play, each thread should have its own stack (that's not
absolutely necessary, but it simplifies things a lot).[2]  Therefore
having a single PCRE_G(jit_stack) would be sufficient.  I've committed a
naive draft to my php-src fork[3].

> What I was more precisely talking about is like
> 
> If (false === preg_match(",evil pattern,", ...)) {
>       Ini_set("pcre.jit", 0);
>       // retry
> }
> 
> So received error - no JIT. And no additional logic/overhead in ext/pcre.

Yes, that is a viable way for userland (even if I would suggest to test
against pcre_last_error(); preg_match() may return FALSE for several
error conditions, such as PHP_PCRE_BAD_UTF8_ERROR, and then retrying
with pcre.jit=0 wouldn't help).  We could as well put the same logic
into ext/pcre (but I do not suggest to do so, even though I mentioned
this as option in the OP).

> Maybe custom JIT were eligible in this case, but according to the PCRE doc it 
> can easily bring issues as the global JIT memory can't be resized/migrated 
> just by one's finger click. Say one would be forced to either use the custom 
> JIT stack or default JIT stack from the start on. This can end up with the 
> over complication using a custom JIT stack for a particular pattern.

I would use the same JIT stack for all patterns (whether that is the
default JIT stack, or a custom one).  The libpcre manual states[2]:

| You may safely use the same JIT stack for more than one pattern
| (either by assigning directly or by callback), as long as the
| patterns are all matched sequentially in the same thread.

To my knowledge in ext/pcre all patterns are matched sequentially, but
I'll double-check, and I'll inquire on the libpcre mailing list.

>> As we're pretty late in the game for PHP 7.0, it might be best to postpone a 
>> new
>> ini setting or other changes to PHP 7.1, but at the very least I would 
>> introduce a
>> new error constant, say PHP_PCRE_JIT_STACKLIMIT_ERROR[2], so users get a
>> more meaningful result when calling preg_last_error() than
>> PHP_PCRE_INTERNAL_ERROR.  And it seems to be appropriate to add a note to
>> UPGRADING that pcre.jit=1 may cause some preg_*() to fail which would work
>> with pcre.jit=0.
>
> Yes, instead of returning false one could return an explicit error, or 
> indicate in any other ways.

I would not suggest to change the return value of preg_match() and
friends.  FALSE seems to be appropriate here.  Instead I suggest to add
a new error constant which would be returned from preg_last_error().

> Thanks for bringing it up and maybe we'll have more info after your further 
> investigation. But documentation is appropriate in any cases.

I'll make a respective PR after having checked back with the libpcre devs.

[1] <http://www.pcre.org/original/doc/html/pcrejit.html#SEC3>
[2] <http://www.pcre.org/original/doc/html/pcrejit.html#SEC8>
[3]
<https://github.com/cmb69/php-src/commit/78ad15d589c25b5d10aa68f5b419333f4040c16a>

-- 
Christoph M. Becker

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to