Hi Christoph, > -----Original Message----- > From: Christoph Becker [mailto:cmbecke...@gmx.de] > Sent: Saturday, July 25, 2015 2:25 AM > To: Anatol Belski <anatol....@belski.net>; 'Pierre Joye' > <pierre....@gmail.com> > Cc: 'PHP internals' <internals@lists.php.net> > Subject: Re: [PHP-DEV] PCRE JIT stack size limit > > Hi Anatol, > > Anatol Belski wrote: > > > Hi Christoph, > > > >> -----Original Message----- > >> From: Christoph Becker [mailto:cmbecke...@gmx.de] > >> Sent: Saturday, July 25, 2015 12:09 AM > >> To: Anatol Belski <anatol....@belski.net>; 'Pierre Joye' > >> <pierre....@gmail.com> > >> Cc: 'PHP internals' <internals@lists.php.net> > >> Subject: Re: [PHP-DEV] PCRE JIT stack size limit > >> > >> Now please consider the following simple expression: > >> > >> preg_match('/^(foo)+$/', str_repeat('foo', $n)) > >> > >> This will fail (i.e. yield FALSE) independently of pcre.jit for large > >> enough $n. > >> However, a user can change pcre.recursion_limit what will affect the > >> $n limit (the expression will fail for smaller or larger $n), if > >> pcre.jit=0. If pcre.jit=1 the user can't influence this boundary in any > >> way, > currently. > >> > >> And maybe even worse, with pcre.jit=0 the boundary is 50,000, but > >> with > >> pcre.jit=1 it is only 1,366. Of course, one can argue that this is a > >> contrived example, and that such usage is crazy, but why do we have a > >> default pcre.recursion_limit of 100,000 then? A recursion_limit of > >> 2,734 would be sufficient to have a boundary of $n == 1,366. > > > > The 100000 is an empirical value by my guess. It regards to the default > > stack > sizes on different platforms and to an average pattern. There is no prediction > that PCRE will not exhaust the stack available to the binary. > > ACK. However, the user is able to tune this value according to the > environment. > And yes, I'm aware that pcre.recursion_limit may be necessary to prevent a > stack overflow, what can't happen with JIT compilation (if the JIT stack is > exhausted, pcre_exec() returns gracefully), but nonetheless giving users some > control over the size of the JIT stack seems to be appropriate (at least in > the > long run, aka. > PHP 7.1). Yep, I see your point about the config option. That would only work at MINIT stage. Though we also don't give many other options to the user land, like the pattern cache size. Which is IMHO fine because users would normally not need such nternal detail.
> > >> All in all, as this example already suggests, classic execution of > >> matching is done by recursive calls (using normal stack frames), > >> while JIT execution of matching is iterative, using a special JIT > >> stack.[1] I don't think it is justified to give users a setting to > >> adjust for the former, but not for the latter (except to disable JIT, > >> albeit JIT > might bring quite some boost especially for such cases). > > > > JIT without custom stack will use "32k of machine stack", by the doc. We > currently don't really give users a choice to choose between iterative and > recursive PCRE execution, it's a compile time decision. > > Well, one can enforce non JIT execution by setting pcre.jit=0. > pcre.jit=1 indeed may not enforce JIT execution, depending on the platform[1] > and the libpcre version. But we're discussing only the cases where JIT is available, that's what it is about. No need to care about it otherwise. > > > And this is again because an iterative execution will be safer, but will > > affect an > average case with unnecessary thrift. In this JIT case, one could give this > choice, > you're right, but we should evaluate it carefully. Fe what happens to the PHP > pattern cache if JIT stack is exhausted? > > That is unrelated. It is possible to use a single custom PCRE JIT stack for > all > patterns (actually, pcre_exec() calls). Only if multi-threading comes into > play, > each thread should have its own stack (that's not absolutely necessary, but it > simplifies things a lot).[2] Therefore having a single PCRE_G(jit_stack) > would be > sufficient. I've committed a naive draft to my php-src fork[3]. > If it's going to happen, of course it should be a single JIT stack for all patterns per thread. PCRE_G is required, otherwise some locking has to be implemented manually. But what I merely wanted to illustrate is exactly what you say - there are unrelated things which can go wrong but possibly affect/cause each other. > I would use the same JIT stack for all patterns (whether that is the default > JIT > stack, or a custom one). The libpcre manual states[2]: > > | You may safely use the same JIT stack for more than one pattern > | (either by assigning directly or by callback), as long as the patterns > | are all matched sequentially in the same thread. > > To my knowledge in ext/pcre all patterns are matched sequentially, but I'll > double-check, and I'll inquire on the libpcre mailing list. > Sure, just after MINIT it cannot be reassigned/resized/changed. Thus, given something like pcre.jit_stack_size INI were implemented, if it's requested under 32kb - nothing should be done, machine stack should be used. If it were over it - then custom JIT stack based on mmap/VirtualAlloc. But frankly I don't see an essential difference to the handling in the user space. Yes, it might be an advantage to still use JIT, but it moves away from the machine stack. And IMHO as it's an edge case, fixing it were fine but should not be done on the cost of much over complication and affecting normal usage. >From what is to see right now - sane usage is not affected while improved by JIT. If you're in the mood to add a note to upgrading, it'll be probably just file for now - disabling JIT retains the full PHP5 compatible behavior. > >> As we're pretty late in the game for PHP 7.0, it might be best to > >> postpone a new ini setting or other changes to PHP 7.1, but at the > >> very least I would introduce a new error constant, say > >> PHP_PCRE_JIT_STACKLIMIT_ERROR[2], so users get a more meaningful > >> result when calling preg_last_error() than PHP_PCRE_INTERNAL_ERROR. > >> And it seems to be appropriate to add a note to UPGRADING that > >> pcre.jit=1 may cause some preg_*() to fail which would work with > >> pcre.jit=0. > > > > Yes, instead of returning false one could return an explicit error, or > > indicate in > any other ways. > > I would not suggest to change the return value of preg_match() and friends. > FALSE seems to be appropriate here. Instead I suggest to add a new error > constant which would be returned from preg_last_error(). > Yep, sounds feasible. At least users can be informed about this kind of error, seems a constant is feasible for 7.0. Btw I've just noticed that the snippet I've posted earlier won't currently work. If (false === preg_match(",evil pattern,", ...)) { If (OUT_OF_JIT_STACK == pcre_last_error()) { ini_set("pcre.jit", 0); // retry } } The retry would fail even when JIT is disabled on demand. This is actually rather unexpected, maybe something in PCRE JIT state cannot be recovered once JIT compilation has failed. But actually, even without running into debugging, it is also a bad sign telling that the behavior can get complicated ( Running for the INI option and some custom JIT stack functionality needs more brainstorming and test. Please ping when you have an initial implementation, I'd be glad to help testing, etc. anyway. Regards Anatol -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php