On Tue, Mar 14, 2017 at 11:29 AM, David Zuelke <d...@heroku.com> wrote:
> Hi all,
>
> There appears to be a performance regression in the CFG and DFA based
> optimization passes of OPcache in PHP 5.6+ when loading huge classes (such
> as those generated by Symfony's routing component) for the first time.
>
> The issue does not occur on PHP 5.5. Tests below are with 5.6.30, 7.0.16
> and 7.1.2 and default INI settings; I replicated the issue on both macOS
> and Linux.
>
> Test file here (it's from an actual application, slightly anonymized, not
> a synthetic example):
> https://gist.github.com/dzuelke/fe867f55f09e0bf79ecefcc815b7fe92
>
> Without OPcache, everything is fine in all versions:
>
> $ time -p php -dopcache.enable_cli=0 hugeclass.php
> real 0.10
> user 0.09
> sys 0.00
>
> With OPcache on, things are suddenly much, much slower:
>
> 5.6:
>
> $ time -p php -dopcache.enable_cli=1 hugeclass.php
> real 3.23
> user 3.21
> sys 0.02
>
> 7.0:
>
> $ time -p php -dopcache.enable_cli=1 hugeclass.php
> real 1.76
> user 1.73
> sys 0.02
>
> 7.1:
>
> $ time -p php -dopcache.enable_cli=1 hugeclass.php
> real 4.01
> user 3.98
> sys 0.02
>
> For comparison, 5.5 is as speedy as you'd expect it to be:
>
> $ time -p php -dopcache.enable_cli=1 hugeclass.php
> real 0.14
> user 0.11
> sys 0.02
>
> If we switch off optimization passes 5 (CFG based) and 6 (DFA based, only
> in 7.1), everything is great again in all versions:
>
> $ time -p php -dopcache.enable_cli=1 -dopcache.optimization_level=0x7FFFFFCF hugeclass.php
> real 0.13
> user 0.10
> sys 0.02
>
> For 5.6 and 7.0, pass 6 is not a thing, but in 7.1, we can inspect passes
> 5 and 6 separately.
>
> Pass 5 (CFG based) already makes for a drastic performance hit in 7.1:
>
> $ time -p php -dopcache.enable_cli=1 -dopcache.optimization_level=0x7FFFFFDF hugeclass.php
> real 0.88
> user 0.86
> sys 0.01
>
> But pass 6 (DFA based) is the one that causes the biggest slowdown in 7.1:
>
> $ time -p php -dopcache.enable_cli=1 -dopcache.optimization_level=0x7FFFFFEF hugeclass.php
> real 3.29
> user 3.24
> sys 0.04
>
> In all versions, subsequent loads from the cache (such as when running FPM
> or the built-in web server) are fast.
>
> Is this slowness with a cold cache expected/accepted, or does that qualify
> as a bug?
>

Yes, this is a bug. Optimization should never be this slow.

From a quick perf run, the problem in the DFA pass is that TI is quadratic in the number of calls (as call lookup is implemented as a linear scan). This can be fixed with a more efficient map lookup. The problem in the CFG pass is that a large buffer is zeroed repeatedly.

Nikita
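For readers following along, here is a rough illustration of why a linear-scan call lookup turns quadratic, and how a map lookup avoids that. It is only a sketch in Python, not the C code in OPcache: `CallInfo`, `find_call_linear` and `build_call_map` are hypothetical names made up for this example, and the real optimizer's data structures will differ.

```python
from dataclasses import dataclass


@dataclass
class CallInfo:
    # Hypothetical stand-in for per-call metadata; not OPcache's real struct.
    opline: int   # index of the instruction that performs the call
    callee: str   # name of the function being called


def find_call_linear(calls, opline):
    """O(n) per lookup: walk the list until the opline matches."""
    for info in calls:
        if info.opline == opline:
            return info
    return None


def build_call_map(calls):
    """One O(n) pass builds an index; each later lookup is O(1) on average."""
    return {info.opline: info for info in calls}


# A "huge class" with many calls, e.g. one per generated route.
calls = [CallInfo(opline=i * 3, callee=f"route_{i}") for i in range(2000)]
call_map = build_call_map(calls)

# Resolving every call once:
#   n linear scans -> O(n^2) comparisons in total
#   n map lookups  -> O(n) in total
linear_results = [find_call_linear(calls, c.opline) for c in calls]
mapped_results = [call_map.get(c.opline) for c in calls]
```

Both approaches return the same results; only the total work differs, which is why the cost only becomes visible once the number of calls in a single file gets very large.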