Andi,


On Tue, Apr 7, 2015 at 8:52 PM, Andi Gutmans <a...@zend.com> wrote:
> On Fri, Apr 3, 2015 at 11:57 AM, Anthony Ferrara <ircmax...@gmail.com>
> wrote:
>>
>> All,
>>
>> I spent a little bit of time today trying to debug an issue with PHP 7
>> that Drupal 8 was facing, specifically regarding an array index not
>> behaving correctly ($array["key"] returned null, even though the key
>> existed in the hash table).
>>
>> I noticed that the hash table implementation has gotten orders of
>> magnitude more complex in recent times (since phpng was merged).
>>
>> Specifically, arData and arHash are now the same block of memory,
>> and that we're now doing negative indexing into arData to get the hash
>> map list. From Dmitry's commit message, it was done to keep the data
>> that's accessed most often in the same CPU cache line. While I am sure
>> that there are definitive performance gains to doing this, I do worry
>> about the development and debugging costs of this added complexity.
>>
>> I also worry about how it worsens the bus factor of the project.
>>
>> There is definitely a tradeoff there, as the change is pretty well
>> encapsulated behind macros. But that introduces a new level of
>> abstraction. But deeper than that it really makes debugging with gdb a
>> pain in the neck.
>>
>> Without hard data on this particular patch, I'm not suggesting we roll
>> back the change or anything. I more just want to express concern with
>> the trend lately to increase complexity significantly on developers
>> for the sake of performance.
>>
>> While I'm definitely not saying performance doesn't matter, I also
>> think performance at all costs is dangerous. And I wonder if some of
>> the more fundamental (even if isolated) changes such as this should be
>> way more documented and include the performance justification for
>> them. I'm definitely not suggesting an RFC, but perhaps some level of
>> discussion should be required for these sorts of changes...
>>
>> Thoughts?
>
>
> I think it is generally true that increased performance often requires more
> sophisticated approaches.
> Generally speaking I've observed that the faster, more modern runtime
> engines all need to deal with that additional sophistication.
> JIT runtime engines typically are the worst because they deal with hundreds
> of micro-optimizations around code generation (register allocation, cache
> line optimization, etc...).
> So what you have in PHP 7 today is actually not "that" bad compared to some
> of the other runtimes (IMO).
> I think it can be partially addressed in a combination of documenting key
> datastructures (some of which was already written) and maybe some additional
> comments in areas of code where the complexity level goes up for some very
> specific "tricks".
>
> You can see by the level of interest in performance (whether one's opinion is
> that this is fully warranted or not) around PHP 7, HHVM and other languages,
> that this is an area we need to invest in on an ongoing basis. And
> sophistication will likely go up.

Thanks for the reply. I'm not really saying everything needs to be
dead simple. Most of the issues I'm talking about could be solved
through communication, documentation, tooling and refactoring. But
some I do question at a more fundamental level. The hash table is one
of them.

If we were using a pure abstraction (only accessing the hash table
information through the public API), then fine because it's isolated.
However, many extensions and even places in core access hash table
structure directly (as can be seen by the updates needed by
https://github.com/php/php-src/commit/2b42d719084631d255ec7ebb6c2928b9339915c2).
That means the complexity isn't actually encapsulated.
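
To make the layout concrete, here is a rough C sketch of the general
technique: one allocation holding the hash slots immediately before the
bucket array, with the slots reached by negative indexing from the data
pointer. This is not the php-src code. Every name in it (sketch_table,
sketch_slot and so on) is made up for illustration, keys are plain
integers, and there is no resizing or duplicate-key handling.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is NOT the php-src implementation. */
#define SKETCH_INVALID UINT32_MAX

typedef struct {
    uint32_t key;   /* integer key only, to keep the sketch short */
    int      value;
    uint32_t next;  /* index of the next bucket in the collision chain */
} sketch_bucket;

typedef struct {
    sketch_bucket *data; /* points at bucket 0; hash slots sit just before it */
    uint32_t size;       /* power of two: slot count and bucket capacity */
    uint32_t used;       /* number of buckets filled so far */
} sketch_table;

/* Slot for hash h lives at a negative index (-1 .. -size) from data. */
static uint32_t *sketch_slot(const sketch_table *t, uint32_t h)
{
    int32_t idx = -(int32_t)((h & (t->size - 1)) + 1);
    return &((uint32_t *)t->data)[idx];
}

int sketch_init(sketch_table *t, uint32_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    size_t hash_bytes = (size_t)size * sizeof(uint32_t);
    char *block = malloc(hash_bytes + (size_t)size * sizeof(sketch_bucket));
    if (block == NULL)
        return -1;
    memset(block, 0xff, hash_bytes);                /* all slots = SKETCH_INVALID */
    t->data = (sketch_bucket *)(block + hash_bytes); /* skip past the hash part */
    t->size = size;
    t->used = 0;
    return 0;
}

/* Appends a bucket and links it at the head of its slot's chain. */
int sketch_insert(sketch_table *t, uint32_t key, int value)
{
    if (t->used == t->size)
        return -1;                                  /* no resize in this sketch */
    uint32_t *slot = sketch_slot(t, key);
    sketch_bucket *b = &t->data[t->used];
    b->key = key;
    b->value = value;
    b->next = *slot;
    *slot = t->used++;
    return 0;
}

/* Returns a pointer to the stored value, or NULL if the key is absent. */
const int *sketch_find(const sketch_table *t, uint32_t key)
{
    for (uint32_t i = *sketch_slot(t, key); i != SKETCH_INVALID; i = t->data[i].next) {
        if (t->data[i].key == key)
            return &t->data[i].value;
    }
    return NULL;
}

void sketch_free(sketch_table *t)
{
    /* The allocation starts size slots before data, not at data itself. */
    free((char *)t->data - (size_t)t->size * sizeof(uint32_t));
}
```

Even in this toy version, the data pointer is not the start of the
allocation, and nothing in the struct tells a debugger that there are
size hash slots hiding before it. Any code that touches data directly
has to know that convention, which is the encapsulation problem I mean.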

Sophistication is fine. What worries me though is magic. What worries
me is the growing inability to debug with normal tools. Perhaps we
need a GDB extension to provide tooling for common debugging tasks.
Heck, even dumping a zend_string requires a cast (p (char*)str->val).

I am all for the performance improvements. I just don't think "at all
costs" is a viable model (nor do I think that's what people are
doing). I just think it's worth explicitly discussing (and hopefully
mitigating) their costs, at least for the more significant changes.

Anthony
