On 2012-06-08 08:18, Johannes Schlüter wrote:
> On Thu, 2012-06-07 at 12:53 -0700, Adi Mutu wrote:
>> Ok Johannes, thanks for the answer. I'll try to look deeper.
>> I basically just wanted to know what happens when you concatenate two
>> strings? what emalloc/efree happens.
> This depends. As always. As said what has to be done is one allocation
> for the result value ... and then the zval magic, which depends on
> refcount, references, ...
> ....
> So, when having two constant strings there's a single malloc, in this
> case allocating 7 bytes (strlen("foo")+strlen("bar")+1), if you have a
> different type it has to be converted first ...
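
To make that arithmetic concrete, here is a minimal sketch of the
single-allocation case in plain C. It is only an illustration: malloc stands
in for the engine's emalloc, and concat_once is a name I made up, not an
engine function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: concatenate two C strings with exactly one
 * allocation. malloc stands in for the engine's emalloc. */
static char *concat_once(const char *a, const char *b, size_t *out_size)
{
    assert(a != NULL && b != NULL);

    size_t la = strlen(a);
    size_t lb = strlen(b);
    size_t size = la + lb + 1;      /* +1 for the terminating NUL */
    char *result = malloc(size);    /* the single allocation */

    if (result == NULL) {
        return NULL;
    }
    memcpy(result, a, la);
    memcpy(result + la, b, lb + 1); /* lb + 1 also copies b's NUL */
    if (out_size != NULL) {
        *out_size = size;           /* report how many bytes were allocated */
    }
    return result;
}
```

Calling concat_once("foo", "bar", &n) sets n to 7 (3 + 3 + 1), and the
caller frees the result.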
After reading the performance improvements RFC about interned strings,
and its passing mention of a "special data structure (e.g. zend_string)
instead of char*", I've been thinking a little bit about this and what
such a structure could be.
But rather than interned strings, I thought that _implicit_
concatenation would be a bigger win in the long term. Like interning, it
relies on strings being immutable.
This zend_string is a composite type. Leaves are _almost_ identical to
existing string zvals (char* val, int len) but carry an additional
"child_count" field. For leaves, child_count is zero (which, not
incidentally, is what marks the node as a leaf). For internal nodes, "val"
is an array of child_count pointers to zend_strings. "len" still holds the
total string length, i.e. the sum of the len fields of the node's children.
So a string that has been built up through concatenation is represented
by a tree of zend_strings (actually a DAG, since the same string can be a
child of more than one node). The edges in this DAG are all properly
reference-counted; discarding a string decrements the reference counts of
its children.
Only when the character data is actually needed as one contiguous buffer
does that buffer have to be allocated and the pieces copied into it (the
internal node can then become a leaf).
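
Here is a rough sketch of how I picture it, in plain C. Everything in it is
an assumption for illustration: the zs_* names, the refcount field, and the
union (which just spells out the two roles of "val") are mine, and malloc
stands in for emalloc. It is not meant as engine code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout; child_count == 0 marks a leaf. */
typedef struct zs_node zs_node;
struct zs_node {
    unsigned refcount;
    size_t len;          /* total length, excluding the NUL */
    size_t child_count;
    union {
        char *val;       /* leaf: NUL-terminated character data */
        zs_node **kids;  /* internal node: child_count children */
    } u;
};

static zs_node *zs_leaf(const char *s)
{
    zs_node *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->len = strlen(s);
    n->u.val = malloc(n->len + 1);
    if (n->u.val == NULL) { free(n); return NULL; }
    memcpy(n->u.val, s, n->len + 1);
    n->refcount = 1;
    n->child_count = 0;
    return n;
}

/* Drop one reference; the last one also releases the children. */
static void zs_release(zs_node *n)
{
    assert(n != NULL && n->refcount > 0);
    if (--n->refcount > 0) return;
    if (n->child_count == 0) {
        free(n->u.val);
    } else {
        for (size_t i = 0; i < n->child_count; i++)
            zs_release(n->u.kids[i]);
        free(n->u.kids);
    }
    free(n);
}

/* Implicit concatenation: no character data is copied. */
static zs_node *zs_concat(zs_node *a, zs_node *b)
{
    zs_node *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->u.kids = malloc(2 * sizeof *n->u.kids);
    if (n->u.kids == NULL) { free(n); return NULL; }
    a->refcount++;                  /* each edge holds a reference */
    b->refcount++;
    n->u.kids[0] = a;
    n->u.kids[1] = b;
    n->refcount = 1;
    n->child_count = 2;
    n->len = a->len + b->len;       /* sum of the children's lengths */
    return n;
}

/* Copy the bytes of n into dst, left to right; returns the end. */
static char *zs_copy_into(const zs_node *n, char *dst)
{
    if (n->child_count == 0) {
        memcpy(dst, n->u.val, n->len);
        return dst + n->len;
    }
    for (size_t i = 0; i < n->child_count; i++)
        dst = zs_copy_into(n->u.kids[i], dst);
    return dst;
}

/* Flatten on demand: one allocation, then the node becomes a leaf. */
static const char *zs_cstr(zs_node *n)
{
    if (n->child_count > 0) {
        char *buf = malloc(n->len + 1);
        if (buf == NULL) return NULL;
        *zs_copy_into(n, buf) = '\0';
        for (size_t i = 0; i < n->child_count; i++)
            zs_release(n->u.kids[i]);
        free(n->u.kids);
        n->u.val = buf;
        n->child_count = 0;
    }
    return n->u.val;
}
```

With this, zs_concat(fb, fb) on a node fb built from "foo" and "bar" shares
fb twice (the DAG case), and only a later zs_cstr() call on the result
allocates the 13 bytes for "foobarfoobar". The recursion in zs_copy_into and
zs_release is the obvious weak point for very deep trees; a real
implementation would probably want to bound the depth or iterate.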
--
PHP Internals - PHP Runtime Development Mailing List