Hi,

I'm writing a conference paper about phc, and I discuss the PHP C API
in some detail. The text I intend to include in my paper is below. I
wonder whether any of the experts on the topic can see any flaws in my
account. All the details come from Sara Golemon's book, my experience
with the PHP embed SAPI, study of the source code, and following
discussions on this list.


<quote>
The primitive unit of data in PHP is the zval, a small structure encompassing a
union of values---objects, hashtables, strings and numeric types---and
memory-management counters and flags. A PHP variable is a symbol-table entry
pointing to a zval, and multiple variables can point to the same zval, using
reference counting for memory management.
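
As a rough illustration of the structure just described, here is a simplified
C model of a zval. It is a sketch only, not the real Zend header: every
sketch_* name is hypothetical, and the exact field layout is an assumption
based on the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of a zval. NOT the real Zend header: all sketch_*
 * names and the exact layout are assumptions made for illustration. */
enum sketch_type { SKETCH_NULL, SKETCH_LONG, SKETCH_DOUBLE,
                   SKETCH_STRING, SKETCH_ARRAY, SKETCH_OBJECT };

struct sketch_hashtable;                  /* opaque stand-in for a hashtable */

typedef union {
    long   lval;                                      /* integer types  */
    double dval;                                      /* floating point */
    struct { char *val; int len; } str;               /* strings        */
    struct sketch_hashtable *ht;                      /* hashtables     */
    struct { unsigned handle; void *handlers; } obj;  /* objects        */
} sketch_value;

typedef struct {
    sketch_value  value;     /* the union of possible values            */
    uint32_t      refcount;  /* how many variables point at this zval   */
    unsigned char type;      /* which union member is currently live    */
    unsigned char is_ref;    /* set when the sharers are references     */
} sketch_zval;

/* Initialise a zval so that it holds an integer owned by one variable. */
static void sketch_zval_set_long(sketch_zval *z, long n) {
    assert(z != NULL);
    z->value.lval = n;
    z->refcount   = 1;
    z->type       = SKETCH_LONG;
    z->is_ref     = 0;
}
```

The value union dominates the size; the counters and flags add only a few
bytes plus padding.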

Objects in PHP are assigned by reference. Assignment of primitive values,
however, is by copy, meaning that semantically the l-value becomes a copy of
the r-value. As an optimization, the PHP run-time causes the l-value to share
the r-value's zval, increasing the reference count, and the variables become
part of the same copy-on-write set. Assignment can also be by reference, which
puts the two variables in the same change-on-write set, in a similar fashion.
This sets the is_ref flag of the shared zval, indicating that the variables in
this set all reference each other. Updating a variable which is a reference
updates its zval, changing the value of all the other variables in that
change-on-write set.
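
The two kinds of assignment described above can be sketched in C as follows.
This is a toy model under stated assumptions, not the run-time's code: the
sketch_* names are hypothetical, a zval holds only an integer, and the
destination variable is assumed not to be a reference already.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy zval holding only an integer; all names here are hypothetical. */
typedef struct {
    long          value;
    unsigned      refcount;
    unsigned char is_ref;
} sketch_zval;

typedef sketch_zval *sketch_var;   /* a variable is a pointer to a zval */

static sketch_zval *sketch_new(long v) {
    sketch_zval *z = malloc(sizeof *z);
    assert(z != NULL);
    z->value = v;
    z->refcount = 1;
    z->is_ref = 0;
    return z;
}

static void sketch_release(sketch_zval *z) {
    if (--z->refcount == 0)
        free(z);
}

/* $dst = $src;  Share the zval where possible (copy-on-write set). */
static void sketch_assign_copy(sketch_var *dst, sketch_var *src) {
    sketch_zval *old = *dst;
    assert(old == NULL || !old->is_ref);   /* assumption: dst is no reference */
    if ((*src)->is_ref) {
        *dst = sketch_new((*src)->value);  /* src is a reference: real copy */
    } else {
        (*src)->refcount++;                /* share, do not copy */
        *dst = *src;
    }
    if (old != NULL)
        sketch_release(old);
}

/* $dst =& $src;  Put both variables in one change-on-write set. */
static void sketch_assign_ref(sketch_var *dst, sketch_var *src) {
    sketch_zval *old = *dst;
    /* Assumption: src is not shared by copy; that case needs separation. */
    assert((*src)->is_ref || (*src)->refcount == 1);
    (*src)->is_ref = 1;
    (*src)->refcount++;
    *dst = *src;
    if (old != NULL)
        sketch_release(old);
}

/* $a = 1; $b =& $a; $b = 42; returns the value then seen through $a. */
static long sketch_demo_reference(void) {
    sketch_var a = sketch_new(1), b = NULL;
    sketch_assign_ref(&b, &a);
    b->value = 42;                 /* written in place in the shared zval */
    long seen = a->value;
    sketch_release(b);
    sketch_release(a);
    return seen;
}
```

In this model the write through $b is visible through $a because both
variables point at the same zval and its is_ref flag is set.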

Variables in a copy-on-write set share the same zval, but are not semantically
related. Although this is an optimization applied by the PHP run-time, it is a
feature which phc must deal with to interact with the run-time, and so it
reuses it for performance. Before the value of a variable in a copy-on-write
set can be updated, the variable must first be separated: a copy of its zval is
created (a deep copy in the case of arrays and strings), and the original zval
has its reference count decremented. zvals with a reference count of zero are
deallocated.
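
Separation as described above might look roughly like this in C. Again this
is a sketch with hypothetical sketch_* names, restricted to string zvals, and
not the run-time's actual implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy string-only zval; all names here are hypothetical. */
typedef struct {
    char         *str;
    unsigned      refcount;
    unsigned char is_ref;
} sketch_zval;

static sketch_zval *sketch_new_string(const char *s) {
    sketch_zval *z = malloc(sizeof *z);
    assert(z != NULL);
    z->str = malloc(strlen(s) + 1);
    assert(z->str != NULL);
    strcpy(z->str, s);
    z->refcount = 1;
    z->is_ref = 0;
    return z;
}

static void sketch_release(sketch_zval *z) {
    if (--z->refcount == 0) {      /* nothing points here any more */
        free(z->str);
        free(z);
    }
}

/* Give *var a private zval if it currently shares one by copy. */
static void sketch_separate(sketch_zval **var) {
    sketch_zval *shared = *var;
    if (shared->refcount > 1 && !shared->is_ref) {
        *var = sketch_new_string(shared->str);  /* deep copy of the string */
        shared->refcount--;                     /* other sharers keep it   */
    }
}

/* Overwrite the first character of a non-empty string variable. */
static void sketch_write_first_char(sketch_zval **var, char c) {
    assert((*var)->str[0] != '\0');
    sketch_separate(var);          /* must happen before the write */
    (*var)->str[0] = c;
}
```

A variable whose zval has a reference count of one, or whose zval is a
reference, is written in place; only a zval shared by copy is duplicated.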

Variables in a change-on-write set must similarly be separated if they are
assigned to a copy-on-write set. Otherwise, assignment to a variable overwrites
the zval's value field, changing the value of all the variables in that
change-on-write set. Variables with a reference count of one, which are in
neither a copy-on-write nor a change-on-write set, are treated similarly.

The PHP interpreter keeps a reference to a variable's zval in global and
function-local symbol-tables---hashtables indexed by the variable's name. When a
function finishes execution, the local symbol-table is destroyed, decreasing
the reference count of all zvals contained within. The global symbol table is
destroyed at the end of the execution of a script.
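
The effect of destroying a symbol-table can be sketched as below. The table
here is a small fixed-size array rather than a real hashtable, purely to keep
the example short; that substitution and all sketch_* names are assumptions
made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy zval and symbol-table; all names here are hypothetical. */
typedef struct { long value; unsigned refcount; } sketch_zval;

typedef struct { const char *name; sketch_zval *zv; } sketch_entry;

typedef struct {
    sketch_entry entries[8];   /* linear array standing in for a hashtable */
    size_t       count;
} sketch_symtab;

static int sketch_live_zvals = 0;   /* number of zvals not yet deallocated */

/* Create a zval that no variable points at yet. */
static sketch_zval *sketch_new(long v) {
    sketch_zval *z = malloc(sizeof *z);
    assert(z != NULL);
    z->value = v;
    z->refcount = 0;
    sketch_live_zvals++;
    return z;
}

static void sketch_release(sketch_zval *z) {
    if (--z->refcount == 0) {
        free(z);
        sketch_live_zvals--;
    }
}

/* Add a variable called name that points at z. */
static void sketch_bind(sketch_symtab *t, const char *name, sketch_zval *z) {
    assert(t->count < 8);
    z->refcount++;
    t->entries[t->count].name = name;
    t->entries[t->count].zv = z;
    t->count++;
}

/* Destroy a table, e.g. when a function finishes execution. */
static void sketch_destroy(sketch_symtab *t) {
    for (size_t i = 0; i < t->count; i++)
        sketch_release(t->entries[i].zv);
    t->count = 0;
}
```

A zval that is also reachable from the global table survives the destruction
of the local table, since its reference count only drops by one.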

As a result of the function-local symbol-table, each PHP variable uses a large
amount of space. The zval itself is 16 bytes long. However, the symbol-table is
a hashtable with a 36-byte bucket. Combined with memory allocation overhead,
each variable occupies 68 bytes on a 32-bit platform [?], and nearly double
that on a 64-bit platform. This means that variable allocations, copies,
separations and deallocations are quite expensive: according to our profiling,
the PHP interpreter spends over 20% of its time in memory management, a figure
which does not include time spent incrementing and decrementing reference
counts. Because phc re-uses the PHP run-time, it is afflicted with the same
problem.
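
The arithmetic behind the 32-bit figure can be spelled out as below. The
constants are the sizes quoted above; the function names are hypothetical,
and treating the remainder as allocation overhead is simply what the quoted
total implies.

```c
#include <assert.h>
#include <stddef.h>

enum {
    SKETCH_ZVAL_BYTES   = 16,  /* zval size quoted above (32-bit)      */
    SKETCH_BUCKET_BYTES = 36,  /* hashtable bucket size quoted above   */
    SKETCH_TOTAL_BYTES  = 68   /* per-variable total quoted above      */
};

/* Bytes of the total not accounted for by the zval and the bucket. */
static size_t sketch_overhead_bytes(void) {
    return SKETCH_TOTAL_BYTES - (SKETCH_ZVAL_BYTES + SKETCH_BUCKET_BYTES);
}

/* Estimated footprint of n variables under the figures above. */
static size_t sketch_variables_bytes(size_t n) {
    assert(n <= (size_t)-1 / SKETCH_TOTAL_BYTES);  /* avoid overflow */
    return n * SKETCH_TOTAL_BYTES;
}
```

The zval and bucket together account for 52 bytes, which leaves 16 bytes of
the 68-byte total for memory allocation overhead.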
</quote>


Thanks in advance,
Paul

-- 
Paul Biggar
[EMAIL PROTECTED]

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
