Re: [PHP-DEV] New function proposal: spl_object_id

Etienne Kneuss Tue, 20 Jan 2009 01:28:56 -0800

Hello,

We already had that discussion in private, but here is an on-list summary:


On Mon, Jan 19, 2009 at 5:39 PM, Guilherme Blanco
<guilhermebla...@gmail.com> wrote:
> Ok,
>
> We'll use this method inside Doctrine ORM version 2.0, scheduled to be
> released on September 1st, 2009.
>
> One main place where we already use it is the hydration process, i.e.
> the process of grabbing a DB tuple and converting it into an object graph.
> Here is the usage.
>
> Each object of the graph is a Value Object
> (http://en.wikipedia.org/wiki/Value_object). So it has no mappings
> other than the ones to be persisted. No internal method
> implementation is needed. All Active Record-like actions are
> controlled by EntityManager.
>
> Based on that, we have a ClassMetadata that is cached by class
> name (currently by spl_object_id, but that is too resource-expensive
> and I'll change it). When we get the DB tuple, we need to
> find the exact ClassMetadata of that item and apply, for example, the
> specific DB/PHP type casts. There is also property assignment, which
> is done through the new Reflection API: we store the
> ReflectionProperty of each field and assign it when we have its
> definition.
>
> Another place where we rely on spl_object_id is inside UnitOfWork
> (http://martinfowler.com/eaaCatalog/unitOfWork.html). We generate a
> mapping of each Entity/Collection to be persisted/updated/deleted. We
> define the order in which these are applied based first on the
> generated OID (the spl_object_id return value) and later on topological
> sorting (http://en.wikipedia.org/wiki/Topological_sorting). Finally, we
> start the transaction and run the statements.
>
> The point is that we may be doing a huge hydration with lots
> of related objects. We may be dealing with a web page that fetches
> more than 5000 records with even more associations, all of that at
> runtime. So I have to say performance is something VERY important for
> us.
>
> Why will we not use SplObjectStorage?
> Because it'll be used in different places that should share the same
> OID. Coupling to this component is not a viable idea since
> it leads to a more memory-expensive solution, which we're trying to
> optimize a lot, and it would also force us to add another get call
> (a method call), which would result in an even slower
> implementation.
>
> Here are two files in which we have been using spl_object_id (changed now
> to spl_object_hash, since the idea is to update it with Marcus'
> suggestions):
> Object Driver for Hydration:
> http://trac.doctrine-project.org/browser/trunk/lib/Doctrine/ORM/Internal/Hydration/ObjectDriver.php
> UnitOfWork for Persistence:
> http://trac.doctrine-project.org/browser/trunk/lib/Doctrine/ORM/UnitOfWork.php
>
>
>
> Short version: Because we want a fast, easy way to associate
> information (temporarily) with an object. Most of the time we use the
> object id/hash as a key in an array. Basically, spl_object_hash is
> fine, it would just be nice if it could be improved in speed.
>

All those use cases boil down to an [object => data] map, which can
be implemented with SplObjectStorage:

$storage = new SplObjectStorage;
$storage[$obj1] = $data; ...
var_dump($storage[$obj1]); ...
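
To make the comparison concrete, here is a minimal sketch of the same
[object => data] map written both ways. The Entity class and the metadata
strings are hypothetical and only serve as illustration; array access on
SplObjectStorage assumes PHP 5.3 or later.

```php
<?php
// Hypothetical entity class, used only for illustration.
class Entity {}

$e1 = new Entity;
$e2 = new Entity;

// Variant 1: plain array keyed by spl_object_hash().
$byHash = array();
$byHash[spl_object_hash($e1)] = 'metadata for e1';
$byHash[spl_object_hash($e2)] = 'metadata for e2';

// Variant 2: SplObjectStorage used as an [object => data] map.
$storage = new SplObjectStorage;
$storage[$e1] = 'metadata for e1';
$storage[$e2] = 'metadata for e2';

// Both variants return the same data for the same object.
var_dump($storage[$e1] === $byHash[spl_object_hash($e1)]);

var_dump(isset($storage[$e2])); // membership test by object identity
unset($storage[$e2]);           // drop the association for $e2
var_dump(count($storage));      // number of objects still tracked
```

The storage variant needs no hash computation in userland and keeps the key
and the object together, which is the point made above.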

There were three concerns:
1) Speed: the main argument for spl_object_id is speed. =>
SplObjectStorage is faster than an array keyed by spl_object_hash (and can
be made even faster).
2) $storage[$obj1]['index'] = 2; does not work. This is sadly a limitation
of ArrayAccess. => It can be solved either by doing get+change+set, or by
using an ArrayObject instead of an array.
3) Memory: since the object itself is referenced by the storage,
you have to remove it from every map in order for the GC to do its
work. => This is actually a safety feature: a hash is only unique as long
as its object exists, and can be reused once the object is destroyed:

$a = new StdClass;
$h1 = spl_object_hash($a);
unset($a);           // $a is destroyed here
$b = new StdClass;   // $b can reuse the slot $a occupied
$h2 = spl_object_hash($b);
var_dump($h1===$h2); // bool(true)

Conclusion: If you destroy your objects without also cleaning up the
metadata stored in the array indexed by object id, you'll get
unexpected results anyway.
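
The kind of unexpected result meant here can be sketched as follows. The
$meta array and its contents are hypothetical; whether the hash is really
reused depends on the engine, so the first result is not guaranteed.

```php
<?php
// Hypothetical bookkeeping: metadata kept in an array keyed by object hash.
$meta = array();

$a = new stdClass;
$meta[spl_object_hash($a)] = 'belongs to $a';
unset($a);                  // $a is destroyed, but its entry in $meta is not

$b = new stdClass;          // may be given the slot that $a just released
$staleHit = isset($meta[spl_object_hash($b)]);
var_dump($staleHit);        // true whenever the hash was reused

// With SplObjectStorage the stored object is kept alive by the storage,
// so its identity cannot be handed to another object in the meantime.
$storage = new SplObjectStorage;
$c = new stdClass;
$storage[$c] = 'belongs to $c';
unset($c);                     // the storage still references the object
$d = new stdClass;
var_dump(isset($storage[$d])); // bool(false): $d is a different live object
var_dump(count($storage));     // int(1): the first object is still tracked
```

The price is the one named in concern 3: the entry has to be removed from
the storage explicitly before the object can be collected.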

So far it looks like SplObjectStorage is fine with those use cases. If
somebody has a practical (with code) use case in which
SplObjectStorage can't be sanely used and where spl_object_id is the
only solution, please shoot.
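
For reference, the two workarounds mentioned under concern 2 can be sketched
like this. The 'index' key is taken from the example above; everything else
is illustrative. As far as I know, writing $storage[$obj]['index'] = 2
directly on a stored plain array only raises an "Indirect modification"
notice and changes nothing, which is why one of these is needed.

```php
<?php
$obj = new stdClass;
$storage = new SplObjectStorage;

// Workaround A: get + change + set.
$storage[$obj] = array('index' => 1);
$data = $storage[$obj];   // get a copy of the stored array
$data['index'] = 2;       // change the copy
$storage[$obj] = $data;   // set it back
var_dump($storage[$obj]['index']); // int(2)

// Workaround B: store an ArrayObject instead of an array. Objects are
// passed by handle, so a write through the fetched value reaches the
// instance held by the storage.
$other = new stdClass;
$storage[$other] = new ArrayObject(array('index' => 1));
$storage[$other]['index'] = 2;
var_dump($storage[$other]['index']); // int(2)
```

Workaround A costs one extra read and write per update; workaround B costs
one extra object per entry but keeps the natural nested syntax.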

>
>
> It'll take me some time to dig into the PHP source to try to implement it.
> I'm not a C developer and it has been more than 4 years since I touched a
> single line of C code. I can read the PHP source, but I'm not able to
> write it.
> I already spoke with Felipe, who will help me with questions about the
> source, but I cannot guarantee I'll be able to do the job.
>
>
> Regards,
>
> On Wed, Dec 17, 2008 at 7:19 PM, Marcus Boerger <he...@php.net> wrote:
>> Hello Etienne,
>>
>> Wednesday, December 17, 2008, 7:59:01 PM, you wrote:
>>
>>> Hello,
>>
>>> On Wed, Dec 17, 2008 at 7:29 PM, Lars Strojny <l...@strojny.net> wrote:
>>>> Hi Guilherme,
>>>>
>>>> thanks for moving the discussion to the list.
>>>>
>>>> Am Mittwoch, den 17.12.2008, 15:31 -0200 schrieb Guilherme Blanco:
>>>> [...]
>>>>> It seems that Marcus controls the commit access to SPL. So I'm turning
>>>>> the conversation async, since I cannot find him online at IRC.
>>>>> So, can anyone review the patch, comment it and commit if approved?
>>>>
>>>> Just for clarification, it is not about access, but about maintenance.
>>>> So if Marcus gives his go, we can happily apply the patch and add a few
>>>> tests (something you could start preparing now).
>>>>
>>>> cu, Lars
>>>>
>>
>>> Last time I checked with Marcus, there were concerns about disclosing
>>> a valid pointer to the user.
>>> I'd be happy to see a use case where this information is really needed
>>> heavily. The only real use case with heavy usage seems to be
>>> implementing sets of objects, but SplObjectStorage is here for that
>>> precise use case...
>>
>> Correct on all points, Etienne. The patch might be a tiny bit faster but
>> exposes valid pointers, which is extremely bad and also allows other bad
>> things. That was the only reason I used md5 hashing. What I needed was
>> something that is really unique per object (object pointer or id plus
>> pointer to handler table). Since spl_object_hash() does not say how it
>> creates the hash, it should be fine to change the way it does it. Since the
>> hashes are of no use in a new session, we can even do that in any new version.
>> However I must still insist on not exposing any valid information.
>>
>> Last but not least: in your code you know the maximum length of the
>> expression, so you can allocate the string and snprintf into it. Even
>> faster is to do a hexdump into a preallocated string. For the size use:
>> char* hash = (char*)safe_emalloc(sizeof(void*), 2, 1);
>> Now the dump of the two pointers.
>> This approach should make it a bit faster for you. Something that might
>> work is to create a random 128 bit hash key that is xored onto the hash
>> created from the two pointers. This hash key can be allocated for each
>> session the first time the function is used. If you do that, I am more
>> than happy to accept that as a replacement for the current spl_object_hash().
>>
>> marcus
>>
>>> Regards
>>
>>
>>> --
>>> Etienne Kneuss
>>> http://www.colder.ch
>>
>>> Men never do evil so completely and cheerfully as
>>> when they do it from a religious conviction.
>>> -- Pascal
>>
>>
>>
>>
>> Best regards,
>>  Marcus
>>
>>
>
>
>
> --
> Guilherme Blanco - Web Developer
> CBC - Certified Bindows Consultant
> Cell Phone: +55 (16) 9215-8480
> MSN: guilhermebla...@hotmail.com
> URL: http://blog.bisna.com
> São Paulo - SP/Brazil
>

Regards,

-- 
Etienne Kneuss
http://www.colder.ch

Men never do evil so completely and cheerfully as
when they do it from a religious conviction.
-- Pascal

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
