Hi, On Thursday 21 August 2008 09:37:12 Dmitry Stogov wrote: > > Arnaud Le Blanc wrote: > > Hi, > > > > On Tuesday 19 August 2008 09:22:46 Dmitry Stogov wrote: > >> Hi Arnaud, > >> > >> Arnaud Le Blanc wrote: > >>> Hi, > >>> > >>> On Monday 18 August 2008 19:46:46 Dmitry Stogov wrote: > >>>> Hi Arnaud, > >>>> > >>>> The patch looks very interesting. > >>>> I think it may be committed to the HEAD in the nearest future. > >>>> I don't have time to look into all details in the moment. > >>>> > >>>> Could you explain why --with-tsrm-full-__thread-tls doesn't work with > >>>> dlopen() however --with-tsrm-__thread-tls does? > >>> That's due to the way TLS works internally. Actually I need further > > reading on > >>> that. > >> I don't see a big difference between --with-tsrm-full-__thread-tls and > >> --with-tsrm-__thread-tls from TLS point of view (may be I miss it), so I > >> don't understand why one works and the other doesn't. > > > > Badly both was not expected to work with dlopen() :( > > http://marc.info/?l=php-internals&m=121912220705964&w=2 > > > > It works when using an other TLS model (which requires PIC). > > This model is less efficient but it still improves performance. PIC-patched is > > faster than non-PIC-unpatched, and the improvement is even greater against > > PIC-unpatched. > > > > This is the thing I was afraid. > > >> The patch looks more difficult for me than it should. I would prefer to > >> have only --with-tsrm-full-__thread-tls, if it works, as the patch would > >> be simple and PHP faster. > >> > >> Another simple solution, which you probably already tested, is to use > >> only global __thread-ed tsrm_ls (and don't pass/fetch it), however, > >> access thread-globals in the same way: > >> ((*type)tsrm_ls[global_module_id])->global_fileld > > > > Yes, I tested and it given 4.7s on bench.php (vs 3.8s with the current patch). > > > > Does this model have the same issues with dlopen()? > (We have only one __thread-ed variable).
Yes and no. By reading the loader/linker code in the glibc I seen that it reserves a number of memory especially for the dlopen() case. This number of memory it too small for --with-tsrm-full-__thread-tls, but several times larger than needed for --with-tsrm-__thread-tls. So the memory found by dlopen() is not random and I think we can expect that to work on Linux. However things are different on Windows and FreeBSD. I will post a RFC about that to clarify things. > > Thanks. Dmitry. > > >> Anyway you did a great job. I would like to see this idea implemented in > >> HEAD. It is little bit late for 5.3 :( > >> > >> Thanks. Dmitry. > >> > >>>> Did you test the patch with DSO extensions? > >>> I will, but I guess that will be behaves like another shared library > >>> dlopen()ed by Apache. > >>> > >>>> It would be interesting to try the same idea on Windows with VC. > >>> I will try too. > >>> > >>>> Thanks. Dmitry. > >>>> > >>>> Arnaud Le Blanc wrote: > >>>>> Hi, > >>>>> > >>>>> Currently the way globals work forces to pass a thread-local-storage > >>> pointer > >>>>> across function calls, which involves some overhead. Also, not all > >>> functions > >>>>> get the pointer as argument and need to use TSRMLS_FETCH(), which is > > slow. > >>> For > >>>>> instance emalloc() involves a TSRMLS_FETCH(). An other overhead is > >>> accessing > >>>>> globals, using multiple pointers in different locations. > >>>>> > >>>>> The following patch caches each global address in a native TLS variable > > so > >>>>> that accessing a global is as simple as global_name->member. This > > removes > >>> the > >>>>> requirement of passing the tls pointer across function calls, so that > > the > >>> two > >>>>> major overheads of ZTS builds are avoided. > >>>>> > >>>>> Globals can optionally be declared statically, which speeds up things a > >>> bit. > >>>>> Results in bench.php: > >>>>> non-ZTS: 3.7s > >>>>> ZTS unpatched: 5.2s > >>>>> ZTS patched: 4.0s > >>>>> ZTS patched and static globals: 3.8s > >>>>> > >>>>> The patch introduces two new macros: TSRMG_D() (declare) and TSRMG_DH() > >>>>> (declare, for headers) to declare globals, instead of the current > >>> "ts_rsrc_id > >>>>> foo_global_id". These macros declare the global id, plus the __thread > >>> pointer > >>>>> to the global storage. > >>>>> > >>>>> ts_allocate_id now takes one more callback function as argument to bind > >>> the > >>>>> global pointer to its storage. This callback is declared in TSRMG_D[H] > > (). > >>>>> As all TSRMLS_* macros now does nothing, it is needed to call > >>> ts_resource(0) > >>>>> explicitly at least one time in each thread to initialize its storage. A > >>> new > >>>>> TSRMLS_INIT() macro as been added for this purpose. > >>>>> > >>>>> All this is disabled by default. --with-tsrm-__thread-tls enables the > >>> features > >>>>> of the patch, and --with-tsrm-full-__thread-tls enables static > > declaration > >>> of > >>>>> globals. > >>>>> > >>>>> It as been tested on Linux compiled with --disable-all in CLI and a bit > > in > >>>>> Apache2 with the worker MPM. Known issues: > >>>>> - Declaring globals statically (--with-tsrm-full-__thread-tls) causes > >>> troubles > >>>>> to dlopen(), actually Apache wont load the module at runtime (it works > >>> with > >>>>> just --with-tsrm-__thread-tls). > >>>>> - The patch assumes that all resources are ts_allocate_id()'ed before > > any > >>>>> other thread calls ts_allocate_id or ts_resource_ex(), which is possibly > >>> not > >>>>> the case. > >>>>> > >>>>> The patch needs some tweaks and does not pretend to be included in any > >>> branch, > >>>>> but I would like to have some comments on it. > >>>>> > >>>>> The patch: http://arnaud.lb.s3.amazonaws.com/__thread-tls.patch > >>>>> > >>>>> Regards, > >>>>> > >>>>> Arnaud > >>>>> > >>>>> > >>> Regards, > >>> > >>> Arnaud > > > > Regards, > > > > Arnaud > -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php