On Wed, 24 Sep 2008, Szakáts Viktor wrote:

> Hi all,
> I've built mingw with -DHB_USE_TLS (the rest is default),
> then I got 'undefined reference to '__emutls_get_address'
> errors on linking. Such symbol doesn't BTW exist in MinGW
> 4.3.2 supplied libs.

AFAIR the work on TLS support for MinGW started at the beginning of summer, so you should probably look at the newest MinGW versions (probably development ones) if you want to use it.

> Here are some Windows compiler comparison results
> (speedtst / Total application time)
> =================================================
> 1.1.0 (r9488)
> - -DHB_NO_DEBUG [ -DHB_NO_TRACE is default ] -DHB_FM_STATISTICS_OFF
>   -DHB_FM_DL_ALLOC
> - Harbour: -l -gc3 [I didn't mean to test with -gc3, but I had it there
>   for production, and forgot about it]

It may have some indirect impact on CPU cache efficiency. Not big, but noticeable in tests.

> - MSVS switches: (all default C mode)
> - GCC switches: -O3 -fomit-frame-pointer -ffast-math -march=pentiumpro
> - BCC switches: -6 -OS -Ov -Oi -Oc -Q -tWM

-ffast-math in some cases increases the differences in FP arithmetic, so not all people like it. I also suggest checking -march=pentiumpro. This CPU has extremely slow 16-bit registers and it's possible that MinGW will try to make some workarounds for it, but I do not know if it really does. Anyhow, I rather suggest choosing something closer to the hardware you are using if you want to enable CPU optimization.

> Compiler ST     MT     overhead
> -------- ------ ------ --------
> MSVS2008 32.41  46.09  42%
> GCC432   46.58  66.02  41% [NOTE: GCC doesn't have TLS support AFAIK]
> BCC58    52.64  72.75  38%
>
> 1.0.1 (release)
> - -DHB_NO_DEBUG -DHB_NO_TRACE [ -DHB_FM_STATISTICS_OFF is default ]
> - Rest of switches same as above.
>
> Compiler ST
> -------- ------
> MSVS2008 41.06
> GCC432   53.25
> BCC58    52.80

In all these tests the memory manager has the most important impact, so it's hard to compare the results well.
I noticed in the past that Windows is extremely slow in comparison to other OSes when a process has to allocate/free memory pages. It probably strictly depends on the Windows version, but I'm not a Windows guru, so I cannot say which versions work better or worse. Here the MSVC results are the best, but I'm not sure that's caused by better code only. I think that OpenWatcom with its default MM will also give you such good results. If you can, please check it on the same machine.

> Basically I'm getting a consistent 36-42% MT overhead with all
> compilers, so I wouldn't be surprised if overhead would highly
> depend on exact CPU model and maybe Windows version.

The cost of TLS access is strictly compiler/OS dependent. I've just made an interesting experiment to compare the cost of using a pointer to a dynamically allocated stack instead of a static stack address in ST programs. I made a very simple modification. In hbstack.c, for ST mode, I changed:

   extern HB_STACK hb_stack;

to:

   extern PHB_STACK hb_stack_ptr;
   #define hb_stack  ( * hb_stack_ptr )

and in estack.c:

   #if defined( HB_STACK_MACROS )
      HB_STACK hb_stack;
   #else
      static HB_STACK hb_stack;
   #endif

to:

   HB_STACK _hb_stack_;
   PHB_STACK hb_stack_ptr = &_hb_stack_;

Then I recompiled the HVM. The results are really interesting. With the original HVM I have:

   ST: total application time: 27.80
   MT: total application time: 30.79

and with the modified one:

   ST: total application time: 32.09
   MT: total application time: 30.93

The modification was only for ST mode, so the MT results are the same. But in ST mode it nicely shows the overhead of using a pointer to the stack instead of its direct address, which can be optimized by the compiler. In the GCC build it's _bigger_ than native TLS access with buffering enabled. Without buffering the results are in practice the same. Probably buffering in a local C function variable allows some additional optimization, because GCC does not have to worry that the pointer will be changed by some external function calls.
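The three access patterns compared above can be sketched in a small standalone program (the HB_STK type and function names below are illustrative, not Harbour's real ones): direct access to a static structure, access through a global pointer, and access through a pointer buffered in a local variable.

```c
#include <stdio.h>

typedef struct { long top; } HB_STK;

static HB_STK   s_stack;                    /* address fixed at link time   */
static HB_STK * hb_stack_ptr = &s_stack;    /* stands in for a TLS pointer  */

/* direct access: the compiler can fold the address into the instructions */
static long run_direct( long n )
{
   long i;
   s_stack.top = 0;
   for( i = 0; i < n; i++ )
      s_stack.top += i;
   return s_stack.top;
}

/* pointer access: the global pointer may be reloaded on each access,
   since the compiler must assume external calls could change it */
static long run_indirect( long n )
{
   long i;
   hb_stack_ptr->top = 0;
   for( i = 0; i < n; i++ )
      hb_stack_ptr->top += i;
   return hb_stack_ptr->top;
}

/* buffered access: one fetch into a local lets the compiler keep the
   pointer (and the struct member) in registers across the whole loop */
static long run_buffered( long n )
{
   HB_STK * pStack = hb_stack_ptr;
   long i;
   pStack->top = 0;
   for( i = 0; i < n; i++ )
      pStack->top += i;
   return pStack->top;
}

int main( void )
{
   printf( "%ld %ld %ld\n", run_direct( 1000 ),
           run_indirect( 1000 ), run_buffered( 1000 ) );
   return 0;
}
```

All three variants compute the same result; only the generated code differs, which is what the ST timings above (27.80 direct vs 32.09 through a pointer) are measuring.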
It means that we have already reached the same speed level as in ST mode if the compiler has fast TLS access. In current Linux and GCC versions, using TLS does not have any noticeable speed impact. The same applies to OS/2. For us this is zero cost. The results are the same, and the ~10-20% overhead is caused by different data structures which do not allow the optimizations possible for a statically allocated HB_STACK structure. It also means that we will not improve the speed if we begin to pass the pointer to HB_STACK between functions as a parameter.

The problem is only with systems where TLS access is really expensive, e.g. older Linuxes, some *nixes, some systems without TLS at all (where we will have to emulate it ourselves), and it looks like also Windows, though here I'm not so sure about the results for compilers with native TLS support. At least a few of them should give better performance; the speed difference is simply too huge. I also made some tests, but I ran the MinGW and BCC programs on my Linux box using WINE, and the overhead was from 25% to 32%.

BTW, Viktor, if possible I would like to send you MinGW binaries to compare results on your system with your native MinGW builds.

best regards,
Przemek

_______________________________________________
Harbour mailing list
Harbour@harbour-project.org
http://lists.harbour-project.org/mailman/listinfo/harbour