On Wed, 24 Sep 2008, Szakáts Viktor wrote:

Hi all,

> I've built mingw with -DHB_USE_TLS (the rest is default),
> then I got 'undefined reference to '__emutls_get_address'
> errors on linking. Such symbol doesn't BTW exist in MinGW
> 4.3.2 supplied libs.

AFAIR the work on TLS support for MinGW started at the beginning of
the summer, so you should probably look at the newest MinGW versions
(probably the development ones) if you want to use it.

> Here are some Windows compiler comparison results
> (speedtst / Total application time)
> =================================================
> 1.1.0 (r9488)
> - -DHB_NO_DEBUG [ -DHB_NO_TRACE is default ] -DHB_FM_STATISTICS_OFF 
> -DHB_FM_DL_ALLOC
> - Harbour: -l -gc3 [I didn't mean to test with -gc3, but I had it there for 
> production, and forgot about it]

It may have some indirect impact on CPU cache efficiency. Not big, but
noticeable in tests.

> - MSVS switches: (all default C mode)
> - GCC switches: -O3 -fomit-frame-pointer -ffast-math -march=pentiumpro
> - BCC switches: -6 -OS -Ov -Oi -Oc -Q -tWM

-ffast-math in some cases increases the differences in floating-point
arithmetic, so not all people like it.
I also suggest checking -march=pentiumpro. That CPU has extremely slow
16-bit registers, and it is possible that MinGW will try to work around
it, but I do not know if it really does. Anyhow, I would rather choose
something closer to the hardware you are actually using if you want to
enable CPU optimization.

> Compiler     ST     MT overhead
> -------- ------ ------ --------
> MSVS2008  32.41  46.09      42%
> GCC432    46.58  66.02      41% [NOTE: GCC doesn't have TLS support AFAIK]
> BCC58     52.64  72.75      38%
> 1.0.1 (release)
> - -DHB_NO_DEBUG -DHB_NO_TRACE [ -DHB_FM_STATISTICS_OFF is default ]
> - Rest of switches same as above.
> Compiler     ST
> -------- ------
> MSVS2008  41.06
> GCC432    53.25
> BCC58     52.80

In all these tests the memory manager has the most important impact,
so it is hard to compare the results well. I noticed in the past
that Windows is extremely slow in comparison to other OSes when a
process has to allocate/free memory pages. It probably depends
strictly on the Windows version, but I am not enough of a Windows
guru to say which versions work better or worse.
Here the MSVC results are the best, but I am not sure that is caused
by better code only. I think that OpenWatcom with its default MM
would also give you such good results. If you can, please check it
on the same machine.

> Basically I'm getting a consistent 36-42% MT overhead with all
> compilers, so I wouldn't be surprised if overhead would highly
> depend on exact CPU model and maybe Windows version.

The cost of TLS access is strictly compiler/OS dependent. I have just
made an interesting experiment: comparing the cost of accessing the
stack through a pointer (as with a dynamically allocated stack) against
accessing it through its static address in ST programs.
I made a very simple modification. In hbstack.c, for ST mode, I changed:

      extern HB_STACK hb_stack;

to:

      extern PHB_STACK hb_stack_ptr;
#     define hb_stack      ( * hb_stack_ptr )

and in estack.c:
   #  if defined( HB_STACK_MACROS )
         HB_STACK hb_stack;
   #  else
         static HB_STACK hb_stack;
   #  endif

to:
      HB_STACK _hb_stack_;
      PHB_STACK hb_stack_ptr = &_hb_stack_;

Then I recompiled the HVM. The results are really interesting:
With original HVM I have:
   ST: total application time:                              27.80
   MT: total application time:                              30.79
with modified one:
   ST: total application time:                              32.09
   MT: total application time:                              30.93

The modification was only for ST mode, so the MT results are the same.
But in ST mode it nicely shows the overhead of using a pointer to the
stack instead of its direct address, which the compiler can optimize.
In the GCC build this overhead is _bigger_ than that of native TLS
access with buffering enabled. Without buffering the results are in
practice the same. Probably buffering in a local C function variable
allows some additional optimization, because GCC does not have to worry
that the pointer will be changed by some external function call.
It means that in MT mode we already reach the same speed level as in
ST mode if the compiler has fast TLS access. In current Linux and GCC
versions, using TLS does not have any noticeable speed impact. The same
holds on OS/2. For us this is zero cost.
The results are then the same, and the remaining ~10-20% overhead is
caused by the different data structures, which do not allow the
optimizations possible for a statically allocated HB_STACK structure.
It also means that we will not improve speed if we begin to pass
the pointer to HB_STACK between functions as a parameter.
The problem is only with systems where TLS access is really expensive,
e.g. older Linuxes, some *nixes, and systems without TLS at all where
we will have to emulate it ourselves. It looks like Windows belongs
here too, though I am not so sure about the results for compilers with
native TLS support; at least a few of them should give better
performance, because the speed difference is simply too huge. I also
made some tests, but I ran the MinGW and BCC programs on my Linux box
using WINE, and the overhead was from 25% to 32%.
BTW, Viktor, if possible I would like to send you MinGW binaries
to compare results on your system with your native MinGW builds.

best regards,
Przemek
_______________________________________________
Harbour mailing list
Harbour@harbour-project.org
http://lists.harbour-project.org/mailman/listinfo/harbour
