priv.onet.pl)

Przemyslaw Czerpak Tue, 16 Sep 2008 10:36:36 -0700

On Mon, 15 Sep 2008, Mindaugas Kavaliauskas wrote:

Hi Mindaugas,


>> Maybe BCC does not inline InterLocked*() functions or
>> they are not as efficient as they can be.
> Interlocked*() functions are WinAPI functions. It cannot be inlined.

I know that they are Windows API functions, but
MSDN says that each compiler should try to inline them
using its own resources. In GCC such functions are declared
as extern inline. So I understand that BCC does not do this
and simply calls the original ones.

> The problem is that "asm { ... }" code inside .c file requires assembler 
> compiler which is not part of BCC free command line tools. Many people will 
> be unable to build Harbour binaries.

Yes, this is a problem if you cannot check during C compilation whether
an assembler is available. In that case the only solution is to put the
code in binary form.

> I use some hacky tricks in my code to 
> include asm into C, but it will not work on Win64, so, not suitable for 

Such code has to be protected by a CPU check so that it is not enabled
on unsupported machines.

> Harbour. Does anyone know how to include asm code into C???

If these are only a few instructions then you can use __emit__ (...),
see rtl/hbtone.c

> The whole code of InterlockedIncrement() is 5 CPU instructions. So, I guess 
> we will not save a lot.

Probably yes. We can try to check it. See below.
To be precise, we do not need the exact interlocked behavior. We only
need a protected INC in HB_ATOM_INC and a protected DEC in HB_ATOM_DEC
with the EQUAL flag saved.
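
A minimal sketch of that contract, written with C11 <stdatomic.h> as a
portable stand-in (it is not the BCC/WinAPI code discussed here, and
ref_counter, ref_inc and ref_dec_is_zero are hypothetical names, not
Harbour identifiers). The increment returns nothing; the decrement only
reports whether the counter reached zero, which is the information the
EQUAL flag would carry:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counter type; the real Harbour counter type may differ. */
typedef atomic_long ref_counter;

/* Protected increment: the caller does not need the new value. */
static inline void ref_inc(ref_counter *p)
{
    assert(p != NULL);
    atomic_fetch_add(p, 1);
}

/* Protected decrement: returns true only when the counter drops to zero.
 * atomic_fetch_sub() yields the value *before* the subtraction, so a
 * previous value of 1 means the counter is now 0. */
static inline bool ref_dec_is_zero(ref_counter *p)
{
    assert(p != NULL);
    return atomic_fetch_sub(p, 1) == 1;
}
```

A caller would release the item only when ref_dec_is_zero() returns true.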

> It was default build using make_b32.bat, i.e. with memstat. Here are 
> results without memstat:
[...]
> total application time:                   130.69  164.55

Much better, though the difference is still quite large.
You can simply check the cost of the Interlocked*() operations
by redefining them inside hbthreads.h as:

   #define HB_ATOM_INC( p )    ( ++(*(p)) )
   #define HB_ATOM_DEC( p )    ( --(*(p)) )

If this gives any noticeable speed improvement in character or
array access in the above test, then we can try to use an inlined
assembler version. Otherwise we can leave it as is.
It looks like the most expensive part is TLS access, and that this is
what reduces performance in BCC builds: the cost of an ABI in which
the VM pointer is not passed to functions :-(.
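
Outside of Harbour, the same comparison can be roughed out in isolation.
The sketch below is only an illustration under assumptions: it uses C11
atomics instead of Interlocked*(), the function names are hypothetical,
and clock() timings from such a loop say nothing precise about the cost
inside the HVM. It simply times a plain increment loop against an atomic
one:

```c
#include <assert.h>
#include <stdatomic.h>
#include <time.h>

/* Plain update, like ( ++(*(p)) ). volatile stops the compiler from
 * collapsing the whole loop into a single addition. */
static long run_plain(long n)
{
    assert(n >= 0);
    volatile long counter = 0;
    for (long i = 0; i < n; i++)
        ++counter;
    return counter;
}

/* Atomic update, standing in for an interlocked increment. */
static long run_atomic(long n)
{
    assert(n >= 0);
    atomic_long counter;
    atomic_init(&counter, 0);
    for (long i = 0; i < n; i++)
        atomic_fetch_add(&counter, 1);
    return atomic_load(&counter);
}

/* Runs fn(n), stores its result and returns the CPU time used in seconds. */
static double seconds_for(long (*fn)(long), long n, long *result)
{
    clock_t start = clock();
    *result = fn(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling seconds_for(run_plain, n, &r) and seconds_for(run_atomic, n, &r)
with a large n gives two numbers to compare; only their ratio is of
interest.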
We can do three things:
   1. add some tricks to reduce TLS access, like HB_THREAD_STUB in
      xHarbour in hvm.c. It makes the code a little bit ugly,
      though it will probably improve MT speed by a few percent.
   2. change the ABI so that each Harbour function which may
      need HVM access receives a pointer to HB_STACK. Quite easy
      for HB_FUNC(), but for internal functions it will require
      much more work.
   3. leave it as is and wait for new hardware and OSes, where
      TLS access is usually greatly improved, very often by native
      hardware support.
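
A minimal sketch of how the first two options differ from the current
situation, using C11 _Thread_local. Everything here (vm_stack, tls_stack
and the push_* functions) is hypothetical and only mimics the shape of
the problem; it is not Harbour's HB_STACK or the xHarbour macro:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread VM stack. */
typedef struct {
    int items[16];
    int count;
} vm_stack;

/* Each thread keeps its own stack pointer in TLS. */
static _Thread_local vm_stack *tls_stack = NULL;

/* Current situation: every helper reads the pointer from TLS itself. */
static void push_tls(int value)
{
    vm_stack *stack = tls_stack;          /* one TLS access per call */
    assert(stack != NULL && stack->count < 16);
    stack->items[stack->count++] = value;
}

/* Option 2: the caller passes the pointer, no TLS access here at all. */
static void push_explicit(vm_stack *stack, int value)
{
    assert(stack != NULL && stack->count < 16);
    stack->items[stack->count++] = value;
}

/* Option 1: read TLS once on entry, then reuse the local pointer. */
static int push_three_cached(void)
{
    vm_stack *stack = tls_stack;          /* single TLS access */
    push_explicit(stack, 1);
    push_explicit(stack, 2);
    push_explicit(stack, 3);
    return stack->count;
}
```

With option 1 only the entry points change, while option 2 changes the
signature of every internal function that needs the stack.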

best regards,
Przemek

ps. Do you have an assembler version of the InterlockedDec() function?
_______________________________________________
Harbour mailing list
Harbour@harbour-project.org
http://lists.harbour-project.org/mailman/listinfo/harbour

Re: [Harbour] 2008-09-15 13:38 UTC+0200 Przemyslaw Czerpak (druzus/at/priv.onet.pl)
