On Mon, 15 Sep 2008, Mindaugas Kavaliauskas wrote:

Hi Mindaugas,

>> Maybe BCC does not inline InterLocked*() functions or
>> they are not as efficient as they can be.
> Interlocked*() functions are WinAPI functions. It cannot be inlined.

I know that they are Windows API functions, but MSDN says that each
compiler should try to inline them using its own resources. In GCC such
functions are declared as extern inline. So I understand that BCC does
not do this and simply calls the original ones.

> The problem is that "asm { ... }" code inside .c file requires assembler
> compiler which is not part of BCC free command line tools. Many people will
> be unable to build Harbour binaries.

Yes, this is a problem if you cannot check during C compilation whether
an assembler is available. In that case the only solution is to put the
code in binary form.

> I use some hacky tricks in my code to
> include asm into C, but it will not work on Win64, so, not suitable for

Such code has to be protected by CPU checking so that it is not enabled
on unsupported machines.

> Harbour. Does anyone know how to include asm code into C???

If these are only a few instructions then you can use __emit__ (...),
see rtl/hbtone.c

> The whole code of InterlockedIncrement() is 5 CPU instructions. So, I guess
> we will not save a lot.

Probably yes. We can try to check it. See below.
To be precise, we do not need exact interlocked behavior. We only need
a protected INC in HB_ATOM_INC and a protected DEC in HB_ATOM_DEC that
saves the EQUAL flag.

> It was default build using make_b32.bat, i.e. with memstat. Here are
> results without memstat:
[...]
> total application time: 130.69 164.55

Much better, though the difference is still quite large.
You can simply check the cost of the Interlocked*() operations by
redefining the macros inside hbthreads.h to:

   #define HB_ATOM_INC( p ) ( ++(*(p)) )
   #define HB_ATOM_DEC( p ) ( --(*(p)) )

If this gives any noticeable speed improvement in character or array
access in the above test, then we can try to use their inlined assembler
versions.

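A minimal sketch of what the two variants look like side by side. It
assumes a C11 compiler with <stdatomic.h>, which stands in here for the
Interlocked*() calls; the SKETCH_* names and the helper function are
hypothetical and are not part of hbthreads.h.

```c
#include <assert.h>
#include <stdatomic.h>

/* Unprotected variants, same shape as the test redefinition above. */
#define SKETCH_ATOM_INC_PLAIN( p )   ( ++( *( p ) ) )
#define SKETCH_ATOM_DEC_PLAIN( p )   ( --( *( p ) ) )

/* Protected variants. C11 atomics are an assumption of this sketch;
 * they stand in for InterlockedIncrement()/InterlockedDecrement(). */
#define SKETCH_ATOM_INC_SAFE( p )    ( ( void ) atomic_fetch_add( ( p ), 1 ) )
/* Non-zero while references remain, zero when the counter reaches 0.
 * atomic_fetch_sub() returns the value held before the decrement. */
#define SKETCH_ATOM_DEC_SAFE( p )    ( atomic_fetch_sub( ( p ), 1 ) != 1 )

/* Hypothetical helper: runs the same reference counting pattern with
 * both variants and returns 1 when they agree on the moment the last
 * reference is released. */
static int sketch_variants_agree( void )
{
   long        plain = 1;
   atomic_long safe  = 1;
   int         plain_alive, safe_alive;

   SKETCH_ATOM_INC_PLAIN( &plain );                     /* 1 -> 2 */
   SKETCH_ATOM_INC_SAFE( &safe );                       /* 1 -> 2 */

   plain_alive = SKETCH_ATOM_DEC_PLAIN( &plain ) != 0;  /* 2 -> 1 */
   safe_alive  = SKETCH_ATOM_DEC_SAFE( &safe );         /* 2 -> 1 */
   assert( plain_alive && safe_alive );

   plain_alive = SKETCH_ATOM_DEC_PLAIN( &plain ) != 0;  /* 1 -> 0 */
   safe_alive  = SKETCH_ATOM_DEC_SAFE( &safe );         /* 1 -> 0 */

   return ! plain_alive && ! safe_alive;
}
```

In a single thread both variants behave the same, so timing one against
the other isolates the cost of the protection itself. Only a clear gap
between the two timings would justify an inlined assembler version.
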
Otherwise we can leave it as is.

It looks like the most expensive part is TLS access, and it reduces
performance in BCC builds - the cost of an ABI in which the VM pointer
is not passed to functions :-(. We can do three things:
   1. Add some tricks to reduce TLS access, like HB_THREAD_STUB in
      xHarbour in hvm.c. It makes the code a little bit ugly, though it
      will probably improve MT speed by a few percent.
   2. Change the ABI so that each Harbour function which may need HVM
      access receives a pointer to HB_STACK. This is quite easy for
      HB_FUNC(), but for internal functions it will require much more
      work.
   3. Leave it as is and wait for new hardware and OSes where TLS access
      is usually much faster, very often thanks to native hardware
      support.

best regards,
Przemek

ps. Do you have an assembler version of the InterlockedDec() function?
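
A rough sketch of the difference between the current ABI and options 1
and 2 above. It assumes C11 _Thread_local; SKETCH_STACK and all function
names are hypothetical stand-ins, not the real HB_STACK or HVM code.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-thread HVM stack. */
typedef struct
{
   long items;
} SKETCH_STACK;

/* One stack per thread, located through TLS. */
static _Thread_local SKETCH_STACK s_stack;

/* Current ABI: the function finds its stack through TLS on every call. */
static void sketch_push_via_tls( void )
{
   s_stack.items++;
}

/* Option 2: the caller passes the stack pointer, no TLS lookup inside. */
static void sketch_push_via_arg( SKETCH_STACK * pStack )
{
   pStack->items++;
}

/* Option 1 in spirit: fetch the TLS pointer once and reuse it.
 * Returns the item count after n calls of each variant. */
static long sketch_run( int n )
{
   SKETCH_STACK * pStack = &s_stack;   /* single TLS access */
   int i;

   assert( n >= 0 );
   for( i = 0; i < n; i++ )
   {
      sketch_push_via_tls();           /* TLS access per call */
      sketch_push_via_arg( pStack );   /* pointer already in hand */
   }
   return pStack->items;
}
```

This only shows the shape of the change. An optimizing compiler may
hide the difference in such a small example, so it is not a benchmark.
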