On Wed, 01 Oct 2008, Mindaugas Kavaliauskas wrote:

Hi Mindaugas,

> The results are:
> ARR_LEN =         16                      ST      MT      MT
> N_LOOPS =    1000000                            USE_TLS
[...]
> total application time:                 64.95  100.00   89.30
> Previous MT overhead was 54%, current 37%.

Thanks. It confirms other users' results.

> One thing is not clear for me. You've committed exactly the same inlined 
> tls accessing as I've used in my test, but your code does not GPF. My was 
> GPFing because of wrong generated CPU code.

Because it's used in a different context. The returned value is
always assigned to a C local variable and then this variable is used.
In such a case, even if BCC optimizes the code to use only registers
without memory variables, it knows in which register it will keep
the return value and marks it as used.

In GNU C things are much simpler because I can inform GCC which
registers and/or memory variables are used as input and which as
output. In some contexts I can even give GCC the freedom to choose
any register it prefers.

   static __inline__ void * hb_stack_ptr_from_tls( void )
   {
      void * p;
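      __asm__ (
         "movl  %%fs:(0x18), %0\n\t"       /* %0 = address of the Win32 TEB */
         "movl  0x0e10(%0,%1,4), %0\n\t"   /* %0 = TLS slot [hb_stack_key] */
         :"=a" (p)                         /* output: accumulator */
         :"c" (hb_stack_key)               /* input: counter register */
      );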
      return p;
   }

In this code I forced the accumulator as the result ("=a") and marked that
the counter register ("c") should contain the memory variable (hb_stack_key)
on input. I do not know how to do something like that with BCC.
In GCC it's safe to use this inline function in any context, even
without HB_STACK_PRELOAD, and I'll do that ASAP, which should give a small
further speed improvement.
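
To illustrate the difference between forcing registers and leaving the
choice to GCC, here is a minimal sketch. It is not Harbour code: the
names demo_slots, demo_slot_fixed, demo_slot_any and DEMO_X86_ASM are
hypothetical, and a plain array stands in for the per-thread slot table.
On non-x86 or non-GNU compilers it falls back to plain C.

```c
#include <assert.h>
#include <stddef.h>

#if defined( __GNUC__ ) && ( defined( __i386__ ) || defined( __x86_64__ ) )
#  define DEMO_X86_ASM 1   /* hypothetical macro, used only in this sketch */
#endif

/* Hypothetical table standing in for the per-thread slot array. */
static int demo_slots[ 4 ] = { 10, 20, 30, 40 };

/* Fixed registers: result in the accumulator, index in the counter. */
static int demo_slot_fixed( size_t idx )
{
   assert( idx < 4 );
#ifdef DEMO_X86_ASM
   {
      int v;
      __asm__ (
         "movl  (%1,%2,4), %0"
         : "=a" (v)                       /* output forced into EAX */
         : "r" (demo_slots), "c" (idx)    /* index forced into ECX/RCX */
         : "memory"                       /* the asm reads demo_slots */
      );
      return v;
   }
#else
   return demo_slots[ idx ];              /* portable fallback */
#endif
}

/* Free choice: GCC picks every register itself. */
static int demo_slot_any( size_t idx )
{
   assert( idx < 4 );
#ifdef DEMO_X86_ASM
   {
      int v;
      __asm__ (
         "movl  (%1,%2,4), %0"
         : "=r" (v)                       /* output in any register */
         : "r" (demo_slots), "r" (idx)    /* inputs in any registers */
         : "memory"                       /* the asm reads demo_slots */
      );
      return v;
   }
#else
   return demo_slots[ idx ];              /* portable fallback */
#endif
}
```

Both functions return the same value for the same index, e.g.
demo_slot_fixed( 2 ) and demo_slot_any( 2 ) both give 30. The "r"
variant simply leaves more room for the register allocator when the
function is inlined.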

best regards,
Przemek
_______________________________________________
Harbour mailing list
Harbour@harbour-project.org
http://lists.harbour-project.org/mailman/listinfo/harbour
