On Wed, 01 Oct 2008, Mindaugas Kavaliauskas wrote:

Hi Mindaugas,

> The results are:
> ARR_LEN = 16                 ST       MT       MT
> N_LOOPS = 1000000                              USE_TLS
> [...]
> total application time:    64.95   100.00    89.30
> Previous MT overhead was 54%, current 37%.

Thanks. It confirms other users' results.

> One thing is not clear for me. You've committed exactly the same inlined
> tls accessing as I've used in my test, but your code does not GPF. My was
> GPFing because of wrong generated CPU code.

Because it's used in a different context. The returned value is always
assigned to a C local variable, and then this variable is used. In that
case, even if BCC optimizes the code to use only registers without memory
variables, it knows in which register it will keep the return value and
marks it as used.

In GNU C things are much simpler because I can inform GCC which registers
and/or memory variables are used as input and which as output. In some
contexts I can even give GCC the freedom to choose any register it prefers.

   static __inline__ void * hb_stack_ptr_from_tls( void )
   {
      void * p;
      __asm__ (
         "movl  %%fs:(0x18), %0\n\t"
         "movl  0x0e10(%0,%1,4), %0\n\t"
         :"=a" (p)
         :"c" (hb_stack_key)
      );
      return p;
   }

In this code I forced the accumulator as the result ("=a") and marked that
the counter register ("c") should contain the memory variable
(hb_stack_key) at input. I do not know how to do something like that with
BCC. In GCC it's safe to use this inline function in any context, even
without HB_STACK_PRELOAD, and I'll make that change ASAP, which should give
a small further speed improvement.

best regards,
Przemek
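
For illustration, here is a minimal sketch of the two constraint styles described above: pinning operands to specific registers ("=a", "c") versus letting GCC pick any register ("r"). It assumes GCC or Clang on x86 and falls back to plain C elsewhere. The names add_fixed_regs, add_any_reg and DEMO_X86_GNU_ASM are hypothetical and are not part of Harbour.

```c
#include <stdio.h>

/* Hypothetical guard macro: use inline asm only with GNU-style compilers on x86. */
#if defined( __GNUC__ ) && ( defined( __i386__ ) || defined( __x86_64__ ) )
   #define DEMO_X86_GNU_ASM
#endif

/* Fixed registers: the result is forced into the accumulator ("=a"),
   the first input shares that register ("0"), and the second input is
   forced into the counter register ("c"). */
static inline int add_fixed_regs( int a, int b )
{
#ifdef DEMO_X86_GNU_ASM
   int result;
   __asm__ ( "addl %2, %0"
             : "=a" ( result )          /* %0: output in eax */
             : "0" ( a ), "c" ( b )     /* %1: same register as %0, %2: ecx */
             : "cc" );                  /* the addition changes the flags */
   return result;
#else
   return a + b;                        /* portable fallback, no asm */
#endif
}

/* Free choice: "r" lets the compiler pick any general purpose register
   for each operand, so it can fit the asm into the surrounding code. */
static inline int add_any_reg( int a, int b )
{
#ifdef DEMO_X86_GNU_ASM
   int result = a;
   __asm__ ( "addl %1, %0"
             : "+r" ( result )          /* %0: read and written, any register */
             : "r" ( b )                /* %1: input, any register */
             : "cc" );
   return result;
#else
   return a + b;                        /* portable fallback, no asm */
#endif
}
```

Both functions return the sum of their arguments; for example, calling either with 2 and 3 yields 5. Because the compiler knows exactly which registers each asm statement reads and writes, the functions can be inlined in any context without the register tracking problem described for BCC.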