Przemyslaw Czerpak wrote:
The only reason I see for binding stack preload to "no TLS" is that stack preload also uses an inlined Windows-like function to access TLS. But aren't these two separate features: stack preload and the TLS access method (compiler native or system API)?

When compiler native TLS is disabled and a file defines HB_STACK_PRELOAD
before including the Harbour header files, then each function which has
to access hb_stack buffers its address via HB_STACK_TLS_PRELOAD.
If possible, an inline assembler function is used to retrieve the stack
address, which is a little bit faster than a call to the OS TLS function
and even than native TLS support in some compilers (e.g. BCC).
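
In rough outline the pattern looks like this (an illustrative sketch with
made-up names, not the real Harbour macros):

#include <windows.h>

typedef struct { int iCallLevel; /* ... */ } MY_STACK;

static DWORD s_dwTlsIndex;   /* allocated once at startup with TlsAlloc() */

/* without preload: every access pays a TLS lookup
   (replaced by an inline assembler read where possible) */
#define MY_STACK_PTR      ( ( MY_STACK * ) TlsGetValue( s_dwTlsIndex ) )

/* with preload: one local pointer resolved at function entry */
#define MY_STACK_PRELOAD  MY_STACK * my_stack = MY_STACK_PTR
#define MY_STACK_FAST     ( my_stack )

static void vm_function( void )
{
   MY_STACK_PRELOAD;             /* single TLS lookup for this function */

   MY_STACK_FAST->iCallLevel++;  /* all further accesses are plain      */
   MY_STACK_FAST->iCallLevel--;  /* pointer dereferences                */
}

The point of the preload is that a function which touches the stack many
times pays the TLS lookup cost only once.
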
Compile the current SVN code without any additional switches and compare
the tstspeed.prg results to the previous ones.

Hi,


thanks for the explanation. I just wanted to run all the tests one after another to make the results comparable, because numbers obtained a few days ago may have been measured with a different OS memory/CPU load, so they can differ by a few seconds. I've used -DHB_USE_TLS to obtain the "previous" results (compiler native TLS, no stack preloading); that is the middle MT column below, while the last MT column is the current SVN code with stack preload.
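
As a side note, the difference between "compiler native TLS" and the
system API route is roughly this (an illustrative C sketch with made-up
names, not Harbour source):

#include <windows.h>

/* (a) compiler native TLS: declare the variable as thread-local and let
       the compiler/linker generate the per-thread access code */
#if defined( _MSC_VER )
   static __declspec( thread ) void * s_stack_native;
#else
   static __thread void * s_stack_native;   /* GCC spelling; other compilers differ */
#endif

/* (b) system API TLS: one slot from TlsAlloc() plus TlsGetValue()/
       TlsSetValue() calls on every access */
static DWORD s_dwStackSlot;

static void tls_init( void )
{
   s_dwStackSlot = TlsAlloc();
}

static void tls_example( void )
{
   s_stack_native = NULL;                /* native TLS access  */
   TlsSetValue( s_dwStackSlot, NULL );   /* system API access  */
}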

The results are:

10/01/08 10:03:30 Harbour 1.1.0dev (Rev. 9523), Windows XP 5.1.2600 Service Pack 2

ARR_LEN =         16                      ST      MT      MT
N_LOOPS =    1000000                            USE_TLS
empty loops overhead =                   0.19    0.30    0.28
CPU usage -> secondsCPU()

c:=L_C ->                                0.19    0.39    0.31
n:=L_N ->                                0.19    0.27    0.19
d:=L_D ->                                0.22    0.27    0.19
c:=M_C ->                                0.23    0.45    0.38
n:=M_N ->                                0.20    0.30    0.23
d:=M_D ->                                0.22    0.30    0.23
(sh) c:=F_C ->                           0.38    0.81    0.84
(sh) n:=F_N ->                           0.58    0.61    0.64
(sh) d:=F_D ->                           0.30    0.34    0.36
(ex) c:=F_C ->                           0.38    0.81    0.83
(ex) n:=F_N ->                           0.56    0.64    0.66
(ex) d:=F_D ->                           0.30    0.34    0.33
n:=o:GenCode ->                          0.45    0.81    0.78
n:=o[8] ->                               0.42    0.63    0.52
round(i/1000,2) ->                       0.63    0.92    0.81
str(i/1000) ->                           1.50    2.27    2.03
val(a3[i%ARR_LEN+1]) ->                  1.36    1.84    1.64
dtos(j+i%10000-5000) ->                  1.39    2.06    2.03
eval({||i%ARR_LEN}) ->                   0.69    1.03    0.89
eval({|x|x%ARR_LEN},i) ->                0.78    1.20    1.02
eval({|x|f1(x)},i) ->                    1.28    1.81    1.42
&('f1('+str(i)+')') ->                   7.66   15.13   13.14
eval([&('{|x|f1(x)}')]) ->               1.25    1.81    1.39
j := valtype(a)+valtype(i) ->            1.08    2.00    1.86
j := str(i%100,2) $ a2[i%ARR_LEN+1] ->   2.27    3.45    3.02
j := val(a2[i%ARR_LEN+1]) ->             1.55    2.11    1.91
j := a2[i%ARR_LEN+1] == s ->             1.06    1.70    1.50
j := a2[i%ARR_LEN+1] = s ->              1.11    1.69    1.58
j := a2[i%ARR_LEN+1] >= s ->             1.17    1.67    1.52
j := a2[i%ARR_LEN+1] < s ->              1.13    1.67    1.53
aadd(aa,{i,j,s,a,a2,t,bc}) ->            4.38    5.92    5.81
f0() ->                                  0.33    0.55    0.42
f1(i) ->                                 0.55    0.89    0.64
f2(c[8]) ->                              0.45    0.81    0.63
f2(c[40000]) ->                          0.47    0.80    0.64
f2(@c[40000]) ->                         0.36    0.64    0.47
f2(c[40000]); c2:=c ->                   0.69    1.19    1.00
f2(@c[40000]); c2:=c ->                  0.56    1.03    0.88
f3(a,a2,c,i,j,t,bc) ->                   1.16    2.05    1.70
f2(a2) ->                                0.44    0.81    0.66
s:=f4() ->                               2.13    2.56    2.47
s:=f5() ->                               0.84    1.38    1.27
ascan(a,i%ARR_LEN) ->                    0.73    1.25    1.06
ascan(a2,c+chr(i%64+64)) ->              2.47    3.67    3.33
ascan(a,{|x|x==i%ARR_LEN}) ->           10.95   13.48   11.66
=============================================================
total application time:                 64.95  100.00   89.30
total real time:                        65.89  100.97   90.36

The previous MT overhead was 54% (100.00 vs. 64.95 seconds of total application time); the current one is 37% (89.30 vs. 64.95).

One thing is not clear to me. You've committed exactly the same inlined TLS access as I used in my test, but your code does not GPF. Mine was GPFing because of wrongly generated CPU code.
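
For reference, the kind of inlined TLS access being discussed is roughly
the following on 32-bit Windows (an illustrative sketch, not necessarily
the committed code): FS points at the current thread's TEB, and the
TlsSlots[] array sits at offset 0xE10 in it, so a slot can be read with a
single indexed load instead of a TlsGetValue() call.

#include <windows.h>

static __inline__ void * my_tls_get( DWORD dwIndex )
{
   void * pValue;

   /* pValue = TEB->TlsSlots[ dwIndex ]; valid for slot indexes below 64 */
   __asm__ __volatile__(
      "movl %%fs:0xE10(,%1,4), %0"
      : "=r"( pValue )
      : "r"( dwIndex )
   );
   return pValue;
}

With GCC-style inline assembly, a wrong operand constraint or a missing
clobber is enough to make the compiler emit bad code around such a
statement, so two visually identical snippets can behave differently; that
is one possible explanation for the GPF, though only the generated
assembly can tell for sure.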


Best regards,
Mindaugas

_______________________________________________
Harbour mailing list
Harbour@harbour-project.org
http://lists.harbour-project.org/mailman/listinfo/harbour
