Przemyslaw Czerpak wrote:
The only reason I see for binding stack preload with "no TLS" is that stack
preload also uses an inlined Windows-like function to access TLS. But
shouldn't these be two separate features: stack preload and the TLS access
method (compiler native or system API)?
When compiler-native TLS is disabled and a file defines HB_STACK_PRELOAD
before including the Harbour header files, then each function which has
to access hb_stack buffers its address via HB_STACK_TLS_PRELOAD.
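In sketch form the pattern looks like this (hypothetical hb_stack_get()
accessor and simplified macros; the real definitions in the Harbour
headers are more involved):

/* sketch of the preload pattern, not the actual Harbour macros;
   hb_stack_get() stands in for whatever TLS lookup is in effect */
extern void * hb_stack_get( void );   /* hypothetical TLS accessor */

#if defined( HB_STACK_PRELOAD )
   /* one TLS lookup per function, cached in a local variable */
   #define HB_STACK_TLS_PRELOAD   void * hb_stack_ptr = hb_stack_get();
   #define HB_STACK               ( hb_stack_ptr )
#else
   /* no caching: every access repeats the TLS lookup */
   #define HB_STACK_TLS_PRELOAD
   #define HB_STACK               ( hb_stack_get() )
#endif

void hb_someVmFunction( void )
{
   HB_STACK_TLS_PRELOAD   /* buffers the stack address once */
   /* ... every following use of HB_STACK reuses the cached pointer */
}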
If possible, an inlined assembler function is used to retrieve the stack
address; this is a little faster than a call to the OS TLS function, and
even than the native TLS support of some compilers (e.g. BCC).
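On 32-bit Windows this is possible because the FS segment register points
at the TEB and the first 64 TLS slots live at a fixed offset (0xE10)
inside it, so a slot can be read with a single MOV instead of a
TlsGetValue() call. A GCC-style sketch (the committed Harbour code may use
a different but equivalent construction):

#include <windows.h>

/* direct TLS slot read on 32-bit x86 Windows; valid only for
   dwIndex < 64, where the slots sit in the TlsSlots array at
   TEB offset 0xE10 */
static void * hb_tls_get_inline( DWORD dwIndex )
{
   void * pValue;
   __asm__ ( "movl %%fs:0xE10(,%1,4), %0"
             : "=r" ( pValue )
             : "r" ( dwIndex ) );
   return pValue;
}

/* the equivalent lookup through the OS API, with call overhead */
static void * hb_tls_get_api( DWORD dwIndex )
{
   return TlsGetValue( dwIndex );
}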
Compile the current SVN code without any additional switches and compare
the tstspeed.prg results to the previous ones.
Hi,
thanks for the explanation. I just want to run all the tests one after
another to make the results comparable, because numbers obtained a few
days ago may have been measured under a different OS memory/CPU load, so
the results can differ by a few seconds.
I've used -DHB_USE_TLS to obtain the "previous" results (compiler-native
TLS, no stack preloading).
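For reference, compiler-native TLS means the stack pointer is declared
roughly as below (a simplified sketch; the real declaration is hidden
behind Harbour's portability macros), so each access compiles to
TLS-relative addressing instead of an API call:

#if defined( _MSC_VER ) || defined( __BORLANDC__ )
   static __declspec( thread ) void * s_hb_stack_ptr;
#else   /* GCC and friends */
   static __thread void * s_hb_stack_ptr;
#endif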
The results are:
10/01/08 10:03:30 Harbour 1.1.0dev (Rev. 9523), Windows XP 5.1.2600 Service Pack 2
ARR_LEN = 16                        ST      MT      MT
N_LOOPS = 1000000                       USE_TLS
empty loops overhead = 0.19 0.30 0.28
CPU usage -> secondsCPU()
c:=L_C -> 0.19 0.39 0.31
n:=L_N -> 0.19 0.27 0.19
d:=L_D -> 0.22 0.27 0.19
c:=M_C -> 0.23 0.45 0.38
n:=M_N -> 0.20 0.30 0.23
d:=M_D -> 0.22 0.30 0.23
(sh) c:=F_C -> 0.38 0.81 0.84
(sh) n:=F_N -> 0.58 0.61 0.64
(sh) d:=F_D -> 0.30 0.34 0.36
(ex) c:=F_C -> 0.38 0.81 0.83
(ex) n:=F_N -> 0.56 0.64 0.66
(ex) d:=F_D -> 0.30 0.34 0.33
n:=o:GenCode -> 0.45 0.81 0.78
n:=o[8] -> 0.42 0.63 0.52
round(i/1000,2) -> 0.63 0.92 0.81
str(i/1000) -> 1.50 2.27 2.03
val(a3[i%ARR_LEN+1]) -> 1.36 1.84 1.64
dtos(j+i%10000-5000) -> 1.39 2.06 2.03
eval({||i%ARR_LEN}) -> 0.69 1.03 0.89
eval({|x|x%ARR_LEN},i) -> 0.78 1.20 1.02
eval({|x|f1(x)},i) -> 1.28 1.81 1.42
&('f1('+str(i)+')') -> 7.66 15.13 13.14
eval([&('{|x|f1(x)}')]) -> 1.25 1.81 1.39
j := valtype(a)+valtype(i) -> 1.08 2.00 1.86
j := str(i%100,2) $ a2[i%ARR_LEN+1] -> 2.27 3.45 3.02
j := val(a2[i%ARR_LEN+1]) -> 1.55 2.11 1.91
j := a2[i%ARR_LEN+1] == s -> 1.06 1.70 1.50
j := a2[i%ARR_LEN+1] = s -> 1.11 1.69 1.58
j := a2[i%ARR_LEN+1] >= s -> 1.17 1.67 1.52
j := a2[i%ARR_LEN+1] < s -> 1.13 1.67 1.53
aadd(aa,{i,j,s,a,a2,t,bc}) -> 4.38 5.92 5.81
f0() -> 0.33 0.55 0.42
f1(i) -> 0.55 0.89 0.64
f2(c[8]) -> 0.45 0.81 0.63
f2(c[40000]) -> 0.47 0.80 0.64
f2(@c[40000]) -> 0.36 0.64 0.47
f2(c[40000]); c2:=c -> 0.69 1.19 1.00
f2(@c[40000]); c2:=c -> 0.56 1.03 0.88
f3(a,a2,c,i,j,t,bc) -> 1.16 2.05 1.70
f2(a2) -> 0.44 0.81 0.66
s:=f4() -> 2.13 2.56 2.47
s:=f5() -> 0.84 1.38 1.27
ascan(a,i%ARR_LEN) -> 0.73 1.25 1.06
ascan(a2,c+chr(i%64+64)) -> 2.47 3.67 3.33
ascan(a,{|x|x==i%ARR_LEN}) -> 10.95 13.48 11.66
=============================================================
total application time: 64.95 100.00 89.30
total real time: 65.89 100.97 90.36
The previous MT overhead was 54% (100.00 vs. 64.95 seconds of total
application time); the current one is 37% (89.30 vs. 64.95).
One thing is not clear to me. You've committed exactly the same inlined
TLS access as I used in my test, but your code does not GPF. Mine was
GPFing because of wrongly generated CPU code.
Best regards,
Mindaugas