Przemek:
For mttest05:
[...]
result: 0 errors
Thank you very much for your tests. It looks like it works correctly, but I do not like the way it is done. I hope that OS2 users will clean up this code in the future.
Maurilio, are you reading? :-) (Yes, I know you can clean it up.)
2008-09-21 23:03 UTC+0200 Przemyslaw Czerpak (druzus/at/priv.onet.pl)
  + harbour/tests/mt/mttest08.prg
    + added test code for thread memvars inheritance
Results for OS/2 below: mttest08.prg (MT build)
The speed difference was the cost of using mutexes to protect access to reference counters. As you can see, they are very slow in OS2, much slower than futexes in Linux and critical sections in Windows. Using inline assembler code for atomic increment/decrement of reference counters greatly improved the speed and reduced the MT HVM overhead from ~89 sec. to ~20 sec., so the real speed improvement, calculated only for the MT overhead, is much better than the change in total time suggests. Now we can try to reduce the remaining 20 sec. In OS2 it should be possible to use stack macros nicely even without native compiler TLS support. I'll make such a modification in a while, and I would like to ask you to repeat the speedtst.prg tests with the new SVN code.
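A minimal sketch of the two ways of protecting a reference counter compared above. It assumes C11 `<stdatomic.h>` in place of the hand-written inline assembler mentioned in the text (gcc 3.3.5 on OS/2 predates C11, so this is illustrative only), and the `atomic_item`/`mutex_item` types and all helper names are hypothetical, not Harbour's API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical item headers carrying a reference counter. Harbour's
   real structures and macros differ; only the counting is modelled. */
typedef struct {
    atomic_long refs;          /* updated with lock-free atomic ops */
} atomic_item;

typedef struct {
    long refs;
    pthread_mutex_t lock;      /* every inc/dec takes this mutex */
} mutex_item;

/* Atomic variant: typically a single locked instruction on x86. */
static void atomic_inc(atomic_item *it) { atomic_fetch_add(&it->refs, 1); }

/* Returns nonzero when the last reference has just been released. */
static int atomic_dec(atomic_item *it) {
    long old = atomic_fetch_sub(&it->refs, 1);
    assert(old > 0);           /* catch counter underflow */
    return old == 1;
}

/* Mutex variant: a lock/unlock pair around every update. */
static void mutex_inc(mutex_item *it) {
    pthread_mutex_lock(&it->lock);
    it->refs++;
    pthread_mutex_unlock(&it->lock);
}

static int mutex_dec(mutex_item *it) {
    int last;
    pthread_mutex_lock(&it->lock);
    assert(it->refs > 0);
    last = (--it->refs == 0);
    pthread_mutex_unlock(&it->lock);
    return last;
}

#define N_ITER 100000

/* Thread bodies: add and drop a reference many times. */
static void *hammer_atomic(void *arg) {
    atomic_item *it = arg;
    for (int i = 0; i < N_ITER; i++) { atomic_inc(it); atomic_dec(it); }
    return NULL;
}

static void *hammer_mutex(void *arg) {
    mutex_item *it = arg;
    for (int i = 0; i < N_ITER; i++) { mutex_inc(it); mutex_dec(it); }
    return NULL;
}
```

Both variants keep the counter consistent under concurrent use; the difference is that the mutex variant pays for a lock/unlock pair on every reference-count change, while the atomic variant needs no lock at all.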
2008-09-22 16:14 UTC+0200 Przemyslaw Czerpak (druzus/at/priv.onet.pl)
  * harbour/include/hbstack.h
  * harbour/source/vm/estack.c
    + added support for using stack macros without native compiler TLS
      support in MT mode.
    * enabled stack macros by default for OS2 MT builds
David, if possible, please try speedtst.prg with the current MT HVM on OS2. I enabled stack macros for these builds, and I'd like to know if it works and what the speed difference is compared with your previous tests, so I can calculate the real overhead caused by accessing TLS data in OS2.
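The general idea of using stack macros without compiler TLS support can be sketched as follows, assuming POSIX threads: the thread's stack pointer is fetched once per function through `pthread_getspecific()` (the slow, API-based TLS access), and the macros then expand to plain pointer dereferences on that cached local. `vm_stack`, `stack_get()` and the `STACK_*` macros are hypothetical; the actual code in hbstack.h and estack.c may differ considerably.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define STACK_SIZE 64

/* Hypothetical per-thread VM stack (not Harbour's real layout). */
typedef struct {
    long items[STACK_SIZE];
    int  top;
} vm_stack;

static pthread_key_t  stack_key;
static pthread_once_t stack_once = PTHREAD_ONCE_INIT;

static void stack_key_create(void) {
    /* free() releases each thread's stack when that thread exits */
    if (pthread_key_create(&stack_key, free) != 0)
        abort();
}

/* Slow path: a TLS lookup through the threads API, no compiler TLS needed. */
static vm_stack *stack_get(void) {
    vm_stack *st;
    pthread_once(&stack_once, stack_key_create);
    st = pthread_getspecific(stack_key);
    if (st == NULL) {
        st = calloc(1, sizeof *st);
        if (st == NULL || pthread_setspecific(stack_key, st) != 0)
            abort();
    }
    return st;
}

/* STACK_PRELOAD does the lookup once per function; the other macros
   then work on the cached local pointer only. */
#define STACK_PRELOAD  vm_stack *st_local = stack_get()
#define STACK_PUSH(v)  (assert(st_local->top < STACK_SIZE), \
                        (void)(st_local->items[st_local->top++] = (v)))
#define STACK_POP()    (assert(st_local->top > 0), \
                        st_local->items[--st_local->top])

/* Example "VM function": one TLS lookup, then cheap pushes and pops. */
static long add_two(long a, long b) {
    STACK_PRELOAD;
    long x, y;
    STACK_PUSH(a);
    STACK_PUSH(b);
    y = STACK_POP();
    x = STACK_POP();
    return x + y;
}
```

Because only `STACK_PRELOAD` touches the TLS API, the cost of a slow TLS lookup is paid once per function call rather than once per stack operation.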
Below are the results of the current Harbour under eComStation 1.2MR with gcc 3.3.5:
ST              : total application time:  47.13   total real time:  47.14
MT              : total application time:  55.08   total real time:  55.08
MT (before)     : total application time:  67.05   total real time:  67.05
MT (older)      : total application time: 135.88   total real time: 135.88
ST stays the same as before, but MT keeps falling: 135.88 --> 67.05 --> 55.08 seconds. Now MT is only 17% slower than ST :-)
You improved MT performance in a few days, so we are close to the point where MT can be used in place of ST without much of a speed penalty.
I have not started the MT tests on Win32 / Linux yet. Both will run on a Core2Duo 2.0, so a big speed difference is to be expected compared with the current Athlon 2200+ tests.
Thanks for your work
David Macias
..............................
[E:\harbour809mt\harbour\tests\mt]mttest08_mt.exe
Main start
Do not inherit memvars.
main thread:
PUB1: C -> main:public1
PUB2: C -> main:public2
PRV1: C -> main:private1
PRV2: C -> main:private2
child thread:
PUB1: U
PUB2: U
PRV1: U
PRV2: U
assign...
child thread:
PUB1: C -> thread:public1
PUB2: C -> thread:public2
PRV1: C -> thread:private1
PRV2: C -> thread:private2
main thread:
PUB1: C -> main:public1
PUB2: C -> main:public2
PRV1: C -> main:private1
PRV2: C -> main:private2
Press any key to continue...
Inherit copy of publics.
main thread:
PUB1: C -> main:public1
PUB2: C -> main:public2
PRV1: C -> main:private1
PRV2: C -> main:private2
child thread:
PUB1: C -> main:public1
PUB2: C -> main:public2
PRV1: U
PRV2: U
assign...
child thread:
PUB1: C -> thread:public1
PUB2: C -> thread:public2
PRV1: C -> thread:private1
PRV2: C -> thread:private2
main thread:
PUB1: C -> main:public1
PUB2: C -> main:public2
PRV1: C -> main:private1
PRV2: C -> main:private2
Press any key to continue...
Inherit copy of privates.
main thread:
PUB1: C -> main:public1
PUB2: C -> main:public2
PRV1: C -> main:private1
PRV2: C -> main:private2
child thread:
PUB1: U
PUB2: U
PRV1: C -> main:private1
PRV2: C -> main:private2
assign...
child thread:
PUB1: C -> thread:public1
PUB2: C -> thread:public2
PRV1: C -> thread:private1
PRV2: C -> thread:private2
main thread:
PUB1: C -> main:public1
PUB2: C -> main:public2
PRV1: C -> main:private1
PRV2: C -> main:private2
Press any key to continue...
Inherit copy of publics and privates.
main thread:
PUB1: C -> main:public1
PUB2: C -> main:public2
PRV1: C -> main:private1
PRV2: C -> main:private2
child thread:
PUB1: C -> main:public1
PUB2: C -> main:public2
PRV1: C -> main:private1
PRV2: C -> main:private2
assign...
child thread:
PUB1: C -> thread:public1
PUB2: C -> thread:public2
PRV1: C -> thread:private1
PRV2: C -> thread:private2
main thread:
PUB1: C -> main:public1
PUB2: C -> main:public2
PRV1: C -> main:private1
PRV2: C -> main:private2
Press any key to continue...
Share publics with child threads.
main thread:
PUB1: C -> main:public1
PUB2: C -> main:public2
PRV1: C -> main:private1
PRV2: C -> main:private2
child thread:
PUB1: C -> main:public1
PUB2: C -> main:public2
PRV1: U
PRV2: U
assign...
child thread:
PUB1: C -> thread:public1
PUB2: C -> thread:public2
PRV1: C -> thread:private1
PRV2: C -> thread:private2
main thread:
PUB1: C -> thread:public1
PUB2: C -> thread:public2
PRV1: C -> main:private1
PRV2: C -> main:private2
Press any key to continue...
Share privates with child threads.
main thread:
PUB1: C -> main:public1
PUB2: C -> main:public2
PRV1: C -> main:private1
PRV2: C -> main:private2
child thread:
PUB1: U
PUB2: U
PRV1: C -> main:private1
PRV2: C -> main:private2
assign...
child thread:
PUB1: C -> thread:public1
PUB2: C -> thread:public2
PRV1: C -> thread:private1
PRV2: C -> thread:private2
main thread:
PUB1: C -> main:public1
PUB2: C -> main:public2
PRV1: C -> thread:private1
PRV2: C -> thread:private2
Press any key to continue...
Share publics and privates with child threads.
main thread:
PUB1: C -> main:public1
PUB2: C -> main:public2
PRV1: C -> main:private1
PRV2: C -> main:private2
child thread:
PUB1: C -> main:public1
PUB2: C -> main:public2
PRV1: C -> main:private1
PRV2: C -> main:private2
assign...
child thread:
PUB1: C -> thread:public1
PUB2: C -> thread:public2
PRV1: C -> thread:private1
PRV2: C -> thread:private2
main thread:
PUB1: C -> thread:public1
PUB2: C -> thread:public2
PRV1: C -> thread:private1
PRV2: C -> thread:private2
Press any key to continue...
..............................
[E:\harbour809mt\harbour\tests]speedtst_st.exe
Startup loop to increase CPU clock...
09/23/08 02:08:43 Harbour 1.1.0dev (Rev. 9469), OS/2 4.50
ARR_LEN = 16
N_LOOPS = 1000000
empty loops overhead = 0.14
CPU usage -> secondsCPU()
c:=L_C -> 0.10
n:=L_N -> 0.09
d:=L_D -> 0.09
c:=M_C -> 0.12
n:=M_N -> 0.10
d:=M_D -> 0.10
(sh) c:=F_C -> 0.45
(sh) n:=F_N -> 0.34
(sh) d:=F_D -> 0.23
(ex) c:=F_C -> 0.43
(ex) n:=F_N -> 0.34
(ex) d:=F_D -> 0.21
n:=o:GenCode -> 0.32
n:=o[8] -> 0.21
round(i/1000,2) -> 0.45
str(i/1000) -> 1.34
val(a3[i%ARR_LEN+1]) -> 0.91
dtos(j+i%10000-5000) -> 1.07
eval({||i%ARR_LEN}) -> 0.46
eval({|x|x%ARR_LEN},i) -> 0.51
eval({|x|f1(x)},i) -> 0.74
&('f1('+str(i)+')') -> 7.96
eval([&('{|x|f1(x)}')]) -> 0.75
j := valtype(a)+valtype(i) -> 0.93
j := str(i%100,2) $ a2[i%ARR_LEN+1] -> 1.94
j := val(a2[i%ARR_LEN+1]) -> 0.97
j := a2[i%ARR_LEN+1] == s -> 0.64
j := a2[i%ARR_LEN+1] = s -> 0.75
j := a2[i%ARR_LEN+1] >= s -> 0.70
j := a2[i%ARR_LEN+1] < s -> 0.71
aadd(aa,{i,j,s,a,a2,t,bc}) -> 2.37
f0() -> 0.27
f1(i) -> 0.43
f2(c[8]) -> 0.32
f2(c[40000]) -> 0.32
f2(@c[40000]) -> 0.30
f2(c[40000]); c2:=c -> 0.42
f2(@c[40000]); c2:=c -> 0.42
f3(a,a2,c,i,j,t,bc) -> 0.73
f2(a2) -> 0.32
s:=f4() -> 1.34
s:=f5() -> 0.73
ascan(a,i%ARR_LEN) -> 0.61
ascan(a2,c+chr(i%64+64)) -> 2.55
ascan(a,{|x|x==i%ARR_LEN}) -> 5.53
============================================================
total application time: 47.13
total real time: 47.14
..............................
[E:\harbour809mt\harbour\tests]speedtst_mt.exe
Startup loop to increase CPU clock...
09/23/08 02:09:48 Harbour 1.1.0dev (Rev. 9469) (MT), OS/2 4.50
ARR_LEN = 16
N_LOOPS = 1000000
empty loops overhead = 0.17
CPU usage -> secondsCPU()
c:=L_C -> 0.14
n:=L_N -> 0.12
d:=L_D -> 0.12
c:=M_C -> 0.15
n:=M_N -> 0.13
d:=M_D -> 0.13
(sh) c:=F_C -> 0.43
(sh) n:=F_N -> 0.38
(sh) d:=F_D -> 0.22
(ex) c:=F_C -> 0.43
(ex) n:=F_N -> 0.37
(ex) d:=F_D -> 0.22
n:=o:GenCode -> 0.39
n:=o[8] -> 0.25
round(i/1000,2) -> 0.52
str(i/1000) -> 1.37
val(a3[i%ARR_LEN+1]) -> 0.97
dtos(j+i%10000-5000) -> 1.10
eval({||i%ARR_LEN}) -> 0.52
eval({|x|x%ARR_LEN},i) -> 0.58
eval({|x|f1(x)},i) -> 0.86
&('f1('+str(i)+')') -> 8.62
eval([&('{|x|f1(x)}')]) -> 0.89
j := valtype(a)+valtype(i) -> 1.07
j := str(i%100,2) $ a2[i%ARR_LEN+1] -> 1.94
j := val(a2[i%ARR_LEN+1]) -> 1.02
j := a2[i%ARR_LEN+1] == s -> 0.73
j := a2[i%ARR_LEN+1] = s -> 0.87
j := a2[i%ARR_LEN+1] >= s -> 0.86
j := a2[i%ARR_LEN+1] < s -> 0.87
aadd(aa,{i,j,s,a,a2,t,bc}) -> 3.43
f0() -> 0.33
f1(i) -> 0.49
f2(c[8]) -> 0.38
f2(c[40000]) -> 0.39
f2(@c[40000]) -> 0.36
f2(c[40000]); c2:=c -> 0.52
f2(@c[40000]); c2:=c -> 0.50
f3(a,a2,c,i,j,t,bc) -> 0.87
f2(a2) -> 0.39
s:=f4() -> 1.42
s:=f5() -> 0.79
ascan(a,i%ARR_LEN) -> 0.74
ascan(a2,c+chr(i%64+64)) -> 2.69
ascan(a,{|x|x==i%ARR_LEN}) -> 7.66
============================================================
total application time: 55.08
total real time: 55.08
..............................
_______________________________________________
Harbour mailing list
Harbour@harbour-project.org
http://lists.harbour-project.org/mailman/listinfo/harbour
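The overhead figures quoted in this thread follow directly from the reported "total application time" values. A small sketch, assuming "MT overhead" means the MT total minus the ST total of 47.13 seconds (David notes that ST timing stayed the same across runs); `mt_overhead()`, `mt_slowdown_pct()` and `print_report()` are illustrative helpers, not part of speedtst.prg.

```c
#include <assert.h>
#include <stdio.h>

/* ST "total application time" (seconds) from the speedtst_st.exe run. */
static const double st_total = 47.13;

/* Extra seconds the MT build needs compared with the ST build. */
static double mt_overhead(double mt_total) {
    return mt_total - st_total;
}

/* How much slower MT is than ST, in percent. */
static double mt_slowdown_pct(double mt_total) {
    assert(mt_total > 0.0);    /* sanity check on the input */
    return mt_overhead(mt_total) / st_total * 100.0;
}

/* MT totals reported in this thread, oldest first. */
static void print_report(void) {
    const double mt_runs[] = { 135.88, 67.05, 55.08 };
    for (size_t i = 0; i < sizeof mt_runs / sizeof mt_runs[0]; i++)
        printf("MT %6.2f s: overhead %5.2f s, %5.1f%% slower than ST\n",
               mt_runs[i], mt_overhead(mt_runs[i]), mt_slowdown_pct(mt_runs[i]));
}
```

For the latest MT run this gives an overhead of about 7.95 s, roughly 16.9% slower than ST, in line with the 17% quoted above; the two earlier MT runs give about 88.75 s and 19.92 s, matching the ~89 sec. and ~20 sec. mentioned by Przemek.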