Przemek:

For mttest05:
[...]
result:
0 errors

Thank you very much for your tests.
It looks like it works correctly; however, I do not
like the way it is done. I hope that OS/2 users
will clean up this code in the future.

Maurilio, are you reading?  :-)
( yes, I know you can clean it )


2008-09-21 23:03 UTC+0200 Przemyslaw Czerpak (druzus/at/priv.onet.pl)
 + harbour/tests/mt/mttest08.prg
   + added test code for thread memvars inheritance

Results for mttest08.prg (MT build) on OS/2 are below.


The speed difference was the cost of using mutexes to protect
access to reference counters. As you can see, they are very slow
in OS/2 - much slower than futexes in Linux and critical sections
in Windows. Using inlined assembler code for atomic incrementation
and decrementation of reference counters greatly improved the speed
and reduced the MT HVM overhead from ~89 sec. to ~20 sec., so the real
speed improvement calculated only for the MT overhead is much better.
Now we can try to reduce the remaining 20 sec. In OS/2 it should
be possible to make good use of stack macros even without native
compiler TLS support. I'll make this modification shortly and I would
like to ask you to repeat the speedtst.prg tests with the new SVN code.

2008-09-22 16:14 UTC+0200 Przemyslaw Czerpak (druzus/at/priv.onet.pl)
 * harbour/include/hbstack.h
 * harbour/source/vm/estack.c
   + added support for using stack macros without native compiler TLS
     support in MT mode.
   * enabled stack macros by default for OS2 MT builds
     David, if possible please try speedtst.prg with current MT HVM

BTW, if possible, please try speedtst.prg with the current MT HVM on OS/2.
   I enabled stack macros for these builds and I'd like to know
   whether it works and how the speed compares with your previous
   tests, so that we can calculate the real overhead caused by
   accessing TLS data in OS/2.

Below are the results for current Harbour under eComStation 1.2MR,
built with gcc 3.3.5

ST
   total application time:                              47.13
   total real time:                                     47.14
MT
   total application time:                              55.08
   total real time:                                     55.08

before:
MT
   total application time:                              67.05
   total real time:                                     67.05
and before that (MT):
   total application time:                             135.88
   total real time:                                    135.88

ST stays the same as before, but MT keeps falling:
  135.88  -->  67.05  --> 55.08 seconds

Now MT is only 17 % slower than ST   :-)

You improved MT performance in just a few days, so we are close to the
point where MT can be used in place of ST without much of a speed penalty

I have not started the MT tests on Win32 / Linux yet. Both will run on a
Core2Duo 2.0, so a large speed difference should be expected compared
with the current Athlon 2200+ tests

Thanks for your work

David Macias


..............................

[E:\harbour809mt\harbour\tests\mt]mttest08_mt.exe

Main start

Do not inherit memvars.
main thread:
    PUB1: C -> main:public1
    PUB2: C -> main:public2
    PRV1: C -> main:private1
    PRV2: C -> main:private2
child thread:
    PUB1: U
    PUB2: U
    PRV1: U
    PRV2: U
assign...
child thread:
    PUB1: C -> thread:public1
    PUB2: C -> thread:public2
    PRV1: C -> thread:private1
    PRV2: C -> thread:private2
main thread:
    PUB1: C -> main:public1
    PUB2: C -> main:public2
    PRV1: C -> main:private1
    PRV2: C -> main:private2
Press any key to continue...


Inherit copy of publics.
main thread:
    PUB1: C -> main:public1
    PUB2: C -> main:public2
    PRV1: C -> main:private1
    PRV2: C -> main:private2
child thread:
    PUB1: C -> main:public1
    PUB2: C -> main:public2
    PRV1: U
    PRV2: U
assign...
child thread:
    PUB1: C -> thread:public1
    PUB2: C -> thread:public2
    PRV1: C -> thread:private1
    PRV2: C -> thread:private2
main thread:
    PUB1: C -> main:public1
    PUB2: C -> main:public2
    PRV1: C -> main:private1
    PRV2: C -> main:private2
Press any key to continue...

Inherit copy of privates.
main thread:
    PUB1: C -> main:public1
    PUB2: C -> main:public2
    PRV1: C -> main:private1
    PRV2: C -> main:private2
child thread:
    PUB1: U
    PUB2: U
    PRV1: C -> main:private1
    PRV2: C -> main:private2
assign...
child thread:
    PUB1: C -> thread:public1
    PUB2: C -> thread:public2
    PRV1: C -> thread:private1
    PRV2: C -> thread:private2
main thread:
    PUB1: C -> main:public1
    PUB2: C -> main:public2
    PRV1: C -> main:private1
    PRV2: C -> main:private2
Press any key to continue...

Inherit copy of publics and privates.
main thread:
    PUB1: C -> main:public1
    PUB2: C -> main:public2
    PRV1: C -> main:private1
    PRV2: C -> main:private2
child thread:
    PUB1: C -> main:public1
    PUB2: C -> main:public2
    PRV1: C -> main:private1
    PRV2: C -> main:private2
assign...
child thread:
    PUB1: C -> thread:public1
    PUB2: C -> thread:public2
    PRV1: C -> thread:private1
    PRV2: C -> thread:private2
main thread:
    PUB1: C -> main:public1
    PUB2: C -> main:public2
    PRV1: C -> main:private1
    PRV2: C -> main:private2
Press any key to continue...

Share publics with child threads.
main thread:
    PUB1: C -> main:public1
    PUB2: C -> main:public2
    PRV1: C -> main:private1
    PRV2: C -> main:private2
child thread:
    PUB1: C -> main:public1
    PUB2: C -> main:public2
    PRV1: U
    PRV2: U
assign...
child thread:
    PUB1: C -> thread:public1
    PUB2: C -> thread:public2
    PRV1: C -> thread:private1
    PRV2: C -> thread:private2
main thread:
    PUB1: C -> thread:public1
    PUB2: C -> thread:public2
    PRV1: C -> main:private1
    PRV2: C -> main:private2
Press any key to continue...

Share privates with child threads.
main thread:
    PUB1: C -> main:public1
    PUB2: C -> main:public2
    PRV1: C -> main:private1
    PRV2: C -> main:private2
child thread:
    PUB1: U
    PUB2: U
    PRV1: C -> main:private1
    PRV2: C -> main:private2
assign...
child thread:
    PUB1: C -> thread:public1
    PUB2: C -> thread:public2
    PRV1: C -> thread:private1
    PRV2: C -> thread:private2
main thread:
    PUB1: C -> main:public1
    PUB2: C -> main:public2
    PRV1: C -> thread:private1
    PRV2: C -> thread:private2
Press any key to continue...

Share publics and privates with child threads.
main thread:
    PUB1: C -> main:public1
    PUB2: C -> main:public2
    PRV1: C -> main:private1
    PRV2: C -> main:private2
child thread:
    PUB1: C -> main:public1
    PUB2: C -> main:public2
    PRV1: C -> main:private1
    PRV2: C -> main:private2
assign...
child thread:
    PUB1: C -> thread:public1
    PUB2: C -> thread:public2
    PRV1: C -> thread:private1
    PRV2: C -> thread:private2
main thread:
    PUB1: C -> thread:public1
    PUB2: C -> thread:public2
    PRV1: C -> thread:private1
    PRV2: C -> thread:private2
Press any key to continue...


..............................

[E:\harbour809mt\harbour\tests]speedtst_st.exe

Startup loop to increase CPU clock...

09/23/08 02:08:43 Harbour 1.1.0dev (Rev. 9469), OS/2 4.50
ARR_LEN =         16
N_LOOPS =    1000000
empty loops overhead =          0.14
CPU usage -> secondsCPU()

c:=L_C ->                                             0.10
n:=L_N ->                                             0.09
d:=L_D ->                                             0.09
c:=M_C ->                                             0.12
n:=M_N ->                                             0.10
d:=M_D ->                                             0.10
(sh) c:=F_C ->                                        0.45
(sh) n:=F_N ->                                        0.34
(sh) d:=F_D ->                                        0.23
(ex) c:=F_C ->                                        0.43
(ex) n:=F_N ->                                        0.34
(ex) d:=F_D ->                                        0.21
n:=o:GenCode ->                                       0.32
n:=o[8] ->                                            0.21
round(i/1000,2) ->                                    0.45
str(i/1000) ->                                        1.34
val(a3[i%ARR_LEN+1]) ->                               0.91
dtos(j+i%10000-5000) ->                               1.07
eval({||i%ARR_LEN}) ->                                0.46
eval({|x|x%ARR_LEN},i) ->                             0.51
eval({|x|f1(x)},i) ->                                 0.74
&('f1('+str(i)+')') ->                                7.96
eval([&('{|x|f1(x)}')]) ->                            0.75
j := valtype(a)+valtype(i) ->                         0.93
j := str(i%100,2) $ a2[i%ARR_LEN+1] ->                1.94
j := val(a2[i%ARR_LEN+1]) ->                          0.97
j := a2[i%ARR_LEN+1] == s ->                          0.64
j := a2[i%ARR_LEN+1] = s ->                           0.75
j := a2[i%ARR_LEN+1] >= s ->                          0.70
j := a2[i%ARR_LEN+1] < s ->                           0.71
aadd(aa,{i,j,s,a,a2,t,bc}) ->                         2.37
f0() ->                                               0.27
f1(i) ->                                              0.43
f2(c[8]) ->                                           0.32
f2(c[40000]) ->                                       0.32
f2(@c[40000]) ->                                      0.30
f2(c[40000]); c2:=c ->                                0.42
f2(@c[40000]); c2:=c ->                               0.42
f3(a,a2,c,i,j,t,bc) ->                                0.73
f2(a2) ->                                             0.32
s:=f4() ->                                            1.34
s:=f5() ->                                            0.73
ascan(a,i%ARR_LEN) ->                                 0.61
ascan(a2,c+chr(i%64+64)) ->                           2.55
ascan(a,{|x|x==i%ARR_LEN}) ->                         5.53
============================================================
total application time:                              47.13
total real time:                                     47.14


..............................

[E:\harbour809mt\harbour\tests]speedtst_mt.exe

Startup loop to increase CPU clock...

09/23/08 02:09:48 Harbour 1.1.0dev (Rev. 9469) (MT), OS/2 4.50
ARR_LEN =         16
N_LOOPS =    1000000
empty loops overhead =          0.17
CPU usage -> secondsCPU()

c:=L_C ->                                             0.14
n:=L_N ->                                             0.12
d:=L_D ->                                             0.12
c:=M_C ->                                             0.15
n:=M_N ->                                             0.13
d:=M_D ->                                             0.13
(sh) c:=F_C ->                                        0.43
(sh) n:=F_N ->                                        0.38
(sh) d:=F_D ->                                        0.22
(ex) c:=F_C ->                                        0.43
(ex) n:=F_N ->                                        0.37
(ex) d:=F_D ->                                        0.22
n:=o:GenCode ->                                       0.39
n:=o[8] ->                                            0.25
round(i/1000,2) ->                                    0.52
str(i/1000) ->                                        1.37
val(a3[i%ARR_LEN+1]) ->                               0.97
dtos(j+i%10000-5000) ->                               1.10
eval({||i%ARR_LEN}) ->                                0.52
eval({|x|x%ARR_LEN},i) ->                             0.58
eval({|x|f1(x)},i) ->                                 0.86
&('f1('+str(i)+')') ->                                8.62
eval([&('{|x|f1(x)}')]) ->                            0.89
j := valtype(a)+valtype(i) ->                         1.07
j := str(i%100,2) $ a2[i%ARR_LEN+1] ->                1.94
j := val(a2[i%ARR_LEN+1]) ->                          1.02
j := a2[i%ARR_LEN+1] == s ->                          0.73
j := a2[i%ARR_LEN+1] = s ->                           0.87
j := a2[i%ARR_LEN+1] >= s ->                          0.86
j := a2[i%ARR_LEN+1] < s ->                           0.87
aadd(aa,{i,j,s,a,a2,t,bc}) ->                         3.43
f0() ->                                               0.33
f1(i) ->                                              0.49
f2(c[8]) ->                                           0.38
f2(c[40000]) ->                                       0.39
f2(@c[40000]) ->                                      0.36
f2(c[40000]); c2:=c ->                                0.52
f2(@c[40000]); c2:=c ->                               0.50
f3(a,a2,c,i,j,t,bc) ->                                0.87
f2(a2) ->                                             0.39
s:=f4() ->                                            1.42
s:=f5() ->                                            0.79
ascan(a,i%ARR_LEN) ->                                 0.74
ascan(a2,c+chr(i%64+64)) ->                           2.69
ascan(a,{|x|x==i%ARR_LEN}) ->                         7.66
============================================================
total application time:                              55.08
total real time:                                     55.08

..............................


_______________________________________________
Harbour mailing list
Harbour@harbour-project.org
http://lists.harbour-project.org/mailman/listinfo/harbour
