Re: [Harbour] Re: XBASE++ speedtst

Przemyslaw Czerpak Sat, 28 Mar 2009 01:04:38 -0700

On Fri, 27 Mar 2009, Angel Pais wrote:

Hi,


> On second machine (dual core) it doesn't gpf'd but results look very 
> strange to me.

I do not find anything strange in the results.

> Pentium 4 3GZ 1GB RAM Dual Core
> 03/27/2009 16:55:40 Windows XP 05.02 Build 03790
> Xbase++ (R) Version 1.90 (MT)+ 
> THREADS: 2
> N_LOOPS: 1000000
>                                                         1 th.  2 th.  factor
> ============================================================================
> [ T001: x := L_C ]____________________________________  0.17   0.25 ->  0.68

Rather poor scalability. The factor is even smaller than 1.0.

> [ T002: x := L_N ]____________________________________  0.29   0.15 ->  1.93
> [ T003: x := L_D ]____________________________________  0.21   0.11 ->  1.91

Quite good. Nearly 2.0 so it was nicely executed on two CPUs simultaneously.
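
Judging from the quoted rows, the factor column appears to be the 1-thread
time divided by the 2-thread time. A minimal Python sketch of that arithmetic
follows; the helper name scaling_factor is hypothetical and not part of
speedtst, and because the printed times are already rounded, recomputed
factors may differ from the table in the last digit for other rows.

```python
def scaling_factor(t_one_thread, t_two_threads):
    """Ratio of the 1-thread time to the 2-thread time, rounded as in the table."""
    return round(t_one_thread / t_two_threads, 2)

# Timings in seconds, copied from the quoted table.
rows = {"T001": (0.17, 0.25), "T002": (0.29, 0.15), "T003": (0.21, 0.11)}
factors = {name: scaling_factor(t1, t2) for name, (t1, t2) in rows.items()}

for name, factor in sorted(factors.items()):
    print(name, factor)
```

A factor near 2.0 means the two threads really ran in parallel; a factor
below 1.0 means the 2-thread run was slower than the 1-thread run.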

> [ T004: x := S_C ]____________________________________  0.34   0.24 ->  1.42
> [ T005: x := S_N ]____________________________________  0.17   0.19 ->  0.89
> [ T006: x := S_D ]____________________________________  0.29   0.27 ->  1.07
> [ T007: x := M->M_C ]_________________________________  0.66   0.46 ->  1.43
> [ T008: x := M->M_N ]_________________________________  0.44   0.61 ->  0.72
> [ T009: x := M->M_D ]_________________________________  0.50   0.36 ->  1.39
> [ T010: x := M->P_C ]_________________________________  0.61   0.89 ->  0.69
> [ T011: x := M->P_N ]_________________________________  0.42   0.39 ->  1.08
> [ T012: x := M->P_D ]_________________________________  0.83   0.92 ->  0.90
> [ T013: x := F_C ]____________________________________  0.92   0.85 ->  1.08
> [ T014: x := F_N ]____________________________________  0.89   0.61 ->  1.46
> [ T015: x := F_D ]____________________________________  0.89   0.72 ->  1.24
> [ T016: x := o:Args ]_________________________________  0.45   0.70 ->  0.64
> [ T017: x := o[2] ]___________________________________  0.27   0.22 ->  1.23
> [ T018: round( i / 1000, 2 ) ]________________________  4.88   5.51 ->  0.89
> [ T019: str( i / 1000 ) ]_____________________________ 33.63  34.22 ->  0.98
> [ T020: val( s ) ]____________________________________  0.97   1.36 ->  0.71
> [ T021: val( a [ i % 16 + 1 ] ) ]_____________________  3.14   3.42 ->  0.92
> [ T022: dtos( d - i % 10000 ) ]_______________________  3.83   3.81 ->  1.01
> [ T023: eval( { || i % 16 } ) ]_______________________  3.86   4.56 ->  0.85
> [ T024: eval( bc := { || i % 16 } ) ]_________________  2.19   2.24 ->  0.98
> [ T025: eval( { |x| x % 16 }, i ) ]___________________  1.69   1.80 ->  0.94
> [ T026: eval( bc := { |x| x % 16 }, i ) ]_____________  1.61   1.57 ->  1.03
> [ T027: eval( { |x| f1( x ) }, i ) ]__________________  2.24   2.25 ->  1.00
> [ T028: eval( bc := { |x| f1( x ) }, i ) ]____________  2.00   1.94 ->  1.03
> [ T029: eval( bc := &("{ |x| f1( x ) }"), i ) ]_______  2.87   2.78 ->  1.03
> [ T030: x := &( 'f1(' + str(i) + ')' ) ]______________ 82.99  81.23 ->  1.02
> [ T031: bc := &( '{|x|f1(x)}' ), eval( bc, i ) ]______ 78.28  76.13 ->  1.03
> [ T032: x := valtype( x ) +  valtype( i ) ]___________  1.81   1.69 ->  1.07
> [ T033: x := strzero( i % 100, 2 ) $ a[ i % 16 + 1 ] ] 36.16  36.09 ->  1.00
> [ T034: x := a[ i % 16 + 1 ] == s ]___________________  1.28   1.22 ->  1.05
> [ T035: x := a[ i % 16 + 1 ] = s ]____________________  2.00   1.88 ->  1.06
> [ T036: x := a[ i % 16 + 1 ] >= s ]___________________  1.89   1.92 ->  0.98
> [ T037: x := a[ i % 16 + 1 ] <= s ]___________________  2.17   1.94 ->  1.12
> [ T038: x := a[ i % 16 + 1 ] < s ]____________________  1.97   1.90 ->  1.04
> [ T039: x := a[ i % 16 + 1 ] > s ]____________________  2.00   1.91 ->  1.05
> [ T040: ascan( a, i % 16 ) ]__________________________  3.59   3.13 ->  1.15
> [ T041: ascan( a, { |x| x == i % 16 } ) ]_____________ 22.44  64.61 ->  0.35
> [ T042: iif( i%1000==0, a:={}, ), aadd(a,{i,1,.t.,s, ] 27.62  26.45 ->  1.04
> [ T043: x := a ]______________________________________  0.21   0.25 ->  0.84
> [ T044: x := {} ]_____________________________________  2.26   2.08 ->  1.09
> [ T045: f0() ]________________________________________  0.72   0.44 ->  1.64
> [ T046: f1( i ) ]_____________________________________  1.01   0.64 ->  1.58
> [ T047: f2( c[1...8] ) ]______________________________  0.63   0.95 ->  0.66
> [ T048: f2( c[1...40000] ) ]__________________________  0.67   0.78 ->  0.86
> [ T049: f2( @c[1...40000] ) ]_________________________  0.68   0.47 ->  1.45
> [ T050: f2( @c[1...40000] ), c2 := c ]________________  1.09   0.83 ->  1.31
> [ T051: f3( a, a2, s, i, s2, bc, i, n, x ) ]__________  2.12   1.71 ->  1.24
> [ T052: f2( a ) ]_____________________________________  0.57   1.16 ->  0.49
> [ T053: x := f4() ]___________________________________  4.12   3.90 ->  1.06
> [ T054: x := f5() ]___________________________________  2.07   1.86 ->  1.11

> [ T055: x := space(16) ]______________________________  4.78   1.49 ->  3.21

This one is really strange, but such things can happen and they are usually
the result of some other event, e.g. something external to the above program
was executed suddenly during the 1-st part of the test and increased the time,
or the internal GC was activated. Repeating the test will probably change the
results and this 3.21 factor.
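
One way to check whether such an outlier is noise is to repeat the same
measurement several times and compare the minimum with the median. The
Python sketch below only illustrates the idea; time_repeated and workload
are hypothetical names, and the workload is a stand-in, not the real T055
test.

```python
import statistics
import time

def time_repeated(func, repeats=5):
    """Run func `repeats` times and return the list of wall-clock durations."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations

def workload(width=16, loops=100_000):
    # Stand-in for one benchmark item: build a 16-character string in a loop.
    for _ in range(loops):
        _ = " " * width

durations = time_repeated(workload)
print(f"min {min(durations):.4f}s  median {statistics.median(durations):.4f}s")
```

If a single run is far above the median, an external event or a GC pass
during that run is a likely explanation.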

> [ T056: f_prv( c ) ]__________________________________  2.27   3.03 ->  0.75
> ============================================================================
> [   TOTAL   ]_________________________________________358.66 393.09 ->  0.91
> ============================================================================
> [ total application time: ]...................................751.90
> [ total real time: ]..........................................751.90

The average factor is 0.91. The ideal value should be 2.00 on a two CPU
machine when 2 or more threads are used. This is the real cost of
serialization. It reduces scalability. Please note that the results
depend on what is executed. There are things which scale quite
well and others which don't. For sure the memory allocator used by
xbase++ looks much better than DLMALLOC, which in real MT mode (real
simultaneous execution) usually gives fatal results, and it's much
more efficient to compile Windows Harbour builds with -DHB_FM_WIN_ALLOC.
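
The 0.91 figure matches the ratio of the two TOTAL times, so it is weighted
by how long each test runs rather than being a plain mean of the per-test
factors. A small Python sketch of that ratio and of the resulting parallel
efficiency against the ideal 2.00; the variable names are illustrative only.

```python
N_CPUS = 2  # the first machine is reported as dual core

# TOTAL row of the quoted table, in seconds.
total_1th, total_2th = 358.66, 393.09

total_factor = total_1th / total_2th
efficiency = total_factor / N_CPUS  # 1.0 would mean ideal scaling

print(f"factor {total_factor:.2f}, efficiency {efficiency:.0%}")
```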

> CPU CELEREON 560 2.13 GB 1GB RAM
> 03/27/2009 15:49:36 Windows XP 05.01 Build 02600 Service Pack 2
> Xbase++ (R) Version 1.90 (MT)+ 
> THREADS: 2
> N_LOOPS: 1000000
>                                                         1 th.  2 th.  factor
> ============================================================================
> [ T001: x := L_C ]____________________________________  0.08   0.08 ->  1.00
> [ T002: x := L_N ]____________________________________  0.06   0.05 ->  1.20
> [ T003: x := L_D ]____________________________________  0.04   0.05 ->  0.80
> [ T004: x := S_C ]____________________________________  0.16   0.15 ->  1.07
> [ T005: x := S_N ]____________________________________  0.11   0.14 ->  0.79
> [ T006: x := S_D ]____________________________________  0.13   0.12 ->  1.08
> [ T007: x := M->M_C ]_________________________________  0.32   0.29 ->  1.10
> [ T008: x := M->M_N ]_________________________________  0.25   0.27 ->  0.93
> [ T009: x := M->M_D ]_________________________________  0.23   0.25 ->  0.92
> [ T010: x := M->P_C ]_________________________________  0.33   0.34 ->  0.97
> [ T011: x := M->P_N ]_________________________________  0.30   0.30 ->  1.00
> [ T012: x := M->P_D ]_________________________________  0.31   0.33 ->  0.94
> [ T013: x := F_C ]____________________________________  0.48   0.47 ->  1.02
> [ T014: x := F_N ]____________________________________  0.42   0.41 ->  1.02
> [ T015: x := F_D ]____________________________________  0.41   0.39 ->  1.05
> [ T016: x := o:Args ]_________________________________  0.29   0.29 ->  1.00
> [ T017: x := o[2] ]___________________________________  0.10   0.10 ->  1.00
> [ T018: round( i / 1000, 2 ) ]________________________  0.87   0.86 ->  1.01
> [ T019: str( i / 1000 ) ]_____________________________  4.10   4.14 ->  0.99
> [ T020: val( s ) ]____________________________________  0.58   0.59 ->  0.98
> [ T021: val( a [ i % 16 + 1 ] ) ]_____________________  0.77   0.75 ->  1.03
> [ T022: dtos( d - i % 10000 ) ]_______________________  1.03   1.03 ->  1.00
> [ T023: eval( { || i % 16 } ) ]_______________________  1.86   1.84 ->  1.01
> [ T024: eval( bc := { || i % 16 } ) ]_________________  1.19   1.05 ->  1.13
> [ T025: eval( { |x| x % 16 }, i ) ]___________________  0.85   0.87 ->  0.98
> [ T026: eval( bc := { |x| x % 16 }, i ) ]_____________  0.70   0.71 ->  0.99
> [ T027: eval( { |x| f1( x ) }, i ) ]__________________  1.07   1.05 ->  1.02
> [ T028: eval( bc := { |x| f1( x ) }, i ) ]____________  0.93   0.91 ->  1.02
> [ T029: eval( bc := &("{ |x| f1( x ) }"), i ) ]_______  1.42   1.36 ->  1.04
> [ T030: x := &( 'f1(' + str(i) + ')' ) ]______________ 18.22  18.38 ->  0.99
> [ T031: bc := &( '{|x|f1(x)}' ), eval( bc, i ) ]______ 27.46  27.57 ->  1.00
> [ T032: x := valtype( x ) +  valtype( i ) ]___________  0.61   0.57 ->  1.07
> [ T033: x := strzero( i % 100, 2 ) $ a[ i % 16 + 1 ] ]  3.71   3.69 ->  1.01
> [ T034: x := a[ i % 16 + 1 ] == s ]___________________  0.49   0.46 ->  1.07
> [ T035: x := a[ i % 16 + 1 ] = s ]____________________  0.66   0.67 ->  0.99
> [ T036: x := a[ i % 16 + 1 ] >= s ]___________________  0.67   0.69 ->  0.97
> [ T037: x := a[ i % 16 + 1 ] <= s ]___________________  0.68   0.69 ->  0.99
> [ T038: x := a[ i % 16 + 1 ] < s ]____________________  0.69   0.69 ->  1.00
> [ T039: x := a[ i % 16 + 1 ] > s ]____________________  0.70   0.69 ->  1.01
> [ T040: ascan( a, i % 16 ) ]__________________________  1.34   1.28 ->  1.05
> [ T041: ascan( a, { |x| x == i % 16 } ) ]_____________  8.60   8.53 ->  1.01
> [ T042: iif( i%1000==0, a:={}, ), aadd(a,{i,1,.t.,s, ] 13.12  13.38 ->  0.98
> [ T043: x := a ]______________________________________  0.07   0.06 ->  1.17

You can try to use --exclude=044 as an additional parameter to finish the test.
Here the results are close to 1.0, which is expected for a single CPU machine.
But I find one thing very interesting. They are much better than in
the 1-st test though the computer seems to be slower. Am I right?
If yes, then it suggests that xbase++ does not enable some internal
MT logic in all cases and tries to use some faster VM module/synchronization
mechanism when it detects a single CPU machine and/or the process does not
create new threads. It would be interesting to find out when exactly such
faster internal logic is used. It's possible that the GPF problem in T044 can
be triggered only when this faster but not really MT safe module is enabled.
It looks like some type of runtime/startup VM switching.
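
To make the comparison between the two machines concrete, the 1-thread
columns of both quoted tables can be divided test by test. The Python sketch
below does this for three of the longest-running tests; the selection of
tests and the variable names are arbitrary choices for illustration.

```python
# 1-thread times in seconds: (first machine, second machine),
# copied from the two quoted tables.
one_thread = {
    "T019": (33.63, 4.10),
    "T030": (82.99, 18.22),
    "T041": (22.44, 8.60),
}

# How many times slower the first (dual core) machine is on each test.
slowdown = {name: first / second for name, (first, second) in one_thread.items()}

for name, ratio in sorted(slowdown.items()):
    print(f"{name}: first machine is {ratio:.1f}x slower")
```

The ratio differs a lot between tests, which fits the idea that the extra
cost on the first machine depends on which internal operations are involved.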

Thank you very much for your tests.

What are Harbour results on the same computers?

best regards,
Przemek
_______________________________________________
Harbour mailing list
Harbour@harbour-project.org
http://lists.harbour-project.org/mailman/listinfo/harbour
