On Fri, 03 Oct 2008, Maurilio Longo wrote:

Hi Maurilio,

> > In -gc[0-2] mode each thread executes:
> >       if( ! --uiPolls )
> >          hb_inkeyPoll();
> > in the main HVM loop. This simple hack, acceptable in ST programs, is
> > killing the performance in MT ones. The important part is not the
> > hb_inkeyPoll() call (you can comment it out and it will not change the
> > results) but the --uiPolls itself.
> > Each thread decrements the same memory variable and that causes some
> > horrible synchronization traffic between CPUs on multi-CPU machines,
> > which kills the performance. It's a hardware issue, so I do not know how
> > big an overhead it may cause on other machines. I'm very interested in
> > your results.
> > Please rebuild Harbour with the -DHB_GUI macro or compile speedtst.prg
> > with -gc3 and compare the results.
> > 
> > 
> I did a full rebuild of the latest SVN code, added -gc3 to my modified
> speedtst.prg, and here are the results.
>  3 - ascan(a,{|x|x==i%ARR_LEN}) ->                        22.03
>  2 - ascan(a2,c+chr(i%64+64)) ->                           3.72
> ============================================================
>  0 - total application time:                              55.27
> total real time:                                     55.27
> So it decreased the total time by a little less than 10%.

I have different results.
With --uiPolls in place the speed is reduced to that of a single CPU.
When I eliminated it, the results from your modified speedtst.prg
were reduced by nearly half.
I had to change the width in str(hb_threadID(),2) to 11.

10/02/08 20:10:09 Harbour 1.1.0dev (Rev. 9533) (MT), Linux 2.6.25.11-0.1-pae 
i686 
[...]
-1223345264 - f2(a2) ->                                             0.99 
-1223345264 - ascan(a,i%ARR_LEN) ->                                 1.49 
-1214952560 - &('f1('+str(i)+')') ->                               11.96 
-1214952560 - s:=f4() ->                                            3.03 
-1214952560 - s:=f5() ->                                            2.26 
-1214952560 - ascan(a2,c+chr(i%64+64)) ->                           5.41 
-1223345264 - ascan(a,{|x|x==i%ARR_LEN}) ->                        13.29 
============================================================ 
          0 - total application time:                              57.98 
total real time:                                     29.64 


10/02/08 20:11:30 Harbour 1.1.0dev (Rev. 9533) (MT), Linux 2.6.25.11-0.1-pae 
i686 
[...]
-1224356976 - f2(a2) ->                                             0.56 
-1224356976 - ascan(a,i%ARR_LEN) ->                                 1.10 
-1215964272 - &('f1('+str(i)+')') ->                                8.85 
-1215964272 - s:=f4() ->                                            1.83 
-1215964272 - s:=f5() ->                                            1.34 
-1215964272 - ascan(a2,c+chr(i%64+64)) ->                           3.67 
-1224356976 - ascan(a,{|x|x==i%ARR_LEN}) ->                        10.31 
============================================================ 
          0 - total application time:                              31.95 
total real time:                                     16.51 

though the times were not exactly halved when compared to the MT version
executed with a single thread.
It looks like this is a strictly hardware-dependent issue. Maybe on my
machine such a situation generates an exception which a default handler
silently ignores? For sure, writing to the same memory location from
different threads without any protection is not a good idea :-)
The above tests were made with -gc3, which gives slightly better results
than -gc2. I rebuilt Harbour with the HB_GUI macro to disable --uiPolls
and modified your speedtst.prg to optionally run in one thread.
Then I repeated the tests with -gc2 to compare the speed difference:

ST mode
          0 - total application time:                              26.67 
              total real time:                                     26.74 

MT mode one thread:
          0 - total application time:                              31.76 
              total real time:                                     31.89 

MT mode two threads:
          0 - total application time:                              38.14 
              total real time:                                     19.64 

Anyhow, these tests still use statically divided jobs.
To make them more aggressive I rewrote speedtst.prg so it can now
be executed in many threads.
When you run the MT version of speedtst.prg with a parameter, each
test loop is executed in a separate thread. It causes really heavy
overhead and it's also quite a good MT stress test.
I'm attaching this code; please try it. I'll replace tests/speedtst.prg
with this test ASAP.
I also adapted this code for xHarbour and wanted to run it to compare
results, but in practice it's impossible:
it GPFs, corrupts memory, hangs or does other strange things on a real
multi-CPU machine and never finishes correctly. Maybe someone can run
it on a single-CPU computer.

Here are the results (all tests compiled with -gc2 and Harbour built
without -DHB_GUI):

ST mode:
   10/03/08 13:14:01 Linux 2.6.25.11-0.1-pae i686
   Harbour 1.1.0dev (Rev. 9535)
   GNU C 4.3.1 (32 bit)
   [...]
   [ total application time: ]....................................28.79
   [ total real time: ]...........................................28.83

MT mode one thread:
   10/03/08 13:14:35 Linux 2.6.25.11-0.1-pae i686
   Harbour 1.1.0dev (Rev. 9535) (MT)
   [...]
   [ total application time: ]....................................34.87
   [ total real time: ]...........................................34.89


MT mode, many threads:
   10/03/08 13:15:16 Linux 2.6.25.11-0.1-pae i686
   Harbour 1.1.0dev (Rev. 9535) (MT)
   [...]
   [ total application time: ]....................................44.50
   [ total real time: ]...........................................15.53

I have a three-CPU machine, and the final real time when the test is
divided into many threads is ~2.25 times smaller than when executed by a
single thread. Such results are acceptable, because in the heaviest
tests new complex items inspected by the GC are created and destroyed,
which needs some serialization when registering them. This, the cost of
task switching, and the differences in individual test times (the total
real time is bounded below by the most expensive test, which finishes on
a single CPU while the others sit idle - it's not a Harbour problem at
all, but it makes the total result worse than it could be) do not allow
reaching a linear performance improvement (3.00), but IMHO it gives
reasonable performance and scalability. I'm interested in results from
other machines.

best regards,
Przemek
#define N_TESTS 51
#define N_LOOPS 1000000
#define ARR_LEN 16

#ifndef EOL
    #define EOL hb_OSNewLine()
#endif
#command ? => outstd(EOL)
#command ? <xx,...> => outstd(EOL);outstd(<xx>)


#xcommand _( [<cmds,...>] ) => [<cmds>]

#xcommand TEST <testfunc>           ;
          [ WITH <locals,...> ]     ;
          [ INIT <init> ]           ;
          [ EXIT <exit> ]           ;
          [ INFO <info> ]           ;
          CODE [<*testExp*>] =>     ;
   func <testfunc>                  ;;
      local time, i, x := nil       ;;
      [ local <locals> ; ]          ;;
      [ <init> ; ]                  ;;
      time := secondscpu()          ;;
      for i:=1 to N_LOOPS           ;;
         <testExp>                  ;;
      next                          ;;
      time := secondscpu() - time   ;;
      [ <exit> ; ]                  ;;
   return { iif( <.info.>, <(info)>, #<testExp> ), time }

static S_C, S_N, S_D

#ifdef __XHARBOUR__
   /* do not expect this code to work with xHarbour.
    * xHarbour has many race conditions which are exposed quite quickly
    * on real multi-CPU machines, so it crashes in different places :-(
    * probably this code should be forwarded to xHarbour developers as
    * some type of MT test
    */
   static s_aResults[ N_TESTS + 1 ]
   #xtranslate hb_mtvm()                  => hb_multiThread()

   /* I used function wrappers to simulate a thread join which can
    * return the thread's results
    */
   static function do_test( cFunc )
      local x
      ? "starting: " + cFunc + "()"
      // if you set .f. then the tests will be skipped; this lets you
      // check whether the harness itself runs, because skipping the test
      // code greatly reduces the race conditions inside the xHarbour HVM
      if .t.
         x := &cFunc()
      else
         x := { "skipped test " + cFunc, val( substr( cFunc, 2 ) ) + 0.99 }
      endif
      s_aResults[ val( substr( cFunc, 2 ) ) ] := x
   return nil

   function hb_threadStart( cFunc )
   return StartThread( @do_test(), cFunc )

   function hb_threadJoin( thId, xResult )
      static s_n := 0
      local lOK
      /* in xHarbour there is a race condition in JoinThread() which
       * fails if the thread ends before we call it, so we cannot use it :-(
       */
      //lOK := JoinThread( thId )
      lOK := .t.
      if s_n == 0
         WaitForThreads()
      endif
      xResult := s_aResults[ ++s_n ]
   return lOK
#endif


TEST t000 INFO "empty loop overhead" CODE

TEST t001 WITH L_C:=dtos(date()) CODE x := L_C

TEST t002 WITH L_N:=112345.67    CODE x := L_N

TEST t003 WITH L_D:=date()       CODE x := L_D

TEST t004 INIT S_C:=dtos(date()) CODE x := S_C

TEST t005 INIT S_N:=112345.67    CODE x := S_N

TEST t006 INIT S_D:=date()       CODE x := S_D

TEST t007 INIT _( memvar M_C ) INIT _( private M_C:=dtos(date()) ) ;
          CODE x := M_C

TEST t008 INIT _( memvar M_N ) INIT _( private M_N:=112345.67 ) ;
          CODE x := M_N

TEST t009 INIT _( memvar M_D ) INIT _( private M_D:=date() ) ;
          CODE x := M_D

TEST t010 INIT _( memvar P_C ) INIT _( public P_C:=dtos(date()) ) ;
          CODE x := P_C

TEST t011 INIT _( memvar P_N ) INIT _( public P_N:=112345.67 ) ;
          CODE x := P_N

TEST t012 INIT _( memvar P_D ) INIT _( public P_D:=date() ) ;
          CODE x := P_D

TEST t013 INIT _( field F_C ) INIT use_dbsh() EXIT close_db() ;
          CODE x := F_C

TEST t014 INIT _( field F_N ) INIT use_dbsh() EXIT close_db() ;
          CODE x := F_N

TEST t015 INIT _( field F_D ) INIT use_dbsh() EXIT close_db() ;
          CODE x := F_D

TEST t016 WITH o := errorNew() CODE x := o:GenCode

TEST t017 WITH o := errorNew() CODE x := o[8]

TEST t018 CODE round( i / 1000, 2 )

TEST t019 CODE str( i / 1000 )

TEST t020 WITH s := stuff( dtos( date() ), 7, 0, "." ) CODE val( s )

TEST t021 WITH a := afill( array( ARR_LEN ), ;
                           stuff( dtos( date() ), 7, 0, "." ) ) ;
          CODE val( a [ i % ARR_LEN + 1 ] )

TEST t022 WITH d := date() CODE dtos( d - i % 10000 )

TEST t023 CODE eval( { || i % ARR_LEN } )

TEST t024 WITH bc := { || i % ARR_LEN } ;
          INFO eval( bc := { || i % ARR_LEN } ) ;
          CODE eval( bc )

TEST t025 CODE eval( { |x| x % ARR_LEN }, i )

TEST t026 WITH bc := { |x| x % ARR_LEN } ;
          INFO eval( bc := { |x| x % ARR_LEN }, i ) ;
          CODE eval( bc, i )

TEST t027 CODE eval( { |x| f1( x ) }, i )

TEST t028 WITH bc := { |x| f1( x ) } ;
          INFO eval( bc := { |x| f1( x ) }, i ) ;
          CODE eval( bc, i )

TEST t029 CODE x := &( "f1(" + str(i) + ")" )

TEST t030 WITH bc CODE bc := &( "{|x|f1(x)}" ); eval( bc, i )

TEST t031 CODE x := valtype( x ) +  valtype( i )

TEST t032 WITH a := afill( array( ARR_LEN ), ;
                           stuff( dtos( date() ), 7, 0, "." ) ) ;
          CODE x := strzero( i % 100, 2 ) $ a[ i % ARR_LEN + 1 ]

TEST t033 WITH a := array( ARR_LEN ), s := dtos( date() ) ;
          INIT aeval( a, { |x,i| a[i] := left( s + s, i ) } ) ;
          CODE x := a[ i % ARR_LEN + 1 ] == s

TEST t034 WITH a := array( ARR_LEN ), s := dtos( date() ) ;
          INIT aeval( a, { |x,i| a[i] := left( s + s, i ) } ) ;
          CODE x := a[ i % ARR_LEN + 1 ] = s

TEST t035 WITH a := array( ARR_LEN ), s := dtos( date() ) ;
          INIT aeval( a, { |x,i| a[i] := left( s + s, i ) } ) ;
          CODE x := a[ i % ARR_LEN + 1 ] >= s

TEST t036 WITH a := array( ARR_LEN ), s := dtos( date() ) ;
          INIT aeval( a, { |x,i| a[i] := left( s + s, i ) } ) ;
          CODE x := a[ i % ARR_LEN + 1 ] <= s

TEST t037 WITH a := array( ARR_LEN ), s := dtos( date() ) ;
          INIT aeval( a, { |x,i| a[i] := left( s + s, i ) } ) ;
          CODE x := a[ i % ARR_LEN + 1 ] < s

TEST t038 WITH a := array( ARR_LEN ), s := dtos( date() ) ;
          INIT aeval( a, { |x,i| a[i] := left( s + s, i ) } ) ;
          CODE x := a[ i % ARR_LEN + 1 ] > s

TEST t039 WITH a := array( ARR_LEN ) ;
          INIT aeval( a, { |x,i| a[i] := i } ) ;
          CODE ascan( a, i % ARR_LEN )

TEST t040 WITH a := array( ARR_LEN ) ;
          INIT aeval( a, { |x,i| a[i] := i } ) ;
          CODE ascan( a, { |x| x == i % ARR_LEN } )

TEST t041 WITH a := {}, a2 := { 1, 2, 3 }, bc := { |x| f1(x) }, ;
               s := dtos( date() ), s2 := "static text" ;
     CODE if i%4000==0;a:={};end; aadd(a,{i,1,.t.,s,s2,a2,bc})

TEST t042 CODE f0()

TEST t043 CODE f1( i )

TEST t044 WITH c := dtos( date() ) ;
          INFO f2( c[1...8] ) ;
          CODE f2( c )

TEST t045 WITH c := repl( dtos( date() ), 5000 ) ;
          INFO f2( c[1...40000] ) ;
          CODE f2( c )

TEST t046 WITH c := repl( dtos( date() ), 5000 ) ;
          INFO f2( @c[1...40000] ) ;
          CODE f2( c )

TEST t047 WITH c := repl( dtos( date() ),5000 ), c2 ;
          INFO "f2( @c[1...40000] ), c2 := c" ;
          CODE f2( @c ); c2 := c

TEST t048 WITH a := {}, a2 := { 1, 2, 3 }, bc := { |x| f1(x) }, ;
               s := dtos( date() ), s2 := "static text", n := 1.23 ;
     CODE f3( a, a2, s, i, s2, bc, i, n, x )

TEST t049 WITH a := { 1, 2, 3 } CODE f2( a )

TEST t050 CODE x := f4()

TEST t051 CODE x := f5()


proc main(mt)
local nLoopOverHead, nTimes, nSeconds, x, i, aThreads:={}

create_db()

#ifdef __HARBOUR__
#include "hbmemory.ch"
if MEMORY( HB_MEM_USEDMAX ) != 0
   ? "Warning !!! Memory statistic enabled."
   ?
endif
#endif

? "Startup loop to increase CPU clock..."
x := seconds() + 5; while x > seconds(); enddo

? date(), time(), os()
? version() + iif( hb_mtvm(), " (MT)", "" )
? hb_compiler()
? "N_LOOPS =", N_LOOPS

x :=t000()
? dsp_result( x, 0 )
nLoopOverHead := x[2]

? replicate("=",68)

nSeconds := seconds()
nTimes := secondsCPU()

if !empty(mt) .and. hb_mtvm()
   aThreads := array( N_TESTS )
   for i:=1 to N_TESTS
      aThreads[ i ] := hb_threadStart( "t" + strzero( i, 3 ) )
   next
   for i:=1 to N_TESTS
      if hb_threadJoin( aThreads[ i ], @x )
         ? dsp_result( x, nLoopOverHead )
      endif
   next
else
   for i:=1 to N_TESTS
      ? dsp_result( &( "t" + strzero( i, 3 ) )(), nLoopOverHead )
   next
endif

nTimes := secondsCPU() - nTimes
nSeconds := seconds() - nSeconds

? replicate("=",68)
? dsp_result( { "total application time:", nTimes }, 0)
? dsp_result( { "total real time:", nSeconds }, 0 )
?

remove_db()
return


function f0()
return nil

function f1(x)
return x

function f2(x)
return nil

function f3(a,b,c,d,e,f,g,h,i)
return nil

function f4()
return space(4000)

function f5()
return space(5)


static func dsp_result( aResult, nLoopOverHead )
   return padr( "[ " + left( aResult[1], 56 ) + " ]", 60, "." ) + ;
          strtran( str( max( aResult[2] - nLoopOverHead, 0 ), 8, 2 ), " ", "." )


#define TMP_FILE "_tst_tmp.dbf"
static proc create_db()
   remove_db()
   dbcreate( TMP_FILE, { {"F_C", "C", 10, 0},;
                         {"F_N", "N", 10, 2},;
                         {"F_D", "D",  8, 0} } )
   use TMP_FILE exclusive
   dbappend()
   replace F_C with dtos(date())
   replace F_N with 112345.67
   replace F_D with date()
   close
return

static proc remove_db()
   ferase( TMP_FILE )
return

static proc close_db()
   close
return

static proc use_dbsh()
   use TMP_FILE shared
return
_______________________________________________
Harbour mailing list
Harbour@harbour-project.org
http://lists.harbour-project.org/mailman/listinfo/harbour
