Przemyslaw,

I was going to try the new speedtst, but although it builds OK, it does not
start: there is no error, no exception, no output.

It simply exits.

Any idea?

Maurilio.

Przemyslaw Czerpak wrote:
> On Fri, 03 Oct 2008, Maurilio Longo wrote:
> 
> Hi Maurilio,
> 
>>> In -gc[0-2] mode each thread executes:
>>>       if( ! --uiPolls )
>>>          hb_inkeyPoll();
>>> in the main HVM loop. This simple hack, acceptable in ST programs, kills
>>> the performance in MT ones. The important part is not the hb_inkeyPoll()
>>> call (you can comment it out and the results will not change) but --uiPolls.
>>> Each thread decrements the same memory variable, which causes horrible
>>> synchronization issues between CPUs on multi-CPU machines and kills the
>>> performance. It's a hardware issue, so I do not know how big an overhead
>>> it may cause on other machines. I'm very interested in your results.
>>> Please rebuild Harbour with the -DHB_GUI macro or compile speedtst.prg
>>> with -gc3 and compare the results.
>>>
>> I did a full rebuild of latest svn code and I've added -gc3 to my modified
>> speedtst.prg and here are the results.
>>  3 - ascan(a,{|x|x==i%ARR_LEN}) ->                        22.03
>>  2 - ascan(a2,c+chr(i%64+64)) ->                           3.72
>> ============================================================
>>  0 - total application time:                              55.27
>> total real time:                                     55.27
>> So it decreased the total time by a little less than 10%.
> 
> I have different results.
> With --uiPolls the speed is reduced to that of a single CPU.
> When I eliminated it, the times from the speedtst.prg you modified
> were reduced nearly by half.
> I had to change the size in str(hb_threadID(),2) to 11.
> 
> 10/02/08 20:10:09 Harbour 1.1.0dev (Rev. 9533) (MT), Linux 2.6.25.11-0.1-pae i686
> [...]
> -1223345264 - f2(a2) ->                                             0.99 
> -1223345264 - ascan(a,i%ARR_LEN) ->                                 1.49 
> -1214952560 - &('f1('+str(i)+')') ->                               11.96 
> -1214952560 - s:=f4() ->                                            3.03 
> -1214952560 - s:=f5() ->                                            2.26 
> -1214952560 - ascan(a2,c+chr(i%64+64)) ->                           5.41 
> -1223345264 - ascan(a,{|x|x==i%ARR_LEN}) ->                        13.29 
> ============================================================ 
>           0 - total application time:                              57.98 
> total real time:                                     29.64 
> 
> 
> 10/02/08 20:11:30 Harbour 1.1.0dev (Rev. 9533) (MT), Linux 2.6.25.11-0.1-pae i686
> [...]
> -1224356976 - f2(a2) ->                                             0.56 
> -1224356976 - ascan(a,i%ARR_LEN) ->                                 1.10 
> -1215964272 - &('f1('+str(i)+')') ->                                8.85 
> -1215964272 - s:=f4() ->                                            1.83 
> -1215964272 - s:=f5() ->                                            1.34 
> -1215964272 - ascan(a2,c+chr(i%64+64)) ->                           3.67 
> -1224356976 - ascan(a,{|x|x==i%ARR_LEN}) ->                        10.31 
> ============================================================ 
>           0 - total application time:                              31.95 
> total real time:                                     16.51 
> 
> though they were not doubled when you compare them to the MT version
> executed with a single thread.
> It looks like this is a strictly hardware-related issue. Maybe on my
> machine such a situation generates an exception and the default handler
> silently ignores it? For sure, writing without protection from different
> threads to the same memory location is not a good idea :-)
> The above tests were made with -gc3, which gives slightly better
> results than -gc2. I rebuilt Harbour with the HB_GUI macro to disable
> --uiPolls and modified your speedtst.prg to optionally run in one thread.
> Then I repeated the tests with -gc2 to compare the speed difference:
> 
> ST mode
>           0 - total application time:                              26.67 
>               total real time:                                     26.74 
> 
> MT mode one thread:
>           0 - total application time:                              31.76 
>               total real time:                                     31.89 
> 
> MT mode two threads:
>           0 - total application time:                              38.14 
>               total real time:                                     19.64 
> 
> Anyhow, these tests still use statically divided jobs.
> To make it more aggressive I rewrote speedtst.prg so that it can now
> be executed in many threads. I'm attaching it.
> When you run the MT version of speedtst.prg with a parameter, each
> test loop is executed in a separate thread. It causes really heavy
> overhead and it's also quite a good MT stress test.
> Please try it. I'll replace tests/speedtst.prg with this test ASAP.
> I also adapted this code for xHarbour and wanted to run it to compare
> results, but in practice it's impossible.
> It GPFs, corrupts memory, hangs or does other strange things on a real
> multi-CPU machine and never finishes correctly. Maybe someone can run
> it on a single-CPU computer.
> 
> Here are results (all tests compiled with -gc2 and Harbour compiled
> without -DHB_GUI)
> 
> ST mode:
>    10/03/08 13:14:01 Linux 2.6.25.11-0.1-pae i686
>    Harbour 1.1.0dev (Rev. 9535)
>    GNU C 4.3.1 (32 bit)
>    [...]
>    [ total application time: ]....................................28.79
>    [ total real time: ]...........................................28.83
> 
> MT mode one thread:
>    10/03/08 13:14:35 Linux 2.6.25.11-0.1-pae i686
>    Harbour 1.1.0dev (Rev. 9535) (MT)
>    [...]
>    [ total application time: ]....................................34.87
>    [ total real time: ]...........................................34.89
> 
> 
> MT mode many threads:
>    10/03/08 13:15:16 Linux 2.6.25.11-0.1-pae i686
>    Harbour 1.1.0dev (Rev. 9535) (MT)
>    [...]
>    [ total application time: ]....................................44.50
>    [ total real time: ]...........................................15.53
> 
> I have a three-CPU machine, and the final real time when the test is
> divided into many threads is ~2.25 times smaller than when it is executed
> by a single thread. Such results are acceptable, because the heaviest
> tests create and destroy new complex items inspected by the GC, and
> registering them needs some serialization. This, the cost of task
> switching, and the differences between individual test times prevent a
> linear performance improvement (3.00). The last factor is not a Harbour
> problem at all: the total time is bounded by the most expensive test,
> which finishes on a single CPU while the others sit idle, so the total
> result is worse than it could be. Still, IMHO the performance and
> scalability are reasonable. I'm interested in results from other machines.
> 
> best regards,
> Przemek
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Harbour mailing list
> Harbour@harbour-project.org
> http://lists.harbour-project.org/mailman/listinfo/harbour

-- 
 __________
|  |  | |__| Maurilio Longo
|_|_|_|____| farmaconsult s.r.l.

