Hi Paul,

I wrote a small benchmark tool that compares various memcpy routines under
a number of conditions.
The benchmark tests a range of block sizes, from a few bytes up to 16 MB.
Each test is repeated in a loop several times.

Three main types of copies are benchmarked.
a) Copies that are not cacheable. This is a real memory-to-memory copy.
To prevent the CPU from caching the data, both the SRC and DST pointers are
moved during the loop iterations.
b) Copies where the CPU is allowed to cache the SRC. This is a
cache-to-memory copy.
To prevent the CPU from caching the DST, the DST pointer is moved during
the loop iterations while the SRC pointer stays constant.
c) Copies where the CPU is allowed to cache both SRC and DST. This is a
cache-to-cache copy.
To allow the CPU to cache both, both pointers are constant during the loop
iterations.
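
To make the three cases concrete, here is a minimal sketch in C of how such a
loop could look. This is not the actual benchmark source; the names
(copy_mode, run_copies) are made up for illustration, and the wrap-around at
the end of the array is an assumption. Moving a pointer only defeats the cache
if the arrays are much larger than the cache, as with the 16.8 MB arrays
reported in the attached output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; not taken from the actual benchmark source. */
enum copy_mode {
    MEM_TO_MEM,     /* a) SRC and DST both move  */
    CACHE_TO_MEM,   /* b) SRC fixed, DST moves   */
    CACHE_TO_CACHE  /* c) SRC and DST both fixed */
};

/*
 * Copy `block` bytes `reps` times between two arrays of `array_size` bytes.
 * A pointer that "moves" advances by one block per iteration and wraps to
 * the start of its array when the next block would no longer fit
 * (the wrap-around policy is an assumption).
 * Returns the total number of bytes copied.
 */
static size_t run_copies(char *dst, const char *src, size_t array_size,
                         size_t block, size_t reps, enum copy_mode mode)
{
    size_t src_off = 0, dst_off = 0, total = 0;

    assert(block > 0 && block <= array_size);

    for (size_t i = 0; i < reps; i++) {
        memcpy(dst + dst_off, src + src_off, block);
        total += block;

        /* Only the memory-to-memory case moves the source. */
        if (mode == MEM_TO_MEM) {
            src_off += block;
            if (src_off + block > array_size)
                src_off = 0;
        }
        /* Both "to memory" cases move the destination. */
        if (mode != CACHE_TO_CACHE) {
            dst_off += block;
            if (dst_off + block > array_size)
                dst_off = 0;
        }
    }
    return total;
}
```

Dividing the returned byte count by the elapsed wall-clock time of the loop
would give a throughput figure comparable to the MB/sec values in the tables.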

The tests are repeated with different SRC/DST alignments to show the
performance difference between aligned and unaligned data.
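
As a rough illustration of how a given alignment could be set up, the sketch
below rounds a raw buffer up to an alignment boundary and then adds a byte
offset. The helper name place_at_offset is hypothetical and not from the
benchmark; which of the two numbers in an "Alignment 7-0" heading refers to
SRC and which to DST is not stated in the output, so the sketch does not
assume either.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: round `raw` up to the next multiple of `align`
 * (which must be a power of two) and add `offset` bytes.
 * The caller must have allocated at least align + offset extra bytes.
 */
static char *place_at_offset(char *raw, size_t align, size_t offset)
{
    uintptr_t v = (uintptr_t)raw;

    assert(align != 0 && (align & (align - 1)) == 0);

    v = (v + align - 1) & ~((uintptr_t)align - 1);
    return (char *)v + offset;
}
```

With a 4096-byte alignment, an offset of 7 gives a misaligned pointer, while
an offset of 4092 gives one that is aligned on 4 bytes but sits just before a
4K page boundary.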

I've tested and compared various copy routines on PS3, JS21, QS21 and QS22.

Please find some results from the PS3 attached.

The tests clearly show that the old GLIBC and Linux memcpy routines have
roughly the same speed on CELL.
For aligned data (memory-to-memory):
Linux and GLIBC both reach around 1500-1600 MB/sec.
The Linux routine has a special case for 4K copies and gets around 3200
MB/sec there.
Our patch gets between roughly 5300 and 6000 MB/sec for copies of 4K and
larger.

For unaligned data, both Linux and GLIBC score low, at around 800 MB/sec.
Our patch gets around 2500 MB/sec here.


If you want, I can send you the source of my benchmark program.

Cheers
Gunnar

(See attached file: ps3_result_easy_toread.txt)




                                                                           
From:    Paul Mackerras <[EMAIL PROTECTED]>
Date:    20/06/2008 01:33
To:      Gunnar von Boehn/Germany/Contr/[EMAIL PROTECTED]
cc:      Arnd Bergmann <[EMAIL PROTECTED]>, Mark Nelson <[EMAIL PROTECTED]>,
         linuxppc-dev@ozlabs.org, Michael Ellerman <[EMAIL PROTECTED]>,
         [EMAIL PROTECTED]
Subject: Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell




Gunnar von Boehn writes:

> I have no results for P5/P6, but I did some tests on JS21 aka PPC-970.
> On PPC-970 the CELL memcpy is faster than the current Linux routine.
> This becomes really visible when you really copy memory-to-memory and are
> not only working in the 2nd-level cache.

Could you send some more details, like the actual copy speed you
measured and how you did the tests?

Thanks,
Paul.
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark memcpy performance v1.90
------------------------------------------------------------------------------------------------------------------------------------------------------------------
The Test will run some time please be patient.
Total memory required = 33.6 MB.
------------------------------------------------------------------------------------------------------------------------------------------------------------------

Memory throughput Working on Arrays of 16.8 MB.
We are now comparing different memcpy routines
Results are in MB/sec. Higher value means faster.
The test will be repeated on different aligned data.
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Best Case - Copy Memory-to-Memory
Alignment 0-0     16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy      1563   1582   1601   1600   1599   1489   1320   1099    997    833    443    553    428    280    142     71
linux 64          1544   1561   1535   1542   3274   1452   1291   1093   1006    839    508    555    451    296    149     75
CELL memcpy       5869   6016   5454   5346   5607   4355   3523   2030   1648   1131    670    600    413    294    149     75

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Best Case - Copy Cache-to-Memory
Alignment 0-0     16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy      1625   2225   6104   6569   7013   6778   5967   5777   5101   3928   2429   2224   1490   1121    564    284
linux 64          1566   2110   5332   6372   3264   6394   5858   5286   4539   3574   2434   2142   1572   1135    574    288
CELL memcpy       5683   7763  11002  10843  10306   9018   8352   6595   5805   4629   2572   2300   1595   1154    577    287

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Best Case - Copy Cache-to-Cache
Alignment 0-0     16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy      1627   1982   6852  14928  14114  13128  11361   9291   8217   6470   5023   4808   3847   2882   1552   1014
linux 64          1565   1878   5907  10565   4120   9874   9141   7993   7373   6334   4885   4344   4389   3619   2159   1227
CELL memcpy       5652   7796  15277  18296  17374  16628  14332  11234  10468   9550   6982   8324   5456   4084   2703   1547

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Good Case - aligned on 4 but crosses 4k page - Copy Memory-to-Memory
Alignment 0-4092  16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy      1606   1620   1599   1602   1591   1510   1354   1103    992    823    497    470    426    278    140     72
linux 64          1558   1576   1559   1550   1521   1427   1242   1013    900    745    500    454    450    295    150     75
CELL memcpy       5991   6042   5907   5660   4794   3660   2687   1837   1451   1039    636    556    438    290    148     73

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Good Case - aligned on 4 but crosses 4k page - Copy Cache-to-Memory
Alignment 0-4092  16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy      1618   2128   4568   7531   8013   7122   6163   4845   3944   2860   2322   1737   1400   1109    559    282
linux 64          1560   2038   4422   6604   6551   5659   4700   4273   3814   2765   2306   1667   1437   1127    567    286
CELL memcpy       5628   7747  10750  10715  10038   8006   5955   5176   4431   3438   2431   1881   1343   1128    570    282

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Misaligned Cases - Copy Memory-to-Memory
Alignment 7-0     16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy       823    829    823    823    816    778    713    620    568    489    385    342    343    240    130     67
linux 64           861    875    864    859    861    814    753    654    601    518    371    365    358    253    138     73
CELL memcpy       2551   2543   2512   2531   2426   2132   1756   1240   1089    839    540    500    402    272    121     75

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Alignment 0-7     16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy       825    830    823    819    815    788    742    666    621    546    356    407    340    240    131     60
linux 64           857    868    854    856    852    822    777    689    647    570    396    418    350    242    132     70
CELL memcpy       2651   2626   2641   2540   2372   2071   1492    985    838    584    404    340    346    243    127     64

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Alignment 17-11   16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy       823    829    823    823    816    771    711    614    564    483    379    339    345    240    125     69
linux 64           853    867    853    851    851    803    739    638    584    499    395    351    351    239    130     69
CELL memcpy       2557   2542   2507   2515   2387   1863   1487    998    829    620    435    377    390    241    138     71

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Misaligned Cases - Copy Cache-to-Memory
Alignment 7-0     16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy       824    962   1340   1317   1321   1288   1239   1160   1120   1048    909    870    840    694    445    281
linux 64           860   1010   1404   1432   1435   1402   1352   1247   1215   1134    963   1006    880    763    434    279
CELL memcpy       2519   2656   2681   2613   2566   2446   2284   2100   2013   1846   1631   1500   1343   1059    561    287

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Alignment 0-7     16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy       825    949   1268   1324   1319   1285   1244   1147   1107   1039    847    882    834    696    425    218
linux 64           854   1007   1406   1419   1430   1391   1319   1209   1163   1082    938    923    822    695    371    192
CELL memcpy       2598   2731   2729   2726   2558   2372   2126   1757   1633   1343   1057    921    850    739    374    242

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Alignment 17-11   16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy       824    946   1251   1321   1325   1291   1238   1143   1106   1038    898    868    861    689    364    193
linux 64           857    999   1391   1426   1421   1379   1316   1201   1159   1053    868    886    835    661    422    251
CELL memcpy       2519   2657   2641   2605   2537   2389   2211   1962   1843   1554   1460   1302   1181    772    375    278

------------------------------------------------------------------------------------------------------------------------------------------------------------------
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
