Hi Paul,
I wrote a small benchmark tool that compares various memcpy routines under
a number of conditions.
The benchmark tests a range of block sizes, from a few bytes up to 16 MB.
Each test is repeated in a loop several times.
Three main types of copies are benchmarked:
a) Copies that are not cacheable. This is a real memory-to-memory copy.
To prevent the CPU from caching the data, both the SRC and DST pointers are
moved during the loop iterations.
b) Copies where the CPU is allowed to cache the SRC. This is a
cache-to-memory copy.
To prevent the CPU from caching the destination, only the DST pointer is
moved during the loop iterations.
c) Copies where the CPU is allowed to cache both SRC and DST. This is a
cache-to-cache copy.
To allow the CPU to cache the data, both pointers are kept constant during
the loop iterations.
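Roughly, the three cases can be sketched in C as below. This is only a
simplified illustration, not the actual benchmark source: the names
run_copies and copy_mode are made up for this mail, and it assumes the
buffers are much larger than the caches so that moving a pointer really
defeats caching.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the three copy types a), b) and c). */
enum copy_mode { MEM_TO_MEM, CACHE_TO_MEM, CACHE_TO_CACHE };

/*
 * Copies `size` bytes `loops` times and returns the total bytes copied.
 * A pointer that must not be cached is advanced through its buffer on
 * every iteration (wrapping at the end); a pointer that may be cached
 * stays where it is.
 */
static size_t run_copies(char *dst_buf, const char *src_buf, size_t buf_size,
                         size_t size, int loops, enum copy_mode mode)
{
    size_t src_off = 0, dst_off = 0, total = 0;

    assert(size <= buf_size);
    for (int i = 0; i < loops; i++) {
        memcpy(dst_buf + dst_off, src_buf + src_off, size);
        total += size;

        if (mode == MEM_TO_MEM) {          /* a) move SRC as well */
            src_off += size;
            if (src_off + size > buf_size)
                src_off = 0;
        }
        if (mode != CACHE_TO_CACHE) {      /* a) and b) move DST */
            dst_off += size;
            if (dst_off + size > buf_size)
                dst_off = 0;
        }
    }
    return total;
}
```

Dividing the returned byte count by the elapsed time of the loop would
then give a MB/sec figure for one block size and one copy type.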
The tests are repeated with different source/destination alignments to show
the performance difference between aligned and unaligned data.
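The alignment cases can be pictured as byte offsets added to page-aligned
buffers, as in the sketch below. Again this is an illustration under
assumptions, not the benchmark code: it assumes a 4 KB page, assumes the
first number of a pair such as 7-0 or 0-4092 is the source offset and the
second the destination offset, and the names align_case,
alloc_page_aligned and copy_with_offsets are invented here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE 4096u   /* assumed page size */

/* One alignment pair, e.g. {7, 0} or {0, 4092} (assumed SRC, DST order). */
struct align_case { size_t src_off, dst_off; };

/*
 * Allocates a page-aligned buffer with room for `size` bytes plus an
 * offset of up to one page. aligned_alloc (C11) wants the total size to
 * be a multiple of the alignment, so it is rounded up.
 */
static char *alloc_page_aligned(size_t size)
{
    size_t padded = (size + PAGE + PAGE - 1) / PAGE * PAGE;
    char *p = aligned_alloc(PAGE, padded);

    assert(p == NULL || (uintptr_t)p % PAGE == 0);
    return p;
}

/* Copies `size` bytes with the given offsets applied to both bases. */
static void copy_with_offsets(char *dst_base, const char *src_base,
                              size_t size, struct align_case c)
{
    assert(c.src_off < PAGE && c.dst_off < PAGE);
    memcpy(dst_base + c.dst_off, src_base + c.src_off, size);
}
```

With a destination offset of 4092, any copy longer than 4 bytes crosses a
4 KB page boundary while still being 4-byte aligned.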
I've tested and compared various copy routines on PS3, JS21, QS21 and QS22.
Please find some of the PS3 results attached.
The tests clearly show that the old glibc and Linux memcpy routines have
roughly the same speed on Cell.
For aligned data:
Linux and glibc both reach around 1500 MB/sec.
The Linux routine has a special case for 4K copies and reaches around 3200
MB/sec there.
Our patch consistently reaches 5500-6000 MB/sec for large memory-to-memory
copies.
For unaligned data, both Linux and glibc score low, at around 800 MB/sec.
Our patch reaches around 2500 MB/sec here.
If you want, I can send you the source of my benchmark program.
Cheers
Gunnar
(See attached file: ps3_result_easy_toread.txt)
From: Paul Mackerras <[EMAIL PROTECTED]>
Date: 20/06/2008 01:33
To: Gunnar von Boehn/Germany/Contr/[EMAIL PROTECTED]
cc: Arnd Bergmann <[EMAIL PROTECTED]>, Mark Nelson <[EMAIL PROTECTED]>,
    [email protected], Michael Ellerman <[EMAIL PROTECTED]>,
    [EMAIL PROTECTED]
Subject: Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
Gunnar von Boehn writes:
> I have no results for P5/P6, but I did some tests on JS21 aka PPC-970.
> On PPC-970 the CELL memcpy is faster than the current Linux routine.
> This becomes really visible when you really copy memory-to-memory and are
> not only working in the 2nd level cache.
Could you send some more details, like the actual copy speed you
measured and how you did the tests?
Thanks,
Paul.
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark memcpy performance v1.90
------------------------------------------------------------------------------------------------------------------------------------------------------------------
The Test will run some time please be patient.
Total memory required = 33.6 MB.
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Memory throughput Working on Arrays of 16.8 MB.
We are now comparing different memcpy routines
Results are in MB/sec. Higher value means faster.
The test will be repeated on different aligned data.
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Best Case - Copy Memory-to-Memory
Alignment 0-0      16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy       1563   1582   1601   1600   1599   1489   1320   1099    997    833    443    553    428    280    142     71
linux 64           1544   1561   1535   1542   3274   1452   1291   1093   1006    839    508    555    451    296    149     75
CELL memcpy        5869   6016   5454   5346   5607   4355   3523   2030   1648   1131    670    600    413    294    149     75
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Best Case - Copy Cache-to-Memory
Alignment 0-0      16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy       1625   2225   6104   6569   7013   6778   5967   5777   5101   3928   2429   2224   1490   1121    564    284
linux 64           1566   2110   5332   6372   3264   6394   5858   5286   4539   3574   2434   2142   1572   1135    574    288
CELL memcpy        5683   7763  11002  10843  10306   9018   8352   6595   5805   4629   2572   2300   1595   1154    577    287
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Best Case - Copy Cache-to-Cache
Alignment 0-0      16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy       1627   1982   6852  14928  14114  13128  11361   9291   8217   6470   5023   4808   3847   2882   1552   1014
linux 64           1565   1878   5907  10565   4120   9874   9141   7993   7373   6334   4885   4344   4389   3619   2159   1227
CELL memcpy        5652   7796  15277  18296  17374  16628  14332  11234  10468   9550   6982   8324   5456   4084   2703   1547
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Good Case - aligned on 4 but crosses 4k page - Copy Memory-to-Memory
Alignment 0-4092   16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy       1606   1620   1599   1602   1591   1510   1354   1103    992    823    497    470    426    278    140     72
linux 64           1558   1576   1559   1550   1521   1427   1242   1013    900    745    500    454    450    295    150     75
CELL memcpy        5991   6042   5907   5660   4794   3660   2687   1837   1451   1039    636    556    438    290    148     73
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Good Case - aligned on 4 but crosses 4k page - Copy Cache-to-Memory
Alignment 0-4092   16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy       1618   2128   4568   7531   8013   7122   6163   4845   3944   2860   2322   1737   1400   1109    559    282
linux 64           1560   2038   4422   6604   6551   5659   4700   4273   3814   2765   2306   1667   1437   1127    567    286
CELL memcpy        5628   7747  10750  10715  10038   8006   5955   5176   4431   3438   2431   1881   1343   1128    570    282
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Misaligned Cases - Copy Memory-to-Memory
Alignment 7-0      16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy        823    829    823    823    816    778    713    620    568    489    385    342    343    240    130     67
linux 64            861    875    864    859    861    814    753    654    601    518    371    365    358    253    138     73
CELL memcpy        2551   2543   2512   2531   2426   2132   1756   1240   1089    839    540    500    402    272    121     75
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Alignment 0-7      16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy        825    830    823    819    815    788    742    666    621    546    356    407    340    240    131     60
linux 64            857    868    854    856    852    822    777    689    647    570    396    418    350    242    132     70
CELL memcpy        2651   2626   2641   2540   2372   2071   1492    985    838    584    404    340    346    243    127     64
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Alignment 17-11    16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy        823    829    823    823    816    771    711    614    564    483    379    339    345    240    125     69
linux 64            853    867    853    851    851    803    739    638    584    499    395    351    351    239    130     69
CELL memcpy        2557   2542   2507   2515   2387   1863   1487    998    829    620    435    377    390    241    138     71
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Misaligned Cases - Copy Cache-to-Memory
Alignment 7-0      16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy        824    962   1340   1317   1321   1288   1239   1160   1120   1048    909    870    840    694    445    281
linux 64            860   1010   1404   1432   1435   1402   1352   1247   1215   1134    963   1006    880    763    434    279
CELL memcpy        2519   2656   2681   2613   2566   2446   2284   2100   2013   1846   1631   1500   1343   1059    561    287
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Alignment 0-7      16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy        825    949   1268   1324   1319   1285   1244   1147   1107   1039    847    882    834    696    425    218
linux 64            854   1007   1406   1419   1430   1391   1319   1209   1163   1082    938    923    822    695    371    192
CELL memcpy        2598   2731   2729   2726   2558   2372   2126   1757   1633   1343   1057    921    850    739    374    242
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Alignment 17-11    16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy        824    946   1251   1321   1325   1291   1238   1143   1106   1038    898    868    861    689    364    193
linux 64            857    999   1391   1426   1421   1379   1316   1201   1159   1053    868    886    835    661    422    251
CELL memcpy        2519   2657   2641   2605   2537   2389   2211   1962   1843   1554   1460   1302   1181    772    375    278
------------------------------------------------------------------------------------------------------------------------------------------------------------------