Hi Paul,

I wrote a small benchmark tool that compares various memcpy routines under a number of conditions. The benchmark tests a range of block sizes, from a few bytes up to 16 MB. Each test is repeated in a loop several times.
Three main types of copies are benchmarked:

a) Copies that are not cacheable. This is a real memory-to-memory copy. To prevent the CPU from caching the data, the SRC and DST pointers are moved during the loop iterations.
b) Copies where the CPU is allowed to cache the SRC. This is a cache-to-memory copy. To prevent the CPU from caching the destination, the DST pointer is moved during the loop iterations.
c) Copies where the CPU is allowed to cache both SRC and DST. This is a cache-to-cache copy. To allow the CPU to cache the data, both pointers are constant during the loop iterations.

The tests are repeated with different source/dst alignments to show the performance difference between aligned and unaligned data.

I've tested and compared various copy routines on PS3, JS21, QS21 and QS22. Please find some results from the PS3 attached.

The tests clearly show that the old GLIBC and Linux memcpy routines have about the same speed on CELL.

For aligned data (large memory-to-memory copies):
Linux and GLIBC both reach around 1500 MB/sec.
The Linux routine has a special case for 4K copies and reaches around 3200 MB/sec there.
Our patch consistently gets between 5500 and 6000 MB/sec.

For unaligned data:
Both Linux and glibc score low, at around 800 MB/sec.
Our patch gets around 2500 MB/sec here.

If you want, I can send you the source of my benchmark program.

Cheers
Gunnar

(See attached file: ps3_result_easy_toread.txt)

From: Paul Mackerras <[EMAIL PROTECTED]>
Date: 20/06/2008 01:33
To: Gunnar von Boehn/Germany/Contr/[EMAIL PROTECTED]
cc: Arnd Bergmann <[EMAIL PROTECTED]>, Mark Nelson <[EMAIL PROTECTED]>, linuxppc-dev@ozlabs.org, Michael Ellerman <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
Subject: Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell

Gunnar von Boehn writes:

> I have no results for P5/P6, but I did some tests on JS21 aka PPC-970.
> On PPC-970 the CELL memcpy is faster than the current Linux routine.
> This becomes really visible when you really copy memory-to-memory and are
> not only working in the 2nd level cache.

Could you send some more details, like the actual copy speed you measured
and how you did the tests?

Thanks,
Paul.
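For readers without the benchmark source, here is a minimal sketch of how the three copy types a) to c) described in the message could be set up in C. It is not the actual benchmark program: the names `bench_copy`, `advance`, `ARENA_SIZE` and `SLACK` are hypothetical, the standard `memcpy` stands in for whichever routine is under test, timing uses the coarse `clock()` call, and 1 MB is assumed to mean 10^6 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical sizes: a 16 MiB working area per buffer plus slack,
 * so that alignment offsets never run past the end of a buffer. */
#define ARENA_SIZE ((size_t)16 * 1024 * 1024)
#define SLACK      ((size_t)8192)

enum copy_mode { MEM_TO_MEM, CACHE_TO_MEM, CACHE_TO_CACHE };

/* Move a position forward by one block, wrapping to 0 when the next
 * block would no longer fit inside the arena. */
static size_t advance(size_t pos, size_t block, size_t arena)
{
    pos += block;
    return (pos + block > arena) ? 0 : pos;
}

/* Copy `block` bytes `iters` times and return the throughput in MB/sec.
 * Returns -1.0 on bad arguments or allocation failure, and 0.0 when the
 * run was too short for clock() to measure. */
double bench_copy(enum copy_mode mode, size_t block,
                  size_t src_off, size_t dst_off, unsigned long iters)
{
    if (block == 0 || block > ARENA_SIZE || iters == 0 ||
        src_off >= SLACK || dst_off >= SLACK)
        return -1.0;

    /* Page-aligned buffers, so an offset such as 4092 really sits
     * just before a 4k page boundary. */
    unsigned char *src = aligned_alloc(4096, ARENA_SIZE + SLACK);
    unsigned char *dst = aligned_alloc(4096, ARENA_SIZE + SLACK);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, ARENA_SIZE + SLACK);
    memset(dst, 0, ARENA_SIZE + SLACK);

    size_t spos = 0, dpos = 0;
    clock_t start = clock();
    for (unsigned long i = 0; i < iters; i++) {
        memcpy(dst + dst_off + dpos, src + src_off + spos, block);
        /* a) both pointers move, b) only DST moves, c) neither moves. */
        if (mode == MEM_TO_MEM)
            spos = advance(spos, block, ARENA_SIZE);
        if (mode != CACHE_TO_CACHE)
            dpos = advance(dpos, block, ARENA_SIZE);
    }
    clock_t end = clock();

    /* Both positions must still leave room for one full block. */
    assert(spos + block <= ARENA_SIZE && dpos + block <= ARENA_SIZE);

    /* Read back one byte so the copies are not optimised away. */
    volatile unsigned char sink = dst[dst_off];
    (void)sink;

    free(src);
    free(dst);

    double secs = (double)(end - start) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return 0.0;
    return ((double)block * (double)iters / 1e6) / secs;
}
```

For example, `bench_copy(MEM_TO_MEM, 4096, 0, 0, 100000)` would time aligned 4 KB memory-to-memory copies, and `bench_copy(CACHE_TO_MEM, 4096, 7, 0, 100000)` would correspond to an "Alignment 7-0" cache-to-memory run, assuming the first number in that header is the source offset and the second the destination offset.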
----------------------------------------------------------------------------------------------------------------
Benchmark memcpy performance v1.90
----------------------------------------------------------------------------------------------------------------
The Test will run some time please be patient.
Total memory required = 33.6 MB.
----------------------------------------------------------------------------------------------------------------
Memory throughput
Working on Arrays of 16.8 MB.
We are now comparing different memcpy routines
Results are in MB/sec. Higher value means faster.
The test will be repeated on different aligned data.
----------------------------------------------------------------------------------------------------------------
Best Case - Copy Memory-to-Memory
Alignment 0-0     16MB 300KB  64KB  16KB   4KB   2KB   1KB  512B  384B  256B  150B  128B  100B   64B   32B   16B
----------------------------------------------------------------------------------------------------------------
glibc memcpy      1563  1582  1601  1600  1599  1489  1320  1099   997   833   443   553   428   280   142    71
linux 64          1544  1561  1535  1542  3274  1452  1291  1093  1006   839   508   555   451   296   149    75
CELL memcpy       5869  6016  5454  5346  5607  4355  3523  2030  1648  1131   670   600   413   294   149    75
----------------------------------------------------------------------------------------------------------------
Best Case - Copy Cache-to-Memory
Alignment 0-0     16MB 300KB  64KB  16KB   4KB   2KB   1KB  512B  384B  256B  150B  128B  100B   64B   32B   16B
----------------------------------------------------------------------------------------------------------------
glibc memcpy      1625  2225  6104  6569  7013  6778  5967  5777  5101  3928  2429  2224  1490  1121   564   284
linux 64          1566  2110  5332  6372  3264  6394  5858  5286  4539  3574  2434  2142  1572  1135   574   288
CELL memcpy       5683  7763 11002 10843 10306  9018  8352  6595  5805  4629  2572  2300  1595  1154   577   287
----------------------------------------------------------------------------------------------------------------
Best Case - Copy Cache-to-Cache
Alignment 0-0     16MB 300KB  64KB  16KB   4KB   2KB   1KB  512B  384B  256B  150B  128B  100B   64B   32B   16B
----------------------------------------------------------------------------------------------------------------
glibc memcpy      1627  1982  6852 14928 14114 13128 11361  9291  8217  6470  5023  4808  3847  2882  1552  1014
linux 64          1565  1878  5907 10565  4120  9874  9141  7993  7373  6334  4885  4344  4389  3619  2159  1227
CELL memcpy       5652  7796 15277 18296 17374 16628 14332 11234 10468  9550  6982  8324  5456  4084  2703  1547
----------------------------------------------------------------------------------------------------------------
Good Case - aligned on 4 but crosses 4k page - Copy Memory-to-Memory
Alignment 0-4092  16MB 300KB  64KB  16KB   4KB   2KB   1KB  512B  384B  256B  150B  128B  100B   64B   32B   16B
----------------------------------------------------------------------------------------------------------------
glibc memcpy      1606  1620  1599  1602  1591  1510  1354  1103   992   823   497   470   426   278   140    72
linux 64          1558  1576  1559  1550  1521  1427  1242  1013   900   745   500   454   450   295   150    75
CELL memcpy       5991  6042  5907  5660  4794  3660  2687  1837  1451  1039   636   556   438   290   148    73
----------------------------------------------------------------------------------------------------------------
Good Case - aligned on 4 but crosses 4k page - Copy Cache-to-Memory
Alignment 0-4092  16MB 300KB  64KB  16KB   4KB   2KB   1KB  512B  384B  256B  150B  128B  100B   64B   32B   16B
----------------------------------------------------------------------------------------------------------------
glibc memcpy      1618  2128  4568  7531  8013  7122  6163  4845  3944  2860  2322  1737  1400  1109   559   282
linux 64          1560  2038  4422  6604  6551  5659  4700  4273  3814  2765  2306  1667  1437  1127   567   286
CELL memcpy       5628  7747 10750 10715 10038  8006  5955  5176  4431  3438  2431  1881  1343  1128   570   282
----------------------------------------------------------------------------------------------------------------
Misaligned Cases - Copy Memory-to-Memory
Alignment 7-0     16MB 300KB  64KB  16KB   4KB   2KB   1KB  512B  384B  256B  150B  128B  100B   64B   32B   16B
----------------------------------------------------------------------------------------------------------------
glibc memcpy       823   829   823   823   816   778   713   620   568   489   385   342   343   240   130    67
linux 64           861   875   864   859   861   814   753   654   601   518   371   365   358   253   138    73
CELL memcpy       2551  2543  2512  2531  2426  2132  1756  1240  1089   839   540   500   402   272   121    75
----------------------------------------------------------------------------------------------------------------
Alignment 0-7     16MB 300KB  64KB  16KB   4KB   2KB   1KB  512B  384B  256B  150B  128B  100B   64B   32B   16B
----------------------------------------------------------------------------------------------------------------
glibc memcpy       825   830   823   819   815   788   742   666   621   546   356   407   340   240   131    60
linux 64           857   868   854   856   852   822   777   689   647   570   396   418   350   242   132    70
CELL memcpy       2651  2626  2641  2540  2372  2071  1492   985   838   584   404   340   346   243   127    64
----------------------------------------------------------------------------------------------------------------
Alignment 17-11   16MB 300KB  64KB  16KB   4KB   2KB   1KB  512B  384B  256B  150B  128B  100B   64B   32B   16B
----------------------------------------------------------------------------------------------------------------
glibc memcpy       823   829   823   823   816   771   711   614   564   483   379   339   345   240   125    69
linux 64           853   867   853   851   851   803   739   638   584   499   395   351   351   239   130    69
CELL memcpy       2557  2542  2507  2515  2387  1863  1487   998   829   620   435   377   390   241   138    71
----------------------------------------------------------------------------------------------------------------
Misaligned Cases - Copy Cache-to-Memory
Alignment 7-0     16MB 300KB  64KB  16KB   4KB   2KB   1KB  512B  384B  256B  150B  128B  100B   64B   32B   16B
----------------------------------------------------------------------------------------------------------------
glibc memcpy       824   962  1340  1317  1321  1288  1239  1160  1120  1048   909   870   840   694   445   281
linux 64           860  1010  1404  1432  1435  1402  1352  1247  1215  1134   963  1006   880   763   434   279
CELL memcpy       2519  2656  2681  2613  2566  2446  2284  2100  2013  1846  1631  1500  1343  1059   561   287
----------------------------------------------------------------------------------------------------------------
Alignment 0-7     16MB 300KB  64KB  16KB   4KB   2KB   1KB  512B  384B  256B  150B  128B  100B   64B   32B   16B
----------------------------------------------------------------------------------------------------------------
glibc memcpy       825   949  1268  1324  1319  1285  1244  1147  1107  1039   847   882   834   696   425   218
linux 64           854  1007  1406  1419  1430  1391  1319  1209  1163  1082   938   923   822   695   371   192
CELL memcpy       2598  2731  2729  2726  2558  2372  2126  1757  1633  1343  1057   921   850   739   374   242
----------------------------------------------------------------------------------------------------------------
Alignment 17-11   16MB 300KB  64KB  16KB   4KB   2KB   1KB  512B  384B  256B  150B  128B  100B   64B   32B   16B
----------------------------------------------------------------------------------------------------------------
glibc memcpy       824   946  1251  1321  1325  1291  1238  1143  1106  1038   898   868   861   689   364   193
linux 64           857   999  1391  1426  1421  1379  1316  1201  1159  1053   868   886   835   661   422   251
CELL memcpy       2519  2657  2641  2605  2537  2389  2211  1962  1843  1554  1460  1302  1181   772   375   278
----------------------------------------------------------------------------------------------------------------
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev