------- Comment #30 from vvv at ru dot ru  2009-05-14 09:01 -------
Created an attachment (id=17863)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17863&action=view)
Testing tool.
Here are the results of my testing. Code:

align 128
test_cikl:
        rept 14          ; 14 if SH=0, 15 if SH=1, 16 if SH=2
        { nop }
        cmp al,0         ; 2 bytes
        jz  $+10h+NOPS   ; 2 bytes, offset=xxxx0
        cmp al,1         ; 2 bytes, offset=xxxx2
        jz  $+0Ch+NOPS   ; 2 bytes, offset=xxxx4
        cmp al,2         ; 2 bytes, offset=xxxx6
        jz  $+08h+NOPS   ; 2 bytes, offset=xxxx8
        cmp al,3         ; 2 bytes, offset=xxxxA
        match =1, NOPS
        { nop }
        match =2, NOPS
        { xchg eax,eax   ; 2-byte NOP
        }
        jz  $+04h        ; 2 bytes, offset=xxxxC
        ja  $+02h        ; 2 bytes, offset=xxxxE
        mov eax,ecx
        and eax,7h
        loop test_cikl

This code was tested on Core 2, Xeon and P4 CPUs. Results are in RDTSC ticks; each cell is NOPS/tick/Max.

; Core 2 Duo
;        NOPS/tick/Max   NOPS/tick/Max   NOPS/tick/Max
; SH=0   0/571/729       1/306/594       2/315/630
; SH=1   0/338/612       1/338/648       2/339/648
; SH=2   0/339/666       1/339/675       2/333/693

; Xeon 3110
;        NOPS/tick/Max   NOPS/tick/Max   NOPS/tick/Max
; SH=0   0/586/693       1/310/675       2/310/675
; SH=1   0/333/657       1/330/648       2/464/630
; SH=2   0/333/657       1/470/594       2/474/603

; P4
;        NOPS/tick/Max   NOPS/tick/Max   NOPS/tick/Max
; SH=0   0/1027/1317     1/1094/1258     2/1028/1207
; SH=1   0/1151/1377     1/1068/1352     2/902/1275
; SH=2   0/1124/1275     1/1148/1335     2/979/1139

Conclusions:

1. Core 2 and Xeon give similar results; P4 behaves strangely. On Core 2 and Xeon, padding is very effective: with SH=0 the padded code is almost 2 times faster (571 vs 306/315 ticks on Core 2, 586 vs 310 ticks on Xeon). Does padding make any sense for P4?

2. My previous statement

VVV> 1. AMD limitation for 16-bytes page (memory range XXX0 - XXXF), but
VVV> Intel limitation for 16-bytes chunk (memory range XXXX - XXXX+10h)

is wrong, at least for Core 2 and Xeon. For these CPUs, "16-byte chunk" means the memory range XXX0 - XXXF. Unfortunately, I can't test AMD.

PS. My testing tool is in the attachment. It starts under MS-DOS, switches to 32-bit mode, then to 64-bit mode, and measures RDTSC ticks for the test code.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39942
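To make the layout behind the SH/NOPS table explicit, here is a small Python model of the test loop above. It is a sketch, not part of the attached tool. It assumes test_cikl sits on a 128-byte boundary and uses the instruction sizes from the listing's comments: 2 bytes per cmp/jz/ja, a 1-byte nop for NOPS=1 and a 2-byte xchg eax,eax for NOPS=2. It also keys each branch to the chunk that holds the branch's last byte. That keying is an assumption of this model, not documented CPU behavior. All function names are hypothetical. The last two helpers contrast the two readings of "16-byte chunk" from point 2.

```python
from collections import Counter

CHUNK = 16        # aligned 16-byte chunk: addresses XXX0..XXXF
BRANCH_SIZE = 2   # each jz/ja in the loop is a 2-byte short jump


def branch_offsets(sh, nops):
    """Offsets (from the 128-byte aligned label test_cikl) of the five
    conditional branches: 14+sh one-byte NOPs, then 2-byte cmp/jz pairs,
    with `nops` padding bytes between 'cmp al,3' and the last 'jz'."""
    pad = 14 + sh
    return [pad + 2, pad + 6, pad + 10, pad + 14 + nops, pad + 16 + nops]


def branch_targets(sh, nops):
    """Branch targets computed from the listing's '$+disp' operands
    ($ = address of the branch instruction itself)."""
    disps = [0x10 + nops, 0x0C + nops, 0x08 + nops, 0x04, 0x02]
    return [start + d for start, d in zip(branch_offsets(sh, nops), disps)]


def fallthrough_target(sh, nops):
    """Offset of 'mov eax,ecx', the first byte after 'ja $+02h'."""
    return branch_offsets(sh, nops)[-1] + BRANCH_SIZE


def branches_per_chunk(sh, nops):
    """Count branches per aligned 16-byte chunk, keyed by the chunk that
    holds each branch's LAST byte (an assumption of this model)."""
    last_bytes = (s + BRANCH_SIZE - 1 for s in branch_offsets(sh, nops))
    return dict(Counter(b // CHUNK for b in last_bytes))


def same_chunk_aligned(a, b):
    """Reading 1: a and b share one aligned chunk XXX0..XXXF."""
    return a // CHUNK == b // CHUNK


def same_window(a, b):
    """Reading 2: b lies in the 16-byte sliding window starting at a
    (XXXX up to, but not including, XXXX+10h)."""
    return 0 <= b - a < CHUNK


if __name__ == "__main__":
    for sh in range(3):
        for nops in range(3):
            offs = " ".join(f"{o:#04x}" for o in branch_offsets(sh, nops))
            print(f"SH={sh} NOPS={nops}: branches at {offs}, "
                  f"per chunk {branches_per_chunk(sh, nops)}")
```

Running it first confirms that every branch in the listing targets 'mov eax,ecx'. It also shows that, under this model, only SH=0/NOPS=0 places all five branches in a single aligned chunk (offsets 0x10-0x1F). That is the one slow Core 2 case in the table. The model does not by itself explain the slower padded Xeon cases (SH=1/NOPS=2 and SH=2/NOPS=1,2).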