On Wed, Aug 24, 2011 at 12:50 PM, Oleg Smolsky
<oleg.smol...@riverbed.com> wrote:
> On 2011/8/23 11:38, Xinliang David Li wrote:
>>
>> A partial register stall happens when a partial register write is
>> followed by a read of the full 32-bit register. In your case, the stall
>> probably happens in the next iteration when 'add eax, 0Ah' executes, so
>> your manual patch does not work. Try changing
>>
>> add al, [dx] into two instructions (assuming esi is available here)
>>
>> movzx esi, ds:data8[dx]
>> add  eax, esi
>>
> I patched the code to use "movzx edi", but the result is a little clumsy, as
> the loop is based on the virtual address rather than an index.

My bad, I copied and pasted without making it precise.
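A small C model of why the rewrite is safe (an illustration only, not code from this thread; the function names are hypothetical). The loop result is taken from AL via "movsx eax, al", so only the low 8 bits of the sum matter. Adding zero-extended bytes in the full 32-bit EAX gives the same low byte as adding in AL alone, because addition modulo 2^32 and modulo 2^8 agree on the low 8 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sign-extend an 8-bit value to 32 bits, as "movsx eax, al" does.
 * Written arithmetically to avoid an implementation-defined cast. */
static int32_t sext8(uint8_t b) {
    return b < 128 ? (int32_t)b : (int32_t)b - 256;
}

/* Model of the original loop: the running sum lives only in AL
 * ("add al, [rdx]"), so it wraps modulo 256 at every step. */
static int32_t sum_in_al(const uint8_t *data, size_t n) {
    uint8_t al = 0;
    for (size_t i = 0; i < n; ++i) {
        al = (uint8_t)(al + data[i]); /* 8-bit add */
        al = (uint8_t)(al + 10);      /* only the low byte of "add eax, 0Ah" matters */
    }
    return sext8(al);
}

/* Model of the patched loop: "movzx edi, byte ptr [rdx]; add eax, edi;
 * add eax, 0Ah" keeps the sum in the full 32-bit EAX. */
static int32_t sum_in_eax(const uint8_t *data, size_t n) {
    uint32_t eax = 0;
    for (size_t i = 0; i < n; ++i) {
        eax += data[i]; /* zero-extended byte, 32-bit add */
        eax += 10;
    }
    return sext8((uint8_t)(eax & 0xFFu)); /* only AL is kept */
}

/* Both versions must agree on every input. */
static int32_t checked_sum(const uint8_t *data, size_t n) {
    int32_t a = sum_in_al(data, n);
    int32_t b = sum_in_eax(data, n);
    assert(a == b);
    return a;
}
```

For example, bytes {200, 100} give 200+10+100+10 = 320, and both versions return 64 (320 mod 256).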

> Also, the
> sequence is a bit bigger so I had to spill the patch into the preceding
> padding:
>
> .text:0000000000400D80 loc_400D80:
> .text:0000000000400D80                 mov     edx, offset data8
> .text:0000000000400D85                 xor     eax, eax
> .text:0000000000400D87                 nop
> .text:0000000000400D88                 nop
> .text:0000000000400D89                 nop
> .text:0000000000400D8A                 nop
> .text:0000000000400D8B                 nop
> .text:0000000000400D8C
> .text:0000000000400D8C loc_400D8C:
> .text:0000000000400D8C                 movzx   edi, byte ptr [rdx+0]
> .text:0000000000400D90                 add     eax, edi
> .text:0000000000400D92                 add     eax, 0Ah
> .text:0000000000400D95                 add     rdx, 1
> .text:0000000000400D99                 cmp     rdx, 503480h
> .text:0000000000400DA0                 jnz     short loc_400D8C
> .text:0000000000400DA2                 movsx   eax, al
> .text:0000000000400DA5                 add     ecx, 1
> .text:0000000000400DA8                 add     ebx, eax
> .text:0000000000400DAA                 cmp     ecx, esi
> .text:0000000000400DAC                 jnz     short loc_400D80
>
> The performance improved from 2.84 sec (563.38 M ops/s) to 1.51 sec (1059.60
> M ops/s). It's close to the code emitted by g++4.1 now. Very funky!
>
> So, this is one test out of the suite. Many of them degraded... Are you guys
> interested in looking at other ones? Or is there something to be fixed in
> the register allocation logic?

Please file bugs. Isolated examples like this one are very helpful in
a bug report.
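A reproducer along these lines might be a good starting point. This is a hypothetical reconstruction guessed from the disassembly above (the actual benchmark source is not in this thread, and `run_loop` is an invented name): an 8-bit accumulator that GCC keeps in AL, sign-extended and added to a 32-bit total on each outer iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reconstruction of the benchmarked loop.
 * Inner loop: rdx walks data8, the sum is kept in AL.
 * Outer loop: ecx counts up to esi, the total is kept in EBX. */
int32_t run_loop(const int8_t *data8, size_t n, int iterations) {
    assert(n == 0 || data8 != NULL);
    int32_t total = 0;
    for (int j = 0; j < iterations; ++j) {
        int8_t result = 0;
        for (size_t i = 0; i < n; ++i)
            /* Narrowing to int8_t is implementation-defined when out of
             * range; GCC documents it as wrapping modulo 2^8. */
            result = (int8_t)(result + data8[i] + 10);
        total += result; /* movsx eax, al; add ebx, eax */
    }
    return total;
}
```

Compiling it with something like "gcc -O2 -S" under each compiler version and comparing whether the inner sum is updated through AL while EAX is also written would show whether the same code shape appears.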

Thanks,

David


>
> Oleg.
>
