On Thu, 31 Aug 2000, Petko Manolov wrote:

>       Hi to all,
> 
> I made this patch as some people request using
> 486 optimized string routines for older
> (486 and 586) machines.
> 

With Intel processors, a 'rep' prefix executes its string instruction
zero times if ECX is already zero, so you do not have to test for a
zero count first. Also, a jump often costs much more time than
straight-through code. For instance, the fastest 486 code for an
unaligned copy is:

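        movl    SRC(%esp), %esi # source pointer
        movl    DST(%esp), %edi # destination pointer
        movl    CNT(%esp), %ecx # byte count
        shrl    $1,%ecx         # ECX = words; CY = odd leftover byte
        rep     movsw           # copy words (DF clear); flags untouched
        adcl    %ecx,%ecx       # ECX is 0 here, so ECX = CY (0 or 1)
        rep     movsb           # copy the odd byte, if any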

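As a check on the carry trick, here is a portable C model of the sequence above. It is a sketch, not kernel code. The function name `copy_words_then_byte` is hypothetical, and the model assumes forward copies (direction flag clear). `shrl $1` leaves the word count in ECX and the odd bit in CY. `rep movsw` runs down to ECX = 0 without touching CY, so `adcl %ecx,%ecx` yields 0 or 1, which is the count for the final `rep movsb`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of:
 *   shrl $1,%ecx; rep movsw; adcl %ecx,%ecx; rep movsb
 * Assumes forward copying (DF clear) and non-overlapping buffers. */
static void copy_words_then_byte(void *dst, const void *src, size_t cnt)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t ecx = cnt >> 1;              /* shrl $1,%ecx: number of 16-bit words */
    unsigned cf = (unsigned)(cnt & 1);  /* CY = bit shifted out (odd byte) */

    /* rep movsw: runs zero times when ecx == 0, no separate test needed */
    while (ecx != 0) {
        memcpy(d, s, 2);                /* one 16-bit move, alignment-safe */
        d += 2;
        s += 2;
        ecx--;
    }

    /* adcl %ecx,%ecx: ecx is 0 here, so the result is just CY */
    ecx = ecx + ecx + cf;

    /* rep movsb: copies the odd byte, if there is one */
    while (ecx != 0) {
        *d++ = *s++;
        ecx--;
    }
}
```

The point of the model is that no branch on the count is needed: both loops simply run zero times when there is nothing to do.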
If it's longword aligned, i.e., both the source and destination addresses
have their low two bits clear, moving longwords through the EDX register,
with EAX and EBX as the index registers, is faster, even with a test at
the start for a count shorter than one longword.

        movl    SRC(%esp), %eax
        movl    DST(%esp), %ebx
        movl    CNT(%esp), %ecx
        shrl    $2, %ecx        # long words; CY set if an extra word
        jz      2f              # no whole long word: skip the loop
1:      movl    (%eax), %edx    # Do NOT touch EAX in the next instruction
        movl    %edx, (%ebx)    # Do NOT touch EBX in the next instruction
        leal    4(%eax), %eax   # Adjust EAX index now
        leal    4(%ebx), %ebx   # Adjust EBX index now
        decl    %ecx            # does not change CY
        jnz     1b

2:                              # the CNT & 3 trailing bytes remain to copy

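The same strategy can be sketched in portable C, under some assumptions. The name `copy_aligned_longwords` is hypothetical, `uint32_t` stands in for a 486 longword, and the 0 to 3 trailing bytes (left to the code after label 2 in the fragment) are copied bytewise here. The sketch also performs the alignment test described above and falls back to a plain byte loop when either pointer is misaligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: longword copy when both pointers are 4-byte aligned,
 * modelled on the movl/leal/decl loop above. Buffers must not overlap. */
static void copy_aligned_longwords(void *dst, const void *src, size_t cnt)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Aligned only if both addresses have their low two bits clear. */
    if ((((uintptr_t)d | (uintptr_t)s) & 3u) == 0) {
        size_t n = cnt >> 2;            /* shrl $2,%ecx: number of longwords */

        while (n != 0) {                /* skipped entirely when n == 0 (jz 2f) */
            uint32_t tmp;               /* plays the role of %edx */
            memcpy(&tmp, s, 4);         /* movl (%eax),%edx */
            memcpy(d, &tmp, 4);         /* movl %edx,(%ebx) */
            s += 4;                     /* leal 4(%eax),%eax */
            d += 4;                     /* leal 4(%ebx),%ebx */
            n--;                        /* decl %ecx; jnz 1b */
        }
        cnt &= 3;                       /* 0-3 bytes still to copy */
    }

    /* Byte loop: the leftover tail, or the whole copy if misaligned. */
    while (cnt != 0) {
        *d++ = *s++;
        cnt--;
    }
}
```

The 4-byte moves go through `memcpy` rather than a cast to `uint32_t *` to avoid C strict-aliasing problems. Compilers typically turn each one into a single load or store.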

To let some instructions execute in parallel, follow the idea shown in
the comments above, i.e., don't touch an index register in the
instruction immediately following its use to address memory.

This allows the memory access to proceed while the next instruction(s)
execute.

The decl %ecx should be put BETWEEN the two `leal` instructions so that
the address calculation can occur in parallel with the register operation.
Because LEA does not affect the flags, the zero flag set by decl survives
the second leal, and the jnz still tests it. I didn't do this in the
example above because it makes the code less clear.

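Registers used as index registers are not all equivalent. In 16-bit
(8086-style) addressing, AX cannot address memory at all; only BX, BP,
SI and DI can. The 32-bit addressing modes introduced with the i386
allow any general register, including EAX, as a base. It is faster to
use (%eax) than (%ebx).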

Cheers,
Dick Johnson

Penguin : Linux version 2.2.15 on an i686 machine (797.90 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
