Re: [RFC 1/3] powerpc: __copy_tofrom_user tweaked for Cell

Gunnar von Boehn Thu, 19 Jun 2008 08:18:48 -0700

Hi Arnd,

> You don't have a page wise user copy,
> which the regular code has.

The new code does not need two versions, IMHO.
The "regular" code was much slower in the normal case and has a special
version for the 4K-optimized case.
The new code is equally good in both cases, so adding an extra 4K routine
will increase the code size for very minor gain. I'm not sure it's
worth it.

Benchmark results on QS22 for a well-aligned copy:
Old-code : 1300 MB/sec
Old-code 4k Special case: 2600 MB/sec
New code : 4000 MB/sec (always)

> You don't align the source to word size, only the target.
> Does this get handled correctly when the source
> is a noncacheable mapping, e.g.

The problem is that on CELL the shift instructions required
for SRC alignment are microcoded, in other words really slow.
You are right, the main copy2user requires that the SRC is cacheable.
IMHO, because of the exception on the load, the routine should fall back to the
byte copy loop.
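
To make the "align the target only" idea concrete, here is a minimal user-space sketch in C. It is not the kernel routine and not the patch under review: the function name copy_align_dst_only is hypothetical, and the exception-driven fallback to the byte loop cannot be shown outside the kernel. It only illustrates the shape: byte copy until the destination is 8-byte aligned, then 8-byte moves where the source may still be misaligned, then a byte tail.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration only; not the kernel's __copy_tofrom_user. */
static void *copy_align_dst_only(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: byte copy until the destination is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)d & 7) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: 8 bytes at a time. The source may still be misaligned, so
     * the load goes through memcpy, which a compiler can turn into a
     * plain unaligned load where the hardware allows it. No shifting
     * or merging of source words is done. */
    while (n >= 8) {
        uint64_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += 8;
        s += 8;
        n -= 8;
    }

    /* Tail: the remaining 0..7 bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

The design choice this mirrors is that only stores are guaranteed aligned; whether misaligned loads are acceptable depends on the source mapping, which is exactly the noncacheable case raised in the question.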

Arnd, could you verify that it works on localstore?

Cheers
Gunnar

From:    Arnd Bergmann <[EMAIL PROTECTED]>
Date:    19/06/2008 16:43
To:      linuxppc-dev@ozlabs.org
Cc:      Mark Nelson <[EMAIL PROTECTED]>, [EMAIL PROTECTED],
         Gunnar von Boehn/Germany/Contr/[EMAIL PROTECTED],
         Michael Ellerman <[EMAIL PROTECTED]>
Subject: Re: [RFC 1/3] powerpc: __copy_tofrom_user tweaked for Cell

On Thursday 19 June 2008, Mark Nelson wrote:

>  * __copy_tofrom_user routine optimized for CELL-BE-PPC

A few things I noticed:

* You don't have a page wise user copy, which the regular code
has. This is probably not so noticeable in iperf, but should
have a significant impact on lmbench and on a number of file
system tests that copy large amounts of data. Have you checked
that the loop around cache lines is just as fast?

* You don't align the source to word size, only the target.
Does this get handled correctly when the source is a noncacheable
mapping, e.g. an unaligned copy_from_user where the source points
to a physical local store mapping of an SPU? I don't think we
need to optimize this case for performance, but I'm not sure
if it would crash. AFAIR, unaligned loads from noncacheable storage
give you an alignment exception that you need to handle, right?

* The naming of the labels (with just numbers) is rather confusing,
it would be good to have something better, but I must admit that
I don't have a good idea either.

* The trick of using the condition code in cr7 for the last bytes
is really cute, but are the four branches actually better than a
single computed branch into the middle of 15 byte wise copies?
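
For readers comparing the two tail strategies in the last point, here is a rough C sketch of both shapes. It is an illustration under assumptions, not the code from the patch: the function names are hypothetical, the bit tests stand in for branches on condition-register bits holding the low four bits of the length, and the switch with fall-through stands in for a computed branch into a run of 15 byte copies. It makes no claim about which is faster on Cell.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: four conditional steps, one per bit of (n & 15). */
static void tail_bit_tests(unsigned char *d, const unsigned char *s, size_t n)
{
    n &= 15;
    if (n & 8) { memcpy(d, s, 8); d += 8; s += 8; }
    if (n & 4) { memcpy(d, s, 4); d += 4; s += 4; }
    if (n & 2) { memcpy(d, s, 2); d += 2; s += 2; }
    if (n & 1) { *d = *s; }
}

/* Hypothetical: one jump into the middle of 15 byte copies. */
static void tail_computed_jump(unsigned char *d, const unsigned char *s, size_t n)
{
    switch (n & 15) {
    case 15: *d++ = *s++; /* fall through */
    case 14: *d++ = *s++; /* fall through */
    case 13: *d++ = *s++; /* fall through */
    case 12: *d++ = *s++; /* fall through */
    case 11: *d++ = *s++; /* fall through */
    case 10: *d++ = *s++; /* fall through */
    case 9:  *d++ = *s++; /* fall through */
    case 8:  *d++ = *s++; /* fall through */
    case 7:  *d++ = *s++; /* fall through */
    case 6:  *d++ = *s++; /* fall through */
    case 5:  *d++ = *s++; /* fall through */
    case 4:  *d++ = *s++; /* fall through */
    case 3:  *d++ = *s++; /* fall through */
    case 2:  *d++ = *s++; /* fall through */
    case 1:  *d = *s;
             break;
    case 0:
        break;
    }
}
```

The first form executes at most four wider moves but always evaluates four conditions; the second takes one indirect jump and then up to 15 single-byte moves.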

             Arnd <><

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev