Hi Segher,

On 28/07/15 19:04, Segher Boessenkool wrote:
> On Tue, Jul 28, 2015 at 12:19:54PM +0200, Thomas Huth wrote:
>>  : invert-region ( addr len -- )
>> -   0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP drop
>> -;
>> -
>> -: invert-region-x ( addr len -- )
>> -   /x / 0 ?DO dup dup rx@ -1 xor swap rx! xa1+ LOOP drop
>> +   2dup or 7 and CASE
>> +      0 OF 3 rshift 0 ?DO dup dup rx@ -1 xor swap rx! xa1+ LOOP ENDOF
>> +      2 OF 1 rshift 0 ?DO dup dup rw@ -1 xor swap rw! wa1+ LOOP ENDOF
>> +      4 OF 2 rshift 0 ?DO dup dup rl@ -1 xor swap rl! la1+ LOOP ENDOF
>> +      6 OF 1 rshift 0 ?DO dup dup rw@ -1 xor swap rw! wa1+ LOOP ENDOF
>> +      dup OF 0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP ENDOF
>> +   ENDCASE
>> +   drop
>>  ;
>
> Can you access device memory as 64 bits for all supported devices?

Yes, should be fine since 64-bit access was already used in the original
code, see fb8-invert-screen in
https://github.com/aik/SLOF/commit/99c534ecc7a8566bd9ca6346915d9ac1bfacae1e

> You can get a bigger speedup by writing some of the core blitting
> functions in C, btw.

Well, the above code is for js2x only ... so this is likely not worth the
effort anymore. The code for qemu-spapr calls into a hypercall already, so
this is already accelerated.

> A small simplification:
>
>   2dup or 7 and CASE
>     0 OF 3 rshift 0 ?DO dup dup rx@ -1 xor swap rx! xa1+ LOOP ENDOF
>     4 OF 2 rshift 0 ?DO dup dup rl@ -1 xor swap rl! la1+ LOOP ENDOF
>     3 and
>     2 OF 1 rshift 0 ?DO dup dup rw@ -1 xor swap rw! wa1+ LOOP ENDOF
>     dup OF 0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP ENDOF
>   ENDCASE

Ok, nice idea, makes sense! I'll include it in v2 (after waiting a little
bit to see if there's other feedback).

> If this code is often called unaligned, it makes more sense to
> special-case the begin and end probably.

It's only used for drawing the cursor, so it always should be aligned.

 Thomas
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
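
For readers less used to Forth, the alignment dispatch discussed in this thread can be sketched in C. This is only an illustration, not SLOF code: the names `access_width` and `invert_region` are hypothetical, and it works on ordinary RAM through `memcpy`, whereas the real words (`rx@`, `rl@`, `rw@`, `rb@` and their store counterparts) perform device-memory accesses. It mirrors the simplified variant: the low three bits of `addr | len` select the widest access that both values are a multiple of, and masking with 3 folds the cases 2 and 6 into a single 16-bit branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widest access size (in bytes) such that both
 * the start address and the length are multiples of it. */
static unsigned access_width(uintptr_t addr, size_t len)
{
    unsigned low = (unsigned)((addr | len) & 7);

    if (low == 0)
        return 8;               /* both 8-byte aligned: 64-bit accesses */
    if (low == 4)
        return 4;               /* both 4-byte aligned: 32-bit accesses */
    if ((low & 3) == 2)
        return 2;               /* low was 2 or 6: 16-bit accesses */
    return 1;                   /* anything odd: byte accesses */
}

/* Hypothetical RAM-only counterpart of invert-region: flips every bit
 * in [addr, addr + len). memcpy stands in for the device accessors. */
static void invert_region(void *addr, size_t len)
{
    unsigned char *p = addr;
    size_t i;

    assert(p != NULL || len == 0);

    switch (access_width((uintptr_t)addr, len)) {
    case 8:
        for (i = 0; i < len; i += 8) {
            uint64_t v;
            memcpy(&v, p + i, sizeof v);
            v = ~v;
            memcpy(p + i, &v, sizeof v);
        }
        break;
    case 4:
        for (i = 0; i < len; i += 4) {
            uint32_t v;
            memcpy(&v, p + i, sizeof v);
            v = ~v;
            memcpy(p + i, &v, sizeof v);
        }
        break;
    case 2:
        for (i = 0; i < len; i += 2) {
            uint16_t v;
            memcpy(&v, p + i, sizeof v);
            v = (uint16_t)~v;
            memcpy(p + i, &v, sizeof v);
        }
        break;
    default:
        for (i = 0; i < len; i++)
            p[i] = (unsigned char)~p[i];
        break;
    }
}
```

Because the width is derived from `addr | len`, a region whose start is aligned but whose length is odd still falls back to byte accesses for the whole region; that is the case the "special-case the begin and end" remark would address.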