On Tue, Jul 28, 2015 at 12:19:54PM +0200, Thomas Huth wrote:
>  : invert-region ( addr len -- )
> -   0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP drop
> -;
> -
> -: invert-region-x ( addr len -- )
> -   /x / 0 ?DO dup dup rx@ -1 xor swap rx! xa1+ LOOP drop
> +   2dup or 7 and CASE
> +      0 OF 3 rshift 0 ?DO dup dup rx@ -1 xor swap rx! xa1+ LOOP ENDOF
> +      2 OF 1 rshift 0 ?DO dup dup rw@ -1 xor swap rw! wa1+ LOOP ENDOF
> +      4 OF 2 rshift 0 ?DO dup dup rl@ -1 xor swap rl! la1+ LOOP ENDOF
> +      6 OF 1 rshift 0 ?DO dup dup rw@ -1 xor swap rw! wa1+ LOOP ENDOF
> +      dup OF 0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP ENDOF
> +   ENDCASE
> +   drop
>  ;

Can you access device memory with 64-bit accesses on all supported
devices?

You can get a bigger speedup by writing some of the core blitting
functions in C, btw.

A small simplification:

	2dup or 7 and CASE
	0 OF 3 rshift 0 ?DO dup dup rx@ -1 xor swap rx! xa1+ LOOP ENDOF
	4 OF 2 rshift 0 ?DO dup dup rl@ -1 xor swap rl! la1+ LOOP ENDOF
	3 and
	2 OF 1 rshift 0 ?DO dup dup rw@ -1 xor swap rw! wa1+ LOOP ENDOF
	dup OF 0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP ENDOF
	ENDCASE

If this code is often called with unaligned arguments, it probably
makes more sense to special-case the beginning and end.

Segher
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
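The suggestion to special-case the beginning and end of an unaligned region can be sketched in C, the language proposed above for the core blitting functions. This is a minimal, hypothetical sketch, not SLOF code: the name `invert_region`, the byte-pointer interface, and the assumption that the target memory tolerates both byte and 64-bit accesses are all illustrative. Bytes are inverted one at a time until the address is 8-byte aligned, the aligned middle is inverted as 64-bit words, and the remaining 0 to 7 tail bytes are again inverted one at a time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C counterpart of the Forth invert-region word:
 * invert len bytes starting at addr. Byte accesses cover the
 * unaligned head and tail; 64-bit accesses cover the aligned middle.
 * Assumes the target memory tolerates 64-bit accesses. */
void invert_region(volatile uint8_t *addr, size_t len)
{
    /* Head: byte accesses until addr is 8-byte aligned. */
    while (len > 0 && ((uintptr_t)addr & 7) != 0) {
        *addr = (uint8_t)~*addr;
        addr++;
        len--;
    }

    /* Body: whole 64-bit words at aligned addresses. */
    volatile uint64_t *x = (volatile uint64_t *)addr;
    while (len >= 8) {
        *x = ~*x;
        x++;
        len -= 8;
    }

    /* Tail: the remaining 0..7 bytes. */
    addr = (volatile uint8_t *)x;
    while (len > 0) {
        *addr = (uint8_t)~*addr;
        addr++;
        len--;
    }
}
```

Unlike the CASE dispatch in the patch, which picks a single access width from `addr | len` and therefore falls back to byte accesses for the whole region whenever either value is odd, this keeps wide accesses for the bulk of an unaligned region. On real device memory the plain pointer dereferences would need to be replaced by the firmware's own MMIO accessors.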