On 2/4/16, Christophe Leroy <christophe.le...@c-s.fr> wrote:
>
>
> On 04/02/2016 12:37, Denis Kirjanov wrote:
>> On 2/4/16, Christophe Leroy <christophe.le...@c-s.fr> wrote:
>>> This simplification helps the compiler. We now have only one test
>>> instead of two, so it reduces the number of branches.
>>>
>>> Signed-off-by: Christophe Leroy <christophe.le...@c-s.fr>
>>> ---
>>> v2: new
>>> v3: no change
>>> v4: no change
>>> v5: no change
>>>
>>>  arch/powerpc/mm/dma-noncoherent.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
>>> index 169aba4..2dc74e5 100644
>>> --- a/arch/powerpc/mm/dma-noncoherent.c
>>> +++ b/arch/powerpc/mm/dma-noncoherent.c
>>> @@ -327,7 +327,7 @@ void __dma_sync(void *vaddr, size_t size, int direction)
>>>  	 * invalidate only when cache-line aligned otherwise there is
>>>  	 * the potential for discarding uncommitted data from the cache
>>>  	 */
>>> -	if ((start & (L1_CACHE_BYTES - 1)) || (size & (L1_CACHE_BYTES - 1)))
>>> +	if ((start | end) & (L1_CACHE_BYTES - 1))
>>>  		flush_dcache_range(start, end);
>>>  	else
>>>  		invalidate_dcache_range(start, end);
>> The previous version of the address cache-line aligned check reads
>> perfectly fine.
>> What's the benefit of this micro optimization?
> With this optimisation we avoid one unnecessary test and two associated
> jumps. Taking into account that __dma_sync() is one of the top ten CPU
> consumers, I believe it is worth it:
>
> Without the patch:
>
> c000d894:  70 6a 00 0f  andi.   r10,r3,15
> c000d898:  39 29 00 0f  addi    r9,r9,15
> c000d89c:  54 63 00 36  rlwinm  r3,r3,0,0,27
> c000d8a0:  7d 23 48 50  subf    r9,r3,r9
> c000d8a4:  41 82 00 84  beq     c000d928 <__dma_sync+0xb8>
> [...]
> c000d8c0:  7c 00 04 ac  sync
> c000d8c4:  4e 80 00 20  blr
> [...]
> c000d928:  70 8a 00 0f  andi.   r10,r4,15
> c000d92c:  40 a2 ff 7c  bne     c000d8a8 <__dma_sync+0x38>
> c000d930:  55 2a e1 3f  rlwinm. r10,r9,28,4,31
> c000d934:  41 a2 ff 8c  beq     c000d8c0 <__dma_sync+0x50>
>
> With the patch:
>
> c000d894:  7c 89 1b 78  or      r9,r4,r3
> c000d898:  71 2a 00 0f  andi.   r10,r9,15
> c000d89c:  54 63 00 36  rlwinm  r3,r3,0,0,27
> c000d8a0:  38 84 00 0f  addi    r4,r4,15
> c000d8a4:  7c 83 20 50  subf    r4,r3,r4
> c000d8a8:  41 82 00 84  beq     c000d92c <__dma_sync+0xbc>
> [...]
> c000d8c4:  7c 00 04 ac  sync
> c000d8c8:  4e 80 00 20  blr
> [...]
> c000d92c:  54 89 e1 3f  rlwinm. r9,r4,28,4,31
> c000d930:  41 a2 ff 94  beq     c000d8c4 <__dma_sync+0x54>
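The claim that one test can replace two rests on end being start + size. When start is cache-line aligned, end has the same low-order bits as size, so end is aligned exactly when size is. When start is not aligned, both versions take the flush path anyway. Below is a minimal user-space sketch of that equivalence. It assumes 16-byte cache lines (as the `andi. ...,15` masks in the listing suggest) and that `end` is computed as `start + size`. `old_check`, `new_check` and `verify_equivalence` are hypothetical names for illustration, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 16-byte L1 cache lines, matching the "andi. ...,15"
 * masks in the disassembly quoted above. */
#define L1_CACHE_BYTES 16UL

/* Hypothetical helper: the pre-patch test, on start and size. */
static int old_check(unsigned long start, size_t size)
{
	return (start & (L1_CACHE_BYTES - 1)) || (size & (L1_CACHE_BYTES - 1));
}

/* Hypothetical helper: the post-patch test, on start and end,
 * assuming end = start + size. */
static int new_check(unsigned long start, size_t size)
{
	unsigned long end = start + size;

	return ((start | end) & (L1_CACHE_BYTES - 1)) != 0;
}

/* Hypothetical helper: exhaustively compare both tests over four
 * cache lines' worth of start offsets and sizes; returns the number
 * of (start, size) pairs checked. */
static unsigned long verify_equivalence(void)
{
	unsigned long start, checked = 0;
	size_t size;

	for (start = 0x1000; start < 0x1000 + 4 * L1_CACHE_BYTES; start++) {
		for (size = 0; size < 4 * L1_CACHE_BYTES; size++) {
			/* If start is line-aligned, the low bits of end equal
			 * the low bits of size; if start is unaligned, both
			 * tests already report "not aligned". */
			assert(old_check(start, size) == new_check(start, size));
			checked++;
		}
	}
	return checked;
}
```

Calling `verify_equivalence()` from a `main()` should return 4096 (64 start offsets times 64 sizes) without tripping an assertion.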
Yeah, looks better. Did you compile the kernel with the default compiler flags?

Thanks!

>
>
> Christophe
>>> --
>>> 2.1.0
>>>
>>> _______________________________________________
>>> Linuxppc-dev mailing list
>>> Linuxppc-dev@lists.ozlabs.org
>>> https://lists.ozlabs.org/listinfo/linuxppc-dev