Hi Ben,

> That is surprising, do we have any idea what specifically increases
> the overhead so significantly ? Does gcc know about ldbrx/stdbrx ? I
> notice in our io.h for example we still do manual ld/std + swap
> because old processors didn't know these, we should fix that for
> CONFIG_POWER8 (or is it POWER7 that brought these ?).
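On the io.h point: ldbrx/stdbrx arrived with ISA 2.06, so POWER7 and
later, I believe. As a rough sketch of the shape such an accessor pair
could take (function names are mine, and real I/O accessors need
barriers this ignores):

	#include <stdint.h>

	static inline uint64_t ld_swapped(const volatile uint64_t *addr)
	{
		uint64_t val;

		/* One byte-reversed load instead of ld plus the whole
		 * rotlwi/rlwimi swap sequence. */
		__asm__ __volatile__("ldbrx %0,0,%1"
				     : "=r" (val)
				     : "r" (addr), "m" (*addr));
		return val;
	}

	static inline void st_swapped(volatile uint64_t *addr, uint64_t val)
	{
		__asm__ __volatile__("stdbrx %1,0,%2"
				     : "=m" (*addr)
				     : "r" (val), "r" (addr));
	}

Pre-2.06 CPUs would still need the ld/std + swap fallback unless this
is gated on CONFIG_POWER8 as you suggest.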
The futex issue seems to be __get_user_pages_fast():

	ld	r11,0(r6)
	...
	rldicl	r8,r11,32,32
	rotlwi	r28,r11,24
	rlwimi	r28,r11,8,8,15
	rotlwi	r6,r8,24
	rlwimi	r28,r11,8,24,31
	rlwimi	r6,r8,8,8,15
	rlwimi	r6,r8,8,24,31
	rldicr	r28,r28,32,31
	or	r28,r28,r6
	cmpdi	cr7,r28,0
	beq	cr7,2428

That's a whole lot of work just to check if a pte is zero. I assume
the reason gcc can't replace this with a single byte-reversed load is
that we access the pte via the READ_ONCE() macro, and gcc is
conservative about fusing a volatile load with the swap that follows
it. I see the same issue in unmap_page_range(), __hash_page_64K() and
handle_mm_fault().

The other issue I see is when we access a pte via larx/stcx, where we
have no choice but to byte swap it manually, since those instructions
have no byte-reversed forms. I see that in __hash_page_64K():

	rldicl	r28,r30,32,32
	rotlwi	r0,r30,24
	rlwimi	r0,r30,8,8,15
	rotlwi	r10,r28,24
	rlwimi	r0,r30,8,24,31
	rlwimi	r10,r28,8,8,15
	rlwimi	r10,r28,8,24,31
	rldicr	r0,r0,32,31
	or	r0,r0,r10
	hwsync
	ldarx	r12,0,r6
	cmpd	r12,r11
	bne-	c00000000004fad0
	stdcx.	r0,0,r6
	bne-	c00000000004fab8
	hwsync

Anton
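P.S. To make the READ_ONCE() theory above concrete, here is a minimal
sketch of the pattern gcc sees (READ_ONCE() is approximated here as a
plain volatile cast; the kernel's definition is more involved):

	#include <stdint.h>

	/* Volatile-cast approximation of READ_ONCE(); enough to show
	 * the effect on code generation. */
	#define READ_ONCE(x)	(*(const volatile typeof(x) *)&(x))

	static inline uint64_t swab64(uint64_t x)
	{
		return __builtin_bswap64(x);
	}

	static int pte_is_none(uint64_t *ptep)
	{
		/* This is the pattern behind the first listing above:
		 * the volatile access becomes a plain ld, followed by
		 * the open-coded swap, then the compare against zero. */
		return swab64(READ_ONCE(*ptep)) == 0;
	}

Since a zero pte is zero in either byte order, the entire swap is
wasted work for this particular test.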
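P.P.S. For the larx/stcx case, a sketch of the update loop (the
function name and the little-endian-caller convention are mine):
because there is no byte-reversed ldarx/stdcx., the expected and new
values must be swapped to big-endian before entering the loop, which
is exactly the swap sequence ahead of the ldarx above:

	#include <stdint.h>

	static inline uint64_t pte_cmpxchg_be(volatile uint64_t *ptep,
					      uint64_t old_le, uint64_t new_le)
	{
		/* Swap to big-endian by hand before the loop. */
		uint64_t old_be = __builtin_bswap64(old_le);
		uint64_t new_be = __builtin_bswap64(new_le);
		uint64_t prev;

		__asm__ __volatile__(
	"	sync\n"			/* hwsync before the update */
	"1:	ldarx	%0,0,%2\n"	/* load-reserve the pte (BE) */
	"	cmpd	%0,%3\n"	/* still the value we expected? */
	"	bne-	2f\n"
	"	stdcx.	%4,0,%2\n"	/* conditionally store new pte */
	"	bne-	1b\n"		/* lost reservation: retry */
	"2:	sync\n"			/* hwsync after the update */
		: "=&r" (prev), "+m" (*ptep)
		: "r" (ptep), "r" (old_be), "r" (new_be)
		: "cc", "memory");

		/* Hand the observed value back in the caller's byte
		 * order. */
		return __builtin_bswap64(prev);
	}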