On Tue, Jun 7, 2011 at 7:47 PM, Benjamin Herrenschmidt <b...@kernel.crashing.org> wrote: > On Tue, 2011-06-07 at 16:36 -0500, Eric Van Hensbergen wrote: > >> open to alternatives. jimix also suggested changing NEED_L1_WRITETHROUGH >> to DCBZ_BROKEN, which I'm open to if you think appropriate, or maybe >> DCBZ_BROKEN_DAMNIT would be more apt. > > :-) > > I think NEED_L1_WRITETHROUGH isn't great since we are dealing with more > than just that here. Let's call it 44x_SMP since afaik, all > implementations, whether it's BG or other variants of the same hack > (AMCC/APM has one too) need the same stuff here, no ? > > Let's not use more than one feature bit, it's not needed in practice, a > better name is all we need. Might even call it MMU_FTR_BLUEGENE_44x_SMP > if you want. >
I've got it as MMU_FTR_44x_SMP now, just wanted to bounce off of Josh to make sure he's okay with it since he owns the 44x stuff. If he'd rather, I'll change it to MMU_FTR_BGP_44x_SMP. > > I'll add comments inline: > >> #define PPC44x_MMUCR_TID 0x000000ff >> #define PPC44x_MMUCR_STS 0x00010000 >> +#define PPC44x_MMUCR_U2 0x00200000 > > Please document in a comment what is the effect of U2 on the BG/P ASIC > caches. > Is a comment sufficient, or would you rather also have something along the lines of +#define PPC44x_MMUCR_U2 0x00200000 +#define PPC44x_MMUCR_U2_SWOAE PPC44x_MMUCR_U2 /* store without allocation */ or even... +#define PPC44x_MMUCR_U2_BGP_SWOAE PPC44x_MMUCR_U2 /* store without allocation on BGP */ Seems like its getting a bit too verbose, maybe that's not a bad thing. As long as I don't have to type it too many times :) > > BTW. Care to explain to me why you have U2 -both- in the arguments to > tlbwe and in MMUCR ? That doesn't look right to me... which one is used > where and when ? > My reading of the databook is that U2SWOAE is an enable bit that lets the U2 storage attribute control the behavior. >> @@ -814,7 +829,15 @@ skpinv: addi r4,r4,1 /* >> Increment */ >> /* attrib fields */ >> /* Added guarded bit to protect against speculative loads/stores */ >> li r5,0 >> - ori r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | >> PPC44x_TLB_G) >> +BEGIN_MMU_FTR_SECTION >> + ori r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | \ >> + PPC44x_TLB_G | PPC44x_TLB_U2) >> + oris r5,r5,PPC44x_TLB_WL1@h >> +END_MMU_FTR_SECTION_IFSET(MMU_FTR_NEED_L1_WRITETHROUGH) >> +BEGIN_MMU_FTR_SECTION >> + ori r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | \ >> + PPC44x_TLB_G) >> +END_MMU_FTR_SECTION_IFCLR(MMU_FTR_NEED_L1_WRITETHROUGH) >> >> li r0,63 /* TLB slot 63 */ > > This isn't going to work. This happens before the CPU feature bits are > established. > > I see two ways out of that dilemna: > > - One is you find a way to identify the BG case at runtime from that > very early asm code. It's a bit tricky since we never added the MMU type > information to the device-tree blob header (but we're adding it to ePAPR > via a register iirc, so we could hijack that), or maybe via inspecting > what the FW left behind in the TLB... > Well, if we are using the u-boot scenario, I can control how the bootloader sets up the device tree and add markers that we can use to let us do this. >> diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S >> index 998a100..b54e2e8 100644 >> --- a/arch/powerpc/kernel/misc_32.S >> +++ b/arch/powerpc/kernel/misc_32.S >> @@ -506,7 +506,27 @@ _GLOBAL(clear_pages) >> li r0,PAGE_SIZE/L1_CACHE_BYTES >> slw r0,r0,r4 >> mtctr r0 >> -1: dcbz 0,r3 >> + li r4, 0 >> +1: >> +BEGIN_MMU_FTR_SECTION >> + /* assuming 32 byte cacheline */ >> + stw r4, 0(r3) >> + stw r4, 4(r3) >> + stw r4, 8(r3) >> + stw r4, 12(r3) >> + stw r4, 16(r3) >> + stw r4, 20(r3) >> + stw r4, 24(r3) >> + stw r4, 28(r3) >> +END_MMU_FTR_SECTION_IFSET(MMU_FTR_NEED_L1_WRITETHROUGH) >> +/* >> + * would have used an ELSE_MMU_FTR_SECTION here but it >> + * broke the code with Error: non-constant expression in ".if" statement >> + * >> + */ >> +BEGIN_MMU_FTR_SECTION >> + dcbz 0,r3 >> +END_MMU_FTR_SECTION_IFCLR(MMU_FTR_NEED_L1_WRITETHROUGH) >> addi r3,r3,L1_CACHE_BYTES >> bdnz 1b >> blr >> @@ -550,7 +570,9 @@ _GLOBAL(copy_page) >> mtctr r0 >> 1: >> dcbt r11,r4 >> +BEGIN_MMU_FTR_SECTION >> dcbz r5,r3 >> +END_MMU_FTR_SECTION_IFCLR(MMU_FTR_NEED_L1_WRITETHROUGH) >> COPY_16_BYTES >> #if L1_CACHE_BYTES >= 32 >> COPY_16_BYTES > > Instead here I would just do a single feature section as the first > instruction of clear_pages() that covers a branch out of line to an > alternate implementation of the whole function. > > _GLOBAL(clear_pages) > BEGIN_MMU_FTR_SECTION > b clear_pages_no_dcbz > END_MMU_FTR_SECTION_IFSET(MMU_FTR_NEED_L1_WRITETHROUGH) > .../... > okay, jimi had suggested something similar: clear_pages_fake: li r4, 0 1: fake_dcbz r4, r3, L1_CACHE_BYTES addi r3,r3,L1_CACHE_BYTES bdnz 1b blr _GLOBAL(clear_pages) li r0,PAGE_SIZE/L1_CACHE_BYTES slw r0,r0,r4 mtctr r0 BEGIN_MMU_FTR_SECTION b clear_pages_fake END_MMU_FTR_SECTION_IFSET(MMU_FTR_NEED_L1_WRITETHROUGH) >> diff --git a/arch/powerpc/lib/copy_32.S b/arch/powerpc/lib/copy_32.S >> index 55f19f9..2646838 100644 >> --- a/arch/powerpc/lib/copy_32.S >> +++ b/arch/powerpc/lib/copy_32.S >> @@ -12,6 +12,7 @@ >> #include <asm/cache.h> >> #include <asm/errno.h> >> #include <asm/ppc_asm.h> >> +#include <asm/mmu.h> > > This is a bit more nasty. At some point we'll have to butcher that code > in order to deal with 440 and 476 that have different cache line sizes > and which we still want in the same kernel binary. In the meantime > I'd do like previously and just duplicate the whole lot with just a > single branch out. > > Note that I fail to see how your cachable_memzero can be correct since > you don't replace dcbz with anything. On the other hand, the only user > of it in the entire tree is ... clearing the hash table in ppc_mmu_32.c > which we don't use on 440. > Yeah, as I was going over the take2 version of the patch, I was uncomfortable with this as well. I guess the BG guys just never tripped over it because it was never called. > So why don't you just make a separate patch that just completely gets > rid of cachable_memzero() and use a memset in ppc_mmu_32.c ? I don't > think anybody will notice the difference.... > > For cachable_memcpy and copy_tofrom_user, just removing the dcbz's will > do for now, though I wonder whether we could just remove cachable_memcpy > as well. The only user is the EMAC driver and I doubt anybody will be > able to measure the difference. It will make everbody life easier in the > long term to remove those. > Okay, will do. >> : >> #ifdef CONFIG_PPC47x >> : "r" (PPC47x_TLB2_S_RWX), >> -#else >> +#elseif CONFIG_BGP_L1_WRITETHROUGH >> + : "r" (PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_WL1 \ >> + | PPC44x_TLB_U2 | PPC44x_TLB_M), >> +#else /* neither CONFIG_PPC47x or CONFIG_BGP_L1_WRITETHROUGH */ >> : "r" (PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_G), >> -#endif >> +#endif /* CONFIG_PPC47x */ >> "r" (phys), >> "r" (virt | PPC44x_TLB_VALID | PPC44x_TLB_256M), >> "r" (entry), > > Make this conditional at runtime. > Oops. sorry, that one slipped through. I'll try and turn around [V4] later today. Thanks for the feedback. -eric _______________________________________________ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev