Re: Question on follow_page_mask
On Tue, 23 Feb 2016, Kirill A. Shutemov wrote: > On Tue, Feb 23, 2016 at 06:45:05PM +0530, Anshuman Khandual wrote: > > Not able to understand the first code block of follow_page_mask > > function. follow_huge_addr function is expected to find the page > > struct for the given address if it turns out to be a HugeTLB page > > but then when it finds the page we bug on if it had been called > > with FOLL_GET flag. > > > > page = follow_huge_addr(mm, address, flags & FOLL_WRITE); > > if (!IS_ERR(page)) { > > BUG_ON(flags & FOLL_GET); > > return page; > > } > > > > do_move_page_to_node_array calls follow_page with FOLL_GET which > > in turn calls follow_page_mask with FOLL_GET. On POWER, the > > function follow_huge_addr is defined and does not return -EINVAL > > like the generic one. It returns the page struct if its a HugeTLB > > page. Just curious to know what is the purpose behind the BUG_ON. > > I would guess requesting pin on non-reclaimable page is considered > useless, meaning suspicius behavior. BUG_ON() is overkill, I think. > WARN_ON_ONCE() would make it. No, it's there to guard against abuse, until the correct functionality is implemented: which has not so far been required, I think. The problem is that a get_page() here is too late: it needs to be done inside each arch's implementation of follow_huge_addr(), while holding whatever is the appropriate lock, dropped by the time it returns here. If you look through where FOLL_GET is usually implemented, such as in follow_page_pte(), but pud and pmd cases too, I hope you'll still find that they are careful to get the reference on the page while it's safe in the pagetable. But follow_huge_addr() would need some work to offer the same guarantees: it's good for those "peep at a page without actually getting a reference" cases, but not good enough for preventing a page for being put to some other use completely, before we've secured it with our reference. Unless something's changed: the last time I recall the issue coming up, was when Naoya Horiguchi was working on hugetlbfs page migration: see linux-kernel/linux-mm mail thread "BUG at mm/memory.c:1489!" from 28 May 2014; and the resolution there was not to support the follow_huge_addr() case (which IIRC is peculiar to powerpc alone?). Hugh ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Problems with swapping in v4.5-rc on POWER
I've plagiarized the subject from Paulus's "Problems with THP" mail last weekend; but my similar problems are on PowerMac G5 baremetal, with 4kB pages, not capable of THP and no THP configured in. Under heavily swapping load, running kernel builds on tmpfs in limited memory, I've been seeing random segfaults too, internal compiler errors etc. Not easily reproduced: sometimes happens in minutes, sometimes not for several hours. I tried and failed to construct a reproducer for you: my lack of a good recipe has deterred me from reporting it, and seeing Paulus's mail on THP gave me hope that the answer would come up in that thread; but no, that was quickly resolved as a THP issue, since fixed. (Mine had appeared to be fixed in v4.5-rc4 anyway; but I guess I just didn't try hard enough, it resurfaced on -rc5 immediately.) I've seen no sign of such problems on x86. And I saw no sign of such problems on v4.4-rc8-mm1, when I included the fixes to the _PAGE_PTE and _PAGE_SWP_SOFT_DIRTY swapoff issues we discussed back then (in 33 hours of load, should be good enough; but did see such problems a couple of times before including those fixes - I took them to be a side-effect of the page flags issue, but now rather doubt that). The minutes or hours thing: I wonder if that indicates a missing initialization somewhere: that can easily show up soon after booting, but then the machine settles into a steady state of reusing the same structures, now initialized; until much later something disturbs the state and it has to allocate more. Sheer speculation, but I wonder. Hugh ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Problems with swapping in v4.5-rc on POWER
On Thu, 25 Feb 2016, Michael Ellerman wrote: > > I do run tests on G5, but obviously not rigorously enough. I kicked off a few > kernel builds on mine and it survived, though once it hits swap it's almost > unusably slow. I'll leave it running overnight and see if I hit anything. Oh yes, I'd forgotten how unusably slow: I tend to forget that I slipped an SSD in there some while back, just for the swapping: slow, but not unusable. Thanks, I'm hoping you will be able to reproduce it yourself. Hugh ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Problems with swapping in v4.5-rc on POWER
On Thu, 25 Feb 2016, Aneesh Kumar K.V wrote: > > Can you test the impact of the merge listed below ?(ie, revert the merge and > see if > we can reproduce and also verify with merge applied). This will give us a > set of commits to look closer. We had quiet a lot of page table > related changes going in this merge window. > > f689b742f217b2ffe7 ("Pull powerpc updates from Michael Ellerman:") > > That is the merge commit that added _PAGE_PTE. Another experiment running on it at the moment, I'd like to give that a few more hours, and then will try the revert you suggest. But does that merge revert cleanly, did you try? I'm afraid of interactions, whether obvious or subtle, with the THP refcounting rework. Oh, since I don't have THP configured on, maybe I can ignore any issues from that. Hugh ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Problems with swapping in v4.5-rc on POWER
On Wed, 24 Feb 2016, Hugh Dickins wrote: > On Thu, 25 Feb 2016, Aneesh Kumar K.V wrote: > > > > Can you test the impact of the merge listed below ?(ie, revert the merge > > and see if > > we can reproduce and also verify with merge applied). This will give us a > > set of commits to look closer. We had quiet a lot of page table > > related changes going in this merge window. > > > > f689b742f217b2ffe7 ("Pull powerpc updates from Michael Ellerman:") > > > > That is the merge commit that added _PAGE_PTE. > > Another experiment running on it at the moment, I'd like to give that > a few more hours, and then will try the revert you suggest. But does > that merge revert cleanly, did you try? I'm afraid of interactions, > whether obvious or subtle, with the THP refcounting rework. Oh, since > I don't have THP configured on, maybe I can ignore any issues from that. That revert worked painlessly, only a very few and simple conflicts, I ran that under load for 12 hours, no problem seen. I've now checked out an f689b742 tree and started on that, just to confirm that it fails fairly quickly I hope; and will then proceed to git bisect, giving that as bad and 37cea93b as good. Given the uncertainty of whether 12 hours is really long enough to be sure, and perhaps difficulties along the way, I don't rate my chances of a reliable bisection higher than 60%, but we'll see. Hugh ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Problems with swapping in v4.5-rc on POWER
On Thu, 25 Feb 2016, Hugh Dickins wrote: > On Wed, 24 Feb 2016, Hugh Dickins wrote: > > On Thu, 25 Feb 2016, Aneesh Kumar K.V wrote: > > > > > > Can you test the impact of the merge listed below ?(ie, revert the merge > > > and see if > > > we can reproduce and also verify with merge applied). This will give us a > > > set of commits to look closer. We had quiet a lot of page table > > > related changes going in this merge window. > > > > > > f689b742f217b2ffe7 ("Pull powerpc updates from Michael Ellerman:") > > > > > > That is the merge commit that added _PAGE_PTE. > > > > Another experiment running on it at the moment, I'd like to give that > > a few more hours, and then will try the revert you suggest. But does > > that merge revert cleanly, did you try? I'm afraid of interactions, > > whether obvious or subtle, with the THP refcounting rework. Oh, since > > I don't have THP configured on, maybe I can ignore any issues from that. > > That revert worked painlessly, only a very few and simple conflicts, > I ran that under load for 12 hours, no problem seen. > > I've now checked out an f689b742 tree and started on that, just to > confirm that it fails fairly quickly I hope; and will then proceed > to git bisect, giving that as bad and 37cea93b as good. > > Given the uncertainty of whether 12 hours is really long enough to be > sure, and perhaps difficulties along the way, I don't rate my chances > of a reliable bisection higher than 60%, but we'll see. I'm sure you won't want a breathless report from me on each bisection step, but I ought to report that: contrary to our expectations, the f689b742 survived without error for 12 hours, so appears to be good. I'll bisect between there and v4.5-rc1. Hugh ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Problems with swapping in v4.5-rc on POWER
On Fri, 26 Feb 2016, Hugh Dickins wrote: > On Thu, 25 Feb 2016, Hugh Dickins wrote: > > On Wed, 24 Feb 2016, Hugh Dickins wrote: > > > On Thu, 25 Feb 2016, Aneesh Kumar K.V wrote: > > > > > > > > Can you test the impact of the merge listed below ?(ie, revert the > > > > merge and see if > > > > we can reproduce and also verify with merge applied). This will give us > > > > a > > > > set of commits to look closer. We had quiet a lot of page table > > > > related changes going in this merge window. > > > > > > > > f689b742f217b2ffe7 ("Pull powerpc updates from Michael Ellerman:") > > > > > > > > That is the merge commit that added _PAGE_PTE. > > > > > > Another experiment running on it at the moment, I'd like to give that > > > a few more hours, and then will try the revert you suggest. But does > > > that merge revert cleanly, did you try? I'm afraid of interactions, > > > whether obvious or subtle, with the THP refcounting rework. Oh, since > > > I don't have THP configured on, maybe I can ignore any issues from that. > > > > That revert worked painlessly, only a very few and simple conflicts, > > I ran that under load for 12 hours, no problem seen. > > > > I've now checked out an f689b742 tree and started on that, just to > > confirm that it fails fairly quickly I hope; and will then proceed > > to git bisect, giving that as bad and 37cea93b as good. > > > > Given the uncertainty of whether 12 hours is really long enough to be > > sure, and perhaps difficulties along the way, I don't rate my chances > > of a reliable bisection higher than 60%, but we'll see. > > I'm sure you won't want a breathless report from me on each bisection > step, but I ought to report that: contrary to our expectations, the > f689b742 survived without error for 12 hours, so appears to be good. > I'll bisect between there and v4.5-rc1. The bisection completed this morning (log appended below): not a satisfactory conclusion, it's pointing to a davem/net merge. I was uncomfortable when I marked that point bad in the first place: it ran for 9 hours before hitting a compiler error, which was nearly twice as long as the longest I'd seen before (5 hours), and uncomfortably close to the 12 hours I've been taking as good. My current thinking is that the powerpc merge that you indicated, that I found to be "good", is the one that contains the bad commit; but that the bug is very rare to manifest in that kernel, and my test of the davem/net merge happened to be unusually unlucky to hit it. Then some other later change makes it significantly easier to hit; and identifying that change may make it much easier to pin down what the original bug is. So I've replayed the bisection up to that point, marked the davem/net merge as good this time, and set off again in the hope that it will lead somewhere more enlightening. But prepared for disappointment. Hugh git bisect start # good: [f689b742f217b2ffe7925f8a6521b208ee995309] Merge tag 'powerpc-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux git bisect good f689b742f217b2ffe7925f8a6521b208ee995309 # bad: [92e963f50fc74041b5e9e744c330dca48e04f08d] Linux 4.5-rc1 git bisect bad 92e963f50fc74041b5e9e744c330dca48e04f08d # bad: [7f36f1b2a8c4f55f8226ed6c8bb4ed6de11c4015] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide git bisect bad 7f36f1b2a8c4f55f8226ed6c8bb4ed6de11c4015 # bad: [6606b342febfd470b4a33acb73e360eeaca1d9bb] Merge git://www.linux-watchdog.org/linux-watchdog git bisect bad 6606b342febfd470b4a33acb73e360eeaca1d9bb # good: [d0021d3bdfe9d551859bca1f58da0e6be8e26043] Merge remote-tracking branch 'asoc/topic/wm8960' into asoc-next git bisect good d0021d3bdfe9d551859bca1f58da0e6be8e26043 # good: [e3315b439c30c208582ac64e58f0c0d36b83181e] ALSA: oxfw: allocate own address region for SCS.1 series git bisect good e3315b439c30c208582ac64e58f0c0d36b83181e # good: [3da834e3e5a4a5d26882955298b55a9ed37a00bc] clk: remove duplicated COMMON_CLK_NXP record from clk/Kconfig git bisect good 3da834e3e5a4a5d26882955298b55a9ed37a00bc # bad: [e535d74bc50df2357d3253f8f3ca48c66d0d892a] Merge tag 'docs-4.5' of git://git.lwn.net/linux git bisect bad e535d74bc50df2357d3253f8f3ca48c66d0d892a # bad: [4e5448a31d73d0e944b7adb9049438a09bc332cb] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net git bisect bad 4e5448a31d73d0e944b7adb9049438a09bc332cb # good: [b70ce2ab41cb67ab3d661eda078f7c4029bbca95] dts: hisi: fixes no syscon fault when init mdio git bisect good b70ce2ab41cb67ab3d661eda078f7c4029bbca95 # good: [4a658527271bce43afb1cf4feec89afe6716ca59] xen-netback: delete NAPI instance when queue fails to initialize git bisect good 4a658527271bce43afb1cf4feec89afe6716ca59 # good: [c6894dec8ea9ae05747124dce98b3b5c2e69b168] bridge: fix lockdep addr_list_lock false positive splat git bisect good c6894dec8ea9ae05747124dce98b3b5c2e69b168 # good: [36beca6571c941b28b0798667608239731f9bc3a] sparc64: Fix numa node distance initialization git bisect good 36b
Re: Problems with swapping in v4.5-rc on POWER
On Thu, 3 Mar 2016, Michael Ellerman wrote: > On Wed, 2016-03-02 at 12:49 -0800, Hugh Dickins wrote: > > On Fri, 26 Feb 2016, Hugh Dickins wrote: > > > On Thu, 25 Feb 2016, Hugh Dickins wrote: > > > > On Wed, 24 Feb 2016, Hugh Dickins wrote: > > > > > On Thu, 25 Feb 2016, Aneesh Kumar K.V wrote: > > > > > > > > > > > > Can you test the impact of the merge listed below ?(ie, revert the > > > > > > merge and see if > > > > > > we can reproduce and also verify with merge applied). This will > > > > > > give us a > > > > > > set of commits to look closer. We had quiet a lot of page table > > > > > > related changes going in this merge window. > > > > > > > > > > > > f689b742f217b2ffe7 ("Pull powerpc updates from Michael Ellerman:") > > > > > > > > > > > > That is the merge commit that added _PAGE_PTE. > > > > > > > > > > Another experiment running on it at the moment, I'd like to give that > > > > > a few more hours, and then will try the revert you suggest. But does > > > > > that merge revert cleanly, did you try? I'm afraid of interactions, > > > > > whether obvious or subtle, with the THP refcounting rework. Oh, since > > > > > I don't have THP configured on, maybe I can ignore any issues from > > > > > that. > > > > > > > > That revert worked painlessly, only a very few and simple conflicts, > > > > I ran that under load for 12 hours, no problem seen. > > > > > > > > I've now checked out an f689b742 tree and started on that, just to > > > > confirm that it fails fairly quickly I hope; and will then proceed > > > > to git bisect, giving that as bad and 37cea93b as good. > > > > > > > > Given the uncertainty of whether 12 hours is really long enough to be > > > > sure, and perhaps difficulties along the way, I don't rate my chances > > > > of a reliable bisection higher than 60%, but we'll see. > > > > > > I'm sure you won't want a breathless report from me on each bisection > > > step, but I ought to report that: contrary to our expectations, the > > > f689b742 survived without error for 12 hours, so appears to be good. > > > I'll bisect between there and v4.5-rc1. > > > > The bisection completed this morning (log appended below): > > not a satisfactory conclusion, it's pointing to a davem/net merge. > > > > I was uncomfortable when I marked that point bad in the first place: > > it ran for 9 hours before hitting a compiler error, which was nearly > > twice as long as the longest I'd seen before (5 hours), and > > uncomfortably close to the 12 hours I've been taking as good. > > > > My current thinking is that the powerpc merge that you indicated, > > that I found to be "good", is the one that contains the bad commit; > > but that the bug is very rare to manifest in that kernel, and my test > > of the davem/net merge happened to be unusually unlucky to hit it. > > > > Then some other later change makes it significantly easier to hit; > > and identifying that change may make it much easier to pin down > > what the original bug is. > > > > So I've replayed the bisection up to that point, marked the davem/net > > merge as good this time, and set off again in the hope that it will > > lead somewhere more enlightening. But prepared for disappointment. > > Thanks Hugh. That logic sounds reasonable, I doubt we can blame davem :) > > I've setup another box here to try and reproduce it. It's running with 4k > pages, no THP, and it's going well into swap. Hopefully I can hit the same > bug, > but we'll see in 12 hours I guess. The alternative bisection was as unsatisfactory as the first: again it fingered an irrelevant merge (rather than any commit pulled in by that merge) as the bad commit. It seems this issue is too intermittent for bisection to be useful, on my load anyway. The best I can do now is try v4.4 for a couple of days, to verify that still comes out good (rather than the machine going bad coincident with v4.5-rc), then try v4.5-rc7 to verify that that still comes out bad. I'll report back on those; but beyond that, I'll have to leave it to you. Hugh ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Problems with swapping in v4.5-rc on POWER
On Mon, 7 Mar 2016, Michael Ellerman wrote: > On Fri, 2016-03-04 at 09:58 -0800, Hugh Dickins wrote: > > > > The alternative bisection was as unsatisfactory as the first: > > again it fingered an irrelevant merge (rather than any commit > > pulled in by that merge) as the bad commit. > > > > It seems this issue is too intermittent for bisection to be useful, > > on my load anyway. > > Darn. Thanks for trying. > > > The best I can do now is try v4.4 for a couple of days, to verify that > > still comes out good (rather than the machine going bad coincident with > > v4.5-rc), then try v4.5-rc7 to verify that that still comes out bad. > > Thanks, that would still be helpful. v4.4 ran under load for 56 hours without any trouble, before I stopped it to switch kernels. v4.5-rc7 ran for 19.5 hours, then hit the problem (sigsegv in "as" on this occasion). > > > I'll report back on those; but beyond that, I'll have to leave it to you. > > I haven't had any luck here :/ > > Can you give us a more verbose description of your test setup? I'll be a lot more terse than you'd like, not much time to spare. If I had a good reproducer, then of course I should specify it exactly to you; but no, 19.5 hours or 5 hours or a few minutes, that does not amount to a good reproducer. > > - G5, which exact model? /proc/cpuinfo says: processor : 0 cpu : PPC970MP, altivec supported clock : 2500.00MHz revision: 1.1 (pvr 0044 0101) processor : 1 cpu : PPC970MP, altivec supported clock : 2500.00MHz revision: 1.1 (pvr 0044 0101) processor : 2 cpu : PPC970MP, altivec supported clock : 2500.00MHz revision: 1.1 (pvr 0044 0101) processor : 3 cpu : PPC970MP, altivec supported clock : 2500.00MHz revision: 1.1 (pvr 0044 0101) timebase: platform: PowerMac model : PowerMac11,2 machine : PowerMac11,2 motherboard : PowerMac11,2 MacRISC4 Power Macintosh detected as : 337 (PowerMac G5 Dual Core) pmac flags : L2 cache: 1024K unified pmac-generation : NewWorld > - 4k pages, no THP. Yes. > - how much ram & swap? I boot with mem=700M, and use 1.5G swap. > - building linus' tree, make -j ? Building an old 2.6.24 tree (which had a higher source to built ratio than nowadays; with patches to get it to build with more recent toolchain, from openSUSE 13.1); building some config I used to run on that machine. Building two of them, each make -j20, concurrently: one in tmpfs, one in 4kB-blocksize ext4 on loop on tmpfs file. But I doubt that complication is relevant here: sometimes it's the build in tmpfs that collapses, sometimes the build in ext4, it's fairly even which. (Do not bother to attempt such a load on linux-next, only on v4.5: the OOM rework in mmotm has an unsolved problem with order=2 allocations, which means that such a load will be OOM-killed very quickly.) > - source and output on tmpfs? (how big?) One source and output in ext4 on loop on file filling 470M tmpfs. Other source and output in tmpfs on /tmp which I happen to size at 1300M (but could be half that). Sizes of course fitted to that source tree and config I happen to be building. > - what device is the swap device? (you said SSD I think?) Old 75G Intel SSD: ata2.00: ATA-7: INTEL SSDSA2M080G2GN, 2CV102HD, max UDMA/133 > - anything else I've forgotten? I happen to run with /proc/sys/vm/swappiness 100, merely because it's swapping that I'm trying to exercise. I doubt that any of the details above are important: plenty of swapping is probably the only message (and doing everything in tmpfs in limited memory is a good way to force plenty of swapping). > > Oh and can you send us your bisect logs, we can at least trust the bad results > I think. Remember that both of these bisections started from 4.5-rc1 as bad, and f689b742f217, the powerpc merge, as good - since I didn't see a problem at that commit in 12 hours. But we all suspect that in fact something in that powerpc merge was actually the bad. git bisect start # good: [f689b742f217b2ffe7925f8a6521b208ee995309] Merge tag 'powerpc-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux git bisect good f689b742f217b2ffe7925f8a6521b208ee995309 # bad: [92e963f50fc74041b5e9e744c330dca48e04f08d] Linux 4.5-rc1 git bisect bad 92e963f50fc74041b5e9e744c330dca48e04f08d # bad: [7f36f1b2a8c4f55f8226ed6c8bb4ed6de11c4015] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide git bisect bad 7f36f1b2a8c4f55f8226ed6c8bb4ed6de11c4015 # bad: [6606b342febfd470b4a33acb73e360eeaca1d9bb] Merge git://www.linux-watchdog.org/linux-watchdog git bisect bad 6606b342febfd470b4a33acb73e360eeaca1d9bb # good: [d0021d3bdfe9d551859bca1f58da0e6be8e26043] Merge remote-tracking branch 'asoc/topic/wm8960' into asoc-next git bisect good d0021d3bdfe9d551859bca1f58da0e6be8e26043 # goo
Re: [PATCH] powerpc/mm/hash: Fix the reference bit update when handling hash fault
On Fri, 27 May 2016, Aneesh Kumar K.V wrote: > When we converted the asm routines to C functions, we missed updating > HPTE_R_R based on _PAGE_ACCESSED. ASM code used to copy over the lower > bits from pte via. > > andi. r3,r30,0x1fe/* Get basic set of flags */ > > Fixes: 'commit 89ff725051d1 ("powerpc/mm: Convert __hash_page_64K to C")' > > We also don't set C bit always with this patch. This was added by > 'commit c5cf0e30bf3d8 ("[PATCH] powerpc: Fix buglet with MMU hash > management")' > > With hash64, we need to make sure that hardware doesn't do a pte update > directly. This is because we do end up with entries in TLB with no hash > page table entry. This happens because when we find hash bucket full, > we "evict" a more/less random entry from it. When we do that we don't > invalidate the TLB (hpte_remove) because we assume the old translation > is still technically "valid". For more info look at > 'commit 0608d692463(" powerpc/mm: Always invalidate tlb on hpte invalidate and > update")'. Now that implies that hardware should never do a writeback to > update 'R' or 'C' hpte bits. > > Commitc 5cf0e30bf3d8 did that for 'C' bit by enabling 'C' bit always. > We don't really need to do that because we never map a RW pte entry > without setting 'C' bit. on READ fault on a RW pte entry, we still map > it READ only, hence a store update in the page will still cause a hash > pte fault. > > This patch reverts the 'C' update part of the c5cf0e30bf3d8 > ("[PATCH] powerpc: Fix buglet with MMU hash management") and retain > the updatepp part. > - If we hit the updatepp path on native, the old code without that > commit, would fail to set C bcause native_hpte_updatepp() > was implemented to filter the same bits as H_PROTECT and not let C > through thus we would "upgrade" a RO HPTE to RW without setting C > thus causing the bug. So the real fix in that commit was the change > to native_hpte_updatepp > > Signed-off-by: Benjamin Herrenschmidt > Signed-off-by: Aneesh Kumar K.V > --- > arch/powerpc/mm/hash_utils_64.c | 22 -- > 1 file changed, 20 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c > index 9c594ea99149..76fa5f3eb450 100644 > --- a/arch/powerpc/mm/hash_utils_64.c > +++ b/arch/powerpc/mm/hash_utils_64.c > @@ -159,6 +159,19 @@ static struct mmu_psize_def mmu_psize_defaults_gp[] = { > }, > }; > > +/* > + * 'R' and 'C' update notes: > + * - Under pHyp or KVM, the updatepp path will not set C, thus it *will* > + * create writeable HPTEs without C set, because the hcall H_PROTECT > + * that we use in that case will not update C > + * - The above is however not a problem, because we also don't do that > + * fancy "no flush" variant of eviction and we use H_REMOVE which will > + * do the right thing and thus we don't have the race I described earlier > + * > + *- Under bare metal, we do have the race, so we need R and C set > + *- We make sure R is always set and never lost > + *- C is _PAGE_DIRTY, and *should* always be set for a writeable mapping > + */ > unsigned long htab_convert_pte_flags(unsigned long pteflags) > { > unsigned long rflags = 0; > @@ -186,9 +199,14 @@ unsigned long htab_convert_pte_flags(unsigned long > pteflags) > rflags |= 0x1; > } > /* > - * Always add "C" bit for perf. Memory coherence is always enabled > + * We can't allow hardware to update hpte bits. Hence always > + * set 'R' bit and set 'C' if it is a write fault > + * Memory coherence is always enabled >*/ > - rflags |= HPTE_R_C | HPTE_R_M; > + rflags |= HPTE_R_R | HPTE_R_M; > + > + if (pteflags & _PAGE_DIRTY) > + rflags |= HPTE_R_C; > /* >* Add in WIG bits >*/ > -- > 2.7.4 Thanks a lot for the patch: I'm just starting to try it out. I had hoped to try it on 4.7-rc1, but found yesterday that rc1 has at least two issues (not specific to powerpc) with OOMing and leaking memory, that I've no chance of sustaining my swapping load on it. I'll hope at least one of them gets fixed by rc2 and try again then. So I've applied your patch to v4.5 and v4.6, and set v4.6 going for now. But if all goes well, I won't be able to report back with confidence for a couple of days. I don't mean to be churlish, and subtract from your triumph in tracking this down (assuming you have), but that commit log... okay, it's intended for powerpc mmu experts, not me, but if it hasn't already gone into git, then a rewrite could be very helpful. I gather that C is a bit as well as a language :) and there is a code comment which helped me by mentioning "C is _PAGE_DIRTY, and *should* always be set for a writeable mapping" (though I wonder what imposes that "should" - certainly not core mm); which was the first mention of "dirty" or "modified" in the whole thing. And it needs a Cc to stable.
Re: [PATCH] powerpc/mm/hash: Fix the reference bit update when handling hash fault
On Tue, 31 May 2016, Benjamin Herrenschmidt wrote: > On Mon, 2016-05-30 at 09:39 -0700, Hugh Dickins wrote: > > I don't mean to be churlish, and subtract from your triumph in tracking > > this down (assuming you have), but that commit log... okay, it's intended > > for powerpc mmu experts, not me, but if it hasn't already gone into git, > > then a rewrite could be very helpful. > > Something along these lines: > > The powerpc hash table has a R (Referenced) and C (Changed) bits that > somewhat correspond to Linux _PAGE_DIRTY and _PAGE_ACCESSED. However we > don't currently use them. > > Moreover, we also require them to never be updated by HW. This is due > to an optimization we have in the hash eviction code, which would be > racy vs. a hardware update as the HW updates are done non-atomically. > > Thus it's critical that valid hash PTEs always have R set and writeable > ones have C set. We do this by hashing a non-dirty linux PTE as read-only and > always setting _PAGE_ACCESSED (and thus R) when hashing anything else in. Any > attempt by Linux at clearing those bits also removes the corresponding hash > entry. > > The old commit <.> fixed an issue where we would miss setting C in > the specific case where a Linux PTE was upgraded from read only to > read-write (and appropriately made dirty). The hash code would realize > the hash PTE is already present and would use a different path than the > normal insertion path for updating a hash entry in-place. That path > unfortunately didn't update "C". > > That commit however got a bit over zealous and also forced C on any > entry including those that aren't writeable. That was unnecessary. > > In commit 89ff725051d1, when converting to C, we mangled that up: > > - We kept the useless part of <> setting C always instead of > only when _PAGE_DIRTY is set > > - We never set R thus letting the HW do the racy updates. > > This fixes it. Thanks, that helped me to understand a lot better - and the original commitlog then made much more sense to me after this version. Plus I can now see that it is inherently a very confusing situation, made the more confusing by the series of not-quite-right commits, so all the more difficult to explain to an outsider. I still won't pretend to understand it fully, but don't mind that: my main concern is that if the commitlog is confused, then that might hint that the code is still not quite right. But all my evidence so far is that it is now right: I'll continue testing v4.6+fix on a couple of loads until this evening: all is well so far. And then switch to testing v4.5+fix on those loads for another day and a half. Hugh___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc/mm/hash: Fix the reference bit update when handling hash fault
On Tue, 31 May 2016, Hugh Dickins wrote: > > But all my evidence so far is that it is now right: I'll continue > testing v4.6+fix on a couple of loads until this evening: all is > well so far. And then switch to testing v4.5+fix on those loads > for another day and a half. I'm glad to confirm: your patch to htab_convert_pte_flags() fixes all the elusive sigsegv issues I was seeing under load on v4.5 and v4.6. Thank you! Hugh ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev