On Fri, 2009-04-24 at 10:51 +0100, Mel Gorman wrote:
> On Tue, Apr 21, 2009 at 08:27:57PM -0700, Linus Torvalds wrote:
> > Another week, another -rc.
> >
>
> I'm seeing some tests with sysbench+postgres+large pages fail on ppc64
> although a very clear pattern is not forming as to what exactly is
> causing it. However, the libhugetlbfs regression tests (make && make
> func) are triggering the following oops when calling mlock() and so are
> likely related.
>
> ------------[ cut here ]------------
> kernel BUG at arch/powerpc/mm/pgtable.c:243!
> Oops: Exception in kernel mode, sig: 5 [#1]
> SMP NR_CPUS=128 NUMA pSeries
> Modules linked in: dm_snapshot dm_mirror dm_region_hash dm_log qla2xxx
> loop nfnetlink iptable_filter iptable_nat nf_nat ip_tables
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT
> xt_tcpudp xt_limit ipt_LOG xt_pkttype x_tables
> NIP: c00000000002becc LR: c00000000002c02c CTR: 0000000000000000
> REGS: c0000000ea92b4c0 TRAP: 0700 Not tainted (2.6.30-rc3-autokern1)
> MSR: 8000000000029032 <EE,ME,CE,IR,DR> CR: 28000484 XER: 20000020
> TASK = c00000000395b660[7611] 'mlock' THREAD: c0000000ea928000 CPU: 3
> GPR00: 0000000000000001 c0000000ea92b740 c0000000008ea170 c0000000ec7d4980
> GPR04: 000000003f000000 c0000001e2278cf8 0000001900000393 0000000000000001
> GPR08: f000000002bc0000 0000000000000000 0000000000000113 c0000001e2278c81
> GPR12: 0000000044000482 c00000000093b880 0000000028004422 0000000000000000
> GPR16: c0000000ea92bbf0 c0000000009f06f0 0000001900000113 c0000000ec7d4980
> GPR20: 0000000000000000 f000000002bc0000 000000003f000000 c0000001e2278cf8
> GPR24: c0000000eaa90bb0 0000000000000000 c0000000eaa90bb0 c0000000ea928000
> GPR28: f000000002bc0000 0000001900000393 0000000000000001 c0000001e2278cf8
> NIP [c00000000002becc] .assert_pte_locked+0x54/0x8c
> LR [c00000000002c02c] .ptep_set_access_flags+0x50/0x8c
> Call Trace:
> [c0000000ea92b740] [c0000000eaa90bb0] 0xc0000000eaa90bb0 (unreliable)
> [c0000000ea92b7d0] [c0000000000ed1b0] .hugetlb_cow+0xd4/0x654
> [c0000000ea92b900] [c0000000000edbf0] .hugetlb_fault+0x4c0/0x708
> [c0000000ea92b9f0] [c0000000000ee890] .follow_hugetlb_page+0x174/0x364
> [c0000000ea92bae0] [c0000000000d8d30] .__get_user_pages+0x288/0x4c0
> [c0000000ea92bbb0] [c0000000000da10c] .make_pages_present+0xa0/0xe0
> [c0000000ea92bc40] [c0000000000db758] .mlock_fixup+0x90/0x228
> [c0000000ea92bd00] [c0000000000dbb38] .do_mlock+0xc4/0x128
> [c0000000ea92bda0] [c0000000000dbccc] .SyS_mlock+0xb0/0xec
> [c0000000ea92be30] [c00000000000852c] syscall_exit+0x0/0x40
> Instruction dump:
> 0b000000 78892662 79291f24 7d69582a 7d600074 7800d182 0b000000 78895e62
> 79291f24 7d29582a 7d200074 7800d182 <0b000000> 3c004000 3960ffff
> 780007c6
> ---[ end trace 36a7faa04fa9452b ]---
>
> This corresponds to
>
> #ifdef CONFIG_DEBUG_VM
> void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
> {
>         pgd_t *pgd;
>         pud_t *pud;
>         pmd_t *pmd;
>
>         if (mm == &init_mm)
>                 return;
>         pgd = mm->pgd + pgd_index(addr);
>         BUG_ON(pgd_none(*pgd));
>         pud = pud_offset(pgd, addr);
>         BUG_ON(pud_none(*pud));
>         pmd = pmd_offset(pud, addr);
>         BUG_ON(!pmd_present(*pmd)); <----- THIS LINE
>         BUG_ON(!spin_is_locked(pte_lockptr(mm, pmd)));
> }
> #endif /* CONFIG_DEBUG_VM */
>
> This area was last changed by commit 8d30c14cab30d405a05f2aaceda1e9ad57800f36
> in the 2.6.30-rc1 timeframe. I think there was another hugepage-related
> problem with this patch but I can't remember what it was.

It broke modules, but I don't remember anything hugepage related.

So the code changed from:

-#define ptep_set_access_flags(__vma, __address, __ptep, __entry, __dirty) \
-({ \
-        int __changed = !pte_same(*(__ptep), __entry); \
-        if (__changed) { \
-                __ptep_set_access_flags(__ptep, __entry, __dirty); \
-                flush_tlb_page_nohash(__vma, __address); \
-        } \
-        __changed; \
-})

to:

+int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
+                          pte_t *ptep, pte_t entry, int dirty)
+{
+        int changed;
+        if (!dirty && pte_need_exec_flush(entry, 0))
+                entry = do_dcache_icache_coherency(entry);
+        changed = !pte_same(*(ptep), entry);
+        if (changed) {
+                assert_pte_locked(vma->vm_mm, address);
+                __ptep_set_access_flags(ptep, entry);
+                flush_tlb_page_nohash(vma, address);
+        }
+        return changed;
+}

So the call to assert_pte_locked() is new. And it's never going to work
for huge pages; the page table structure is different, right?

Notice that pte_update() checks
(arch/powerpc/include/asm/pgtable-ppc64.h):

        /* huge pages use the old page table lock */
        if (!huge)
                assert_pte_locked(mm, addr);

But unlike pte_update(), ptep_set_access_flags() has no way of knowing
it's been called from huge_ptep_set_access_flags().

So my guess is we either remove the call to assert_pte_locked() in
there, or have assert_pte_locked() check whether it's being called for a
huge pte.

cheers
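
The pte_update() pattern mentioned above (skip the lock assertion when the caller says the pte is huge) can be illustrated with a small userspace model. This is only a sketch and not kernel code: struct mock_mm, mock_pte_lock_check() and mock_set_access_flags() are hypothetical names invented for the illustration, and the `huge` parameter is an assumption about how the caller's knowledge might be passed down, not an existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state that assert_pte_locked() inspects. */
struct mock_mm {
    bool pmd_present;   /* models pmd_present(*pmd); false in the oops above */
    bool pte_lock_held; /* models spin_is_locked(pte_lockptr(mm, pmd)) */
};

/*
 * Model of the debug check. Returns true where the real check would pass
 * and false where it would hit BUG_ON(). When `huge` is set the check is
 * skipped, mirroring the "if (!huge)" guard in pte_update().
 */
bool mock_pte_lock_check(const struct mock_mm *mm, bool huge)
{
    if (huge)
        return true; /* huge pages use the old page table lock */
    return mm->pmd_present && mm->pte_lock_held;
}

/*
 * Model of ptep_set_access_flags() with an extra, assumed `huge` argument.
 * Returns 1 if the entry changed, 0 if it was already up to date, and -1
 * if the debug check would have fired (the entry is then left untouched).
 */
int mock_set_access_flags(const struct mock_mm *mm, unsigned long *ptep,
                          unsigned long entry, bool huge)
{
    int changed = (*ptep != entry);

    if (changed) {
        if (!mock_pte_lock_check(mm, huge))
            return -1;
        *ptep = entry;
    }
    return changed;
}
```

With a mock_mm whose pmd_present is false, calling mock_set_access_flags() with huge = false reports -1 (the modelled BUG), while huge = true lets the update through. The other option from the mail, dropping the assertion from ptep_set_access_flags() altogether, corresponds to always taking the huge = true path.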
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev