Re: pipe/page fault oddness.

2014-10-08 Thread Aneesh Kumar K.V
Linus Torvalds writes: > On Mon, Oct 6, 2014 at 3:18 PM, Aneesh Kumar K.V > wrote: >> >> Are we still looking at these options ? I could look at implementing the >> first option which will also enable us to free up one pte bit. > > We definitely are. If you can test my patch (with the small foll

Re: pipe/page fault oddness.

2014-10-07 Thread Linus Torvalds
On Mon, Oct 6, 2014 at 3:18 PM, Aneesh Kumar K.V wrote: > > Are we still looking at these options ? I could look at implementing the > first option which will also enable us to free up one pte bit. We definitely are. If you can test my patch (with the small follow-up fix), and do the necessary ch

Re: pipe/page fault oddness.

2014-10-06 Thread Aneesh Kumar K.V
Mel Gorman writes: > On Wed, Oct 01, 2014 at 09:18:25AM -0700, Linus Torvalds wrote: >> On Wed, Oct 1, 2014 at 9:01 AM, Linus Torvalds >> wrote: >> > >> > We need to get rid of it, and just make it the same as pte_protnone(). >> > And then the real protnone is in the vma flags, and if you actual

Re: pipe/page fault oddness.

2014-10-03 Thread Sasha Levin
On 10/03/2014 11:58 AM, Dave Jones wrote: > On Fri, Oct 03, 2014 at 08:43:57AM -0700, Linus Torvalds wrote: > > On Thu, Oct 2, 2014 at 10:00 PM, Sasha Levin > wrote: > > > > > > For the record, I tweaked the environment to put some more pressure on > the > > > scheduler and found out what br

Re: pipe/page fault oddness.

2014-10-03 Thread Dave Jones
On Fri, Oct 03, 2014 at 08:43:57AM -0700, Linus Torvalds wrote: > On Thu, Oct 2, 2014 at 10:00 PM, Sasha Levin wrote: > > > > For the record, I tweaked the environment to put some more pressure on the > > scheduler and found out what broke (which is not related to this thread at > > all). >

Re: pipe/page fault oddness.

2014-10-03 Thread Linus Torvalds
On Thu, Oct 2, 2014 at 10:00 PM, Sasha Levin wrote: > > For the record, I tweaked the environment to put some more pressure on the > scheduler and found out what broke (which is not related to this thread at > all). Ok. It's probably still worth testing Mel's patches, since that's what goes into

Re: pipe/page fault oddness.

2014-10-02 Thread Sasha Levin
On 10/02/2014 12:10 PM, Linus Torvalds wrote: > On Thu, Oct 2, 2014 at 8:04 AM, Sasha Levin wrote: >> > >> > I have a new one for you. I know it doesn't say "numa" anywhere, but I >> > haven't ever seen that trace before so I'll just go ahead and blame it >> > on your patch... > Fair enough, but t

Re: pipe/page fault oddness.

2014-10-02 Thread Kirill A. Shutemov
On Thu, Oct 02, 2014 at 09:01:38AM -0700, Linus Torvalds wrote: > On Thu, Oct 2, 2014 at 7:25 AM, Kirill A. Shutemov > wrote: > > > > I don't see what prevents the code to make zero page writable here. > > We need at least pmd = pmd_wrprotect(pmd) before set_pmd_at(); > > Do we? If it's the zero

Re: pipe/page fault oddness.

2014-10-02 Thread Linus Torvalds
On Thu, Oct 2, 2014 at 8:04 AM, Sasha Levin wrote: > > I have a new one for you. I know it doesn't say "numa" anywhere, but I > haven't ever seen that trace before so I'll just go ahead and blame it > on your patch... Fair enough, but the oops doesn't really give even a hint of what could be wron

Re: pipe/page fault oddness.

2014-10-02 Thread Linus Torvalds
On Thu, Oct 2, 2014 at 7:25 AM, Kirill A. Shutemov wrote: > > I don't see what prevents the code to make zero page writable here. > We need at least pmd = pmd_wrprotect(pmd) before set_pmd_at(); Do we? If it's the zero page, it had better be an anonymous mapping, and vm_page_prot had better not b

Re: pipe/page fault oddness.

2014-10-02 Thread Linus Torvalds
On Thu, Oct 2, 2014 at 1:47 AM, Hugh Dickins wrote: > > I hesitate to admit, I still don't see it: please illuminate further. No, you're looking at what I was looking. > We're talking about the loop in __split_huge_page_map(), where it does Yes. > entry = mk_pte(page +

Re: pipe/page fault oddness.

2014-10-02 Thread Sasha Levin
On 10/01/2014 06:42 PM, Linus Torvalds wrote: > On Wed, Oct 1, 2014 at 3:08 PM, Sasha Levin wrote: >> > >> > I've tried this patch on the same configuration that was triggering >> > the VM_BUG_ON that Hugh mentioned previously. Surprisingly enough it >> > ran fine for ~20 minutes before exploding

Re: pipe/page fault oddness.

2014-10-02 Thread Sasha Levin
On 10/02/2014 04:03 AM, Chuck Ebbert wrote: > On Wed, 01 Oct 2014 23:32:15 -0400 > Sasha Levin wrote: > >> On 10/01/2014 06:28 PM, Chuck Ebbert wrote: >>> On Wed, 01 Oct 2014 18:08:30 -0400 >>> Sasha Levin wrote: >>> > On 10/01/2014 04:20 PM, Linus Torvalds wrote: >>> So I'm really sendi

Re: pipe/page fault oddness.

2014-10-02 Thread Kirill A. Shutemov
On Wed, Oct 01, 2014 at 03:42:53PM -0700, Linus Torvalds wrote: > On Wed, Oct 1, 2014 at 3:08 PM, Sasha Levin wrote: > > > > I've tried this patch on the same configuration that was triggering > > the VM_BUG_ON that Hugh mentioned previously. Surprisingly enough it > > ran fine for ~20 minutes bef

Re: pipe/page fault oddness.

2014-10-02 Thread Mel Gorman
On Wed, Oct 01, 2014 at 09:18:25AM -0700, Linus Torvalds wrote: > On Wed, Oct 1, 2014 at 9:01 AM, Linus Torvalds > wrote: > > > > We need to get rid of it, and just make it the same as pte_protnone(). > > And then the real protnone is in the vma flags, and if you actually > > ever get to a pte tha

Re: pipe/page fault oddness.

2014-10-02 Thread Hugh Dickins
On Wed, 1 Oct 2014, Linus Torvalds wrote: > On Wed, Oct 1, 2014 at 1:19 AM, Hugh Dickins wrote: > > Can we please just get rid of _PAGE_NUMA. There is no excuse for it. I'm no lover of _PAGE_NUMA, and hope that it can be simplified away as you outline. What we have in 3.16+3.17 is already an at

Re: pipe/page fault oddness.

2014-10-02 Thread Peter Zijlstra
On Wed, Oct 01, 2014 at 01:29:04PM -0400, Rik van Riel wrote: > On 10/01/2014 12:18 PM, Linus Torvalds wrote: > > > Seriously, why can't we just do this, and throw away all the crap that > > is "numa special case". This would make all the random games in > > change_pte_range() just go away entirel

Re: pipe/page fault oddness.

2014-10-02 Thread Chuck Ebbert
On Wed, 01 Oct 2014 23:32:15 -0400 Sasha Levin wrote: > On 10/01/2014 06:28 PM, Chuck Ebbert wrote: > > On Wed, 01 Oct 2014 18:08:30 -0400 > > Sasha Levin wrote: > > > >> > On 10/01/2014 04:20 PM, Linus Torvalds wrote: > >>> > > So I'm really sending this patch out in the hope that it will get

Re: pipe/page fault oddness.

2014-10-01 Thread Sasha Levin
On 10/01/2014 06:28 PM, Chuck Ebbert wrote: > On Wed, 01 Oct 2014 18:08:30 -0400 > Sasha Levin wrote: > >> > On 10/01/2014 04:20 PM, Linus Torvalds wrote: >>> > > So I'm really sending this patch out in the hope that it will get >>> > > comments, fixup and possibly even testing by people who actu

Re: pipe/page fault oddness.

2014-10-01 Thread Linus Torvalds
On Wed, Oct 1, 2014 at 3:08 PM, Sasha Levin wrote: > > I've tried this patch on the same configuration that was triggering > the VM_BUG_ON that Hugh mentioned previously. Surprisingly enough it > ran fine for ~20 minutes before exploding with: Well, that's somewhat encouraging. I didn't expect it

Re: pipe/page fault oddness.

2014-10-01 Thread Chuck Ebbert
On Wed, 01 Oct 2014 18:08:30 -0400 Sasha Levin wrote: > On 10/01/2014 04:20 PM, Linus Torvalds wrote: > > So I'm really sending this patch out in the hope that it will get > > comments, fixup and possibly even testing by people who actually know > > the NUMA balancing code. Rik? Anybody? > > Hi

Re: pipe/page fault oddness.

2014-10-01 Thread Sasha Levin
On 10/01/2014 04:20 PM, Linus Torvalds wrote: > So I'm really sending this patch out in the hope that it will get > comments, fixup and possibly even testing by people who actually know > the NUMA balancing code. Rik? Anybody? Hi Linus, I've tried this patch on the same configuration that was tr

Re: pipe/page fault oddness.

2014-10-01 Thread Rik van Riel
On 10/01/2014 04:20 PM, Linus Torvalds wrote: > Now, I'll be honest: this patch *might* just work, but I expect it to > have some stupid problem. It compiles. I haven't even dared boot it, > much less try any numa benchmarks that wouldn't show anything sane on > my machine anyway. > > So I'm real

Re: pipe/page fault oddness.

2014-10-01 Thread Linus Torvalds
On Wed, Oct 1, 2014 at 9:18 AM, Linus Torvalds wrote: > > So I'd really suggest we do exactly that. Get rid of "pte_numa()" > entirely, get rid of "_PAGE_[BIT_]NUMA" entirely, and instead add a > "pte_protnone()" helper to check for the "protnone" case (which on x86 > is testing the _PAGE_PROTNONE

Re: pipe/page fault oddness.

2014-10-01 Thread Rik van Riel
On 10/01/2014 12:18 PM, Linus Torvalds wrote: > Seriously, why can't we just do this, and throw away all the crap that > is "numa special case". This would make all the random games in > change_pte_range() just go away entirely, because the whole NUMA thing > really wouldn't be a special case for

Re: pipe/page fault oddness.

2014-10-01 Thread Linus Torvalds
On Wed, Oct 1, 2014 at 9:01 AM, Linus Torvalds wrote: > > We need to get rid of it, and just make it the same as pte_protnone(). > And then the real protnone is in the vma flags, and if you actually > ever get to a pte that is marked protnone, you know it's a numa page. So I'd really suggest we d

Re: pipe/page fault oddness.

2014-10-01 Thread Linus Torvalds
On Wed, Oct 1, 2014 at 1:19 AM, Hugh Dickins wrote: > > Irrelevance follows... Maybe not irrelevant. > There *appears* to be a risk of hitting the VM_BUG_ON, or with no > VM_BUG_ON (as in 3.17-rc) pte_mknuma proceeding to add _PAGE_NUMA > to _PAGE_PROTNONE - making the pte then fail the pte_numa

Re: pipe/page fault oddness.

2014-10-01 Thread Hugh Dickins
On Tue, 30 Sep 2014, Linus Torvalds wrote: > On Tue, Sep 30, 2014 at 11:20 AM, Dave Jones wrote: > > > > page_fault_kernel:address=__per_cpu_end ip=copy_page_to_iter > > error_code=0x2 > > Interesting. "error_code" in particular. The value "2" means that the > CPU thinks that the page is not

Re: pipe/page fault oddness.

2014-09-30 Thread Linus Torvalds
On Tue, Sep 30, 2014 at 11:20 AM, Dave Jones wrote: > > page_fault_kernel:address=__per_cpu_end ip=copy_page_to_iter > error_code=0x2 Interesting. "error_code" in particular. The value "2" means that the CPU thinks that the page is not present (bit zero is clear). (That "address" is useless
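
For readers following the error_code discussion above, here is a minimal sketch of how an x86 page-fault error code breaks down into its low bits. The bit positions are the hardware-defined ones; the constant names and the function decode_page_fault_error_code() are illustrative assumptions for this note, not identifiers taken from the thread or the kernel source.

```python
# Low bits of the x86 page-fault error code (hardware-defined layout).
# Constant and function names below are illustrative, not from the thread.
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access, 1: write access
PF_USER = 1 << 2   # 0: fault in kernel mode, 1: fault in user mode
PF_RSVD = 1 << 3   # 1: reserved bit set in a paging-structure entry
PF_INSTR = 1 << 4  # 1: fault on an instruction fetch


def decode_page_fault_error_code(code: int) -> dict:
    """Return the meaning of each low bit of a page-fault error code."""
    return {
        "present": bool(code & PF_PROT),
        "write": bool(code & PF_WRITE),
        "user": bool(code & PF_USER),
        "reserved_bit": bool(code & PF_RSVD),
        "instruction_fetch": bool(code & PF_INSTR),
    }


if __name__ == "__main__":
    # The trace quoted above reports error_code=0x2.
    for name, value in decode_page_fault_error_code(0x2).items():
        print(f"{name}: {value}")
```

With error_code=0x2 only the write bit is set: bit zero is clear, so the CPU saw the page as not present, and the access was a kernel-mode write. That matches the reading given in the message above.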

Re: pipe/page fault oddness.

2014-09-30 Thread Dave Jones
On Tue, Sep 30, 2014 at 09:46:45AM -0700, Linus Torvalds wrote: > On Tue, Sep 30, 2014 at 9:40 AM, Dave Jones wrote: > > > > ah, echo 0 > tracing_on isn't enough, I had to 0 out set_ftrace_pid too. > > Never mind. Subsequent traces all looking the same, minus the tracing > > junk though. >

Re: pipe/page fault oddness.

2014-09-30 Thread Linus Torvalds
On Tue, Sep 30, 2014 at 9:40 AM, Dave Jones wrote: > > ah, echo 0 > tracing_on isn't enough, I had to 0 out set_ftrace_pid too. > Never mind. Subsequent traces all looking the same, minus the tracing > junk though. Yeah, looks like that copy_page_to_iter+0x3b3 is always there. > Is there some w

Re: pipe/page fault oddness.

2014-09-30 Thread Dave Jones
On Tue, Sep 30, 2014 at 12:22:01PM -0400, Dave Jones wrote: > [] ? trace_graph_entry+0x123/0x250 > [] ? trace_buffer_lock_reserve+0x1e/0x60 > [] ? handle_mm_fault+0x3a7/0xcd0 > [] ? trace_hardirqs_on+0xd/0x10 > [] ? trace_graph_entry+0x108/0x250 > [] ? __do_page_fault+0x234/0x600 > [] ? pr

Re: pipe/page fault oddness.

2014-09-30 Thread Linus Torvalds
On Tue, Sep 30, 2014 at 9:03 AM, Rik van Riel wrote: > > On the other hand, do_wp_page does not seem to do a tlb flush when > the old page is reused, so CPUs do get rid of inappropriate TLB > entries. We would have noticed do_wp_page not working right :) Hmm? do_wp_page() uses the same ptep_set_a

Re: pipe/page fault oddness.

2014-09-30 Thread Dave Jones
On Tue, Sep 30, 2014 at 09:10:26AM -0700, Linus Torvalds wrote: > On Tue, Sep 30, 2014 at 9:05 AM, Dave Jones wrote: > > > > I left it spinning overnight in case someone wanted me to probe it > > further, so I haven't tried reproducing it yet. It took ~12 hours > > yesterday before it got in

Re: pipe/page fault oddness.

2014-09-30 Thread Linus Torvalds
On Tue, Sep 30, 2014 at 9:05 AM, Dave Jones wrote: > > I left it spinning overnight in case someone wanted me to probe it > further, so I haven't tried reproducing it yet. It took ~12 hours > yesterday before it got in that state. I'll restart it, and tell it > to only use pipe fd's, which might

Re: pipe/page fault oddness.

2014-09-30 Thread Dave Jones
On Tue, Sep 30, 2014 at 12:03:06PM -0400, Rik van Riel wrote: > > What kind of CPU is the problematic machine? There was some > > question about just how architectural the whole "TLB entry causing > > a page fault gets invalidated automatically" really is. > > Intel people told me at the tim

Re: pipe/page fault oddness.

2014-09-30 Thread Dave Jones
On Tue, Sep 30, 2014 at 08:52:08AM -0700, Linus Torvalds wrote: > So if it's looping on that fault, what seems to happen is that the > page fault keeps happening. > > Can you recreate this? Because if you can, please try to revert commit > e4a1cc56e4d7 ("x86: mm: drop TLB flush from ptep_set

Re: pipe/page fault oddness.

2014-09-30 Thread Rik van Riel
On 09/30/2014 11:52 AM, Linus Torvalds wrote: > On Mon, Sep 29, 2014 at 9:54 PM, Linus Torvalds > wrote: >> >> Odd. The 0x3b3 offset seems to be the single-byte write of zero, >> which is just the initial probe (aka >> "fault_in_pages_writeable()").

Re: pipe/page fault oddness.

2014-09-30 Thread Linus Torvalds
On Mon, Sep 29, 2014 at 9:54 PM, Linus Torvalds wrote: > > Odd. The 0x3b3 offset seems to be the single-byte write of zero, which is > just the initial probe (aka "fault_in_pages_writeable()"). > > How *that* could loop, I have no idea. Unless the exception table is broken. > I'll take another loo

Re: pipe/page fault oddness.

2014-09-29 Thread Al Viro
On Mon, Sep 29, 2014 at 09:27:09PM -0700, Linus Torvalds wrote: > On Mon, Sep 29, 2014 at 8:33 PM, Dave Jones wrote: > > > > Looking at the dump, there's only one running trinity child, > > with all the others blocking on it. > > > > trinity-c49 R running task12856 19464 7633 0x0004

Re: pipe/page fault oddness.

2014-09-29 Thread Dave Jones
On Mon, Sep 29, 2014 at 09:27:09PM -0700, Linus Torvalds wrote: > On Mon, Sep 29, 2014 at 8:33 PM, Dave Jones wrote: > > > > Looking at the dump, there's only one running trinity child, > > with all the others blocking on it. > > > > trinity-c49 R running task12856 19464 7633 0x00

Re: pipe/page fault oddness.

2014-09-29 Thread Linus Torvalds
On Mon, Sep 29, 2014 at 8:33 PM, Dave Jones wrote: > > Looking at the dump, there's only one running trinity child, > with all the others blocking on it. > > trinity-c49 R running task12856 19464 7633 0x0004 > 8800a09bf960 0002 8800a09bf9f8 88021965 > 000

pipe/page fault oddness.

2014-09-29 Thread Dave Jones
My fuzz tester ground to a halt, with many child processes blocked on pipe_lock. sysrq-t output: http://codemonkey.org.uk/junk/pipe-lock-wtf.txt Looking at the dump, there's only one running trinity child, with all the others blocking on it. trinity-c49 R running task12856 19464 7633