Linus Torvalds writes:
> On Mon, Oct 6, 2014 at 3:18 PM, Aneesh Kumar K.V
> wrote:
>>
>> Are we still looking at these options ? I could look at implementing the
>> first option which will also enable us to free up one pte bit.
>
> We definitely are. If you can test my patch (with the small foll
On Mon, Oct 6, 2014 at 3:18 PM, Aneesh Kumar K.V
wrote:
>
> Are we still looking at these options ? I could look at implementing the
> first option which will also enable us to free up one pte bit.
We definitely are. If you can test my patch (with the small follow-up
fix), and do the necessary ch
Mel Gorman writes:
> On Wed, Oct 01, 2014 at 09:18:25AM -0700, Linus Torvalds wrote:
>> On Wed, Oct 1, 2014 at 9:01 AM, Linus Torvalds
>> wrote:
>> >
>> > We need to get rid of it, and just make it the same as pte_protnone().
>> > And then the real protnone is in the vma flags, and if you actual
On 10/03/2014 11:58 AM, Dave Jones wrote:
> On Fri, Oct 03, 2014 at 08:43:57AM -0700, Linus Torvalds wrote:
> > On Thu, Oct 2, 2014 at 10:00 PM, Sasha Levin wrote:
> > >
> > > For the record, I tweaked the environment to put some more pressure on the
> > > scheduler and found out what br
On Fri, Oct 03, 2014 at 08:43:57AM -0700, Linus Torvalds wrote:
> On Thu, Oct 2, 2014 at 10:00 PM, Sasha Levin wrote:
> >
> > For the record, I tweaked the environment to put some more pressure on the
> > scheduler and found out what broke (which is not related to this thread at
> > all).
>
On Thu, Oct 2, 2014 at 10:00 PM, Sasha Levin wrote:
>
> For the record, I tweaked the environment to put some more pressure on the
> scheduler and found out what broke (which is not related to this thread at
> all).
Ok. It's probably still worth testing Mel's patches, since that's what
goes into
On 10/02/2014 12:10 PM, Linus Torvalds wrote:
> On Thu, Oct 2, 2014 at 8:04 AM, Sasha Levin wrote:
>> >
>> > I have a new one for you. I know it doesn't say "numa" anywhere, but I
>> > haven't ever seen that trace before so I'll just go ahead and blame it
>> > on your patch...
> Fair enough, but t
On Thu, Oct 02, 2014 at 09:01:38AM -0700, Linus Torvalds wrote:
> On Thu, Oct 2, 2014 at 7:25 AM, Kirill A. Shutemov
> wrote:
> >
> > I don't see what prevents the code from making the zero page writable here.
> > We need at least pmd = pmd_wrprotect(pmd) before set_pmd_at();
>
> Do we? If it's the zero
On Thu, Oct 2, 2014 at 8:04 AM, Sasha Levin wrote:
>
> I have a new one for you. I know it doesn't say "numa" anywhere, but I
> haven't ever seen that trace before so I'll just go ahead and blame it
> on your patch...
Fair enough, but the oops doesn't really give even a hint of what
could be wron
On Thu, Oct 2, 2014 at 7:25 AM, Kirill A. Shutemov wrote:
>
> I don't see what prevents the code from making the zero page writable here.
> We need at least pmd = pmd_wrprotect(pmd) before set_pmd_at();
Do we? If it's the zero page, it had better be an anonymous mapping,
and vm_page_prot had better not b
On Thu, Oct 2, 2014 at 1:47 AM, Hugh Dickins wrote:
>
> I hesitate to admit, I still don't see it: please illuminate further.
No, you're looking at what I was looking at.
> We're talking about the loop in __split_huge_page_map(), where it does
Yes.
> entry = mk_pte(page +
On 10/01/2014 06:42 PM, Linus Torvalds wrote:
> On Wed, Oct 1, 2014 at 3:08 PM, Sasha Levin wrote:
>> >
>> > I've tried this patch on the same configuration that was triggering
>> > the VM_BUG_ON that Hugh mentioned previously. Surprisingly enough it
>> > ran fine for ~20 minutes before exploding
On 10/02/2014 04:03 AM, Chuck Ebbert wrote:
> On Wed, 01 Oct 2014 23:32:15 -0400
> Sasha Levin wrote:
>
>> On 10/01/2014 06:28 PM, Chuck Ebbert wrote:
>>> On Wed, 01 Oct 2014 18:08:30 -0400
>>> Sasha Levin wrote:
>>>
> On 10/01/2014 04:20 PM, Linus Torvalds wrote:
>>> So I'm really sendi
On Wed, Oct 01, 2014 at 03:42:53PM -0700, Linus Torvalds wrote:
> On Wed, Oct 1, 2014 at 3:08 PM, Sasha Levin wrote:
> >
> > I've tried this patch on the same configuration that was triggering
> > the VM_BUG_ON that Hugh mentioned previously. Surprisingly enough it
> > ran fine for ~20 minutes bef
On Wed, Oct 01, 2014 at 09:18:25AM -0700, Linus Torvalds wrote:
> On Wed, Oct 1, 2014 at 9:01 AM, Linus Torvalds
> wrote:
> >
> > We need to get rid of it, and just make it the same as pte_protnone().
> > And then the real protnone is in the vma flags, and if you actually
> > ever get to a pte tha
On Wed, 1 Oct 2014, Linus Torvalds wrote:
> On Wed, Oct 1, 2014 at 1:19 AM, Hugh Dickins wrote:
>
> Can we please just get rid of _PAGE_NUMA. There is no excuse for it.
I'm no lover of _PAGE_NUMA, and hope that it can be simplified away
as you outline. What we have in 3.16+3.17 is already an at
On Wed, Oct 01, 2014 at 01:29:04PM -0400, Rik van Riel wrote:
> On 10/01/2014 12:18 PM, Linus Torvalds wrote:
>
> > Seriously, why can't we just do this, and throw away all the crap that
> > is "numa special case". This would make all the random games in
> > change_pte_range() just go away entirel
On Wed, 01 Oct 2014 23:32:15 -0400
Sasha Levin wrote:
> On 10/01/2014 06:28 PM, Chuck Ebbert wrote:
> > On Wed, 01 Oct 2014 18:08:30 -0400
> > Sasha Levin wrote:
> >
> >> > On 10/01/2014 04:20 PM, Linus Torvalds wrote:
> >>> > > So I'm really sending this patch out in the hope that it will get
On 10/01/2014 06:28 PM, Chuck Ebbert wrote:
> On Wed, 01 Oct 2014 18:08:30 -0400
> Sasha Levin wrote:
>
>> > On 10/01/2014 04:20 PM, Linus Torvalds wrote:
>>> > > So I'm really sending this patch out in the hope that it will get
>>> > > comments, fixup and possibly even testing by people who actu
On Wed, Oct 1, 2014 at 3:08 PM, Sasha Levin wrote:
>
> I've tried this patch on the same configuration that was triggering
> the VM_BUG_ON that Hugh mentioned previously. Surprisingly enough it
> ran fine for ~20 minutes before exploding with:
Well, that's somewhat encouraging. I didn't expect it
On Wed, 01 Oct 2014 18:08:30 -0400
Sasha Levin wrote:
> On 10/01/2014 04:20 PM, Linus Torvalds wrote:
> > So I'm really sending this patch out in the hope that it will get
> > comments, fixup and possibly even testing by people who actually know
> > the NUMA balancing code. Rik? Anybody?
>
> Hi
On 10/01/2014 04:20 PM, Linus Torvalds wrote:
> So I'm really sending this patch out in the hope that it will get
> comments, fixup and possibly even testing by people who actually know
> the NUMA balancing code. Rik? Anybody?
Hi Linus,
I've tried this patch on the same configuration that was tr
On 10/01/2014 04:20 PM, Linus Torvalds wrote:
> Now, I'll be honest: this patch *might* just work, but I expect it to
> have some stupid problem. It compiles. I haven't even dared boot it,
> much less try any numa benchmarks that wouldn't show anything sane on
> my machine anyway.
>
> So I'm real
On Wed, Oct 1, 2014 at 9:18 AM, Linus Torvalds
wrote:
>
> So I'd really suggest we do exactly that. Get rid of "pte_numa()"
> entirely, get rid of "_PAGE_[BIT_]NUMA" entirely, and instead add a
> "pte_protnone()" helper to check for the "protnone" case (which on x86
> is testing the _PAGE_PROTNONE
On 10/01/2014 12:18 PM, Linus Torvalds wrote:
> Seriously, why can't we just do this, and throw away all the crap that
> is "numa special case". This would make all the random games in
> change_pte_range() just go away entirely, because the whole NUMA thing
> really wouldn't be a special case for
On Wed, Oct 1, 2014 at 9:01 AM, Linus Torvalds
wrote:
>
> We need to get rid of it, and just make it the same as pte_protnone().
> And then the real protnone is in the vma flags, and if you actually
> ever get to a pte that is marked protnone, you know it's a numa page.
So I'd really suggest we d
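The idea in the quoted text above can be sketched in user-space C. This is only an illustration under stated assumptions, not the patch discussed in the thread: the `MOCK_` constants and all three helper names are hypothetical, and the bit values are picked for the example rather than taken from kernel headers. A pte counts as "protnone" when the protnone bit is set and the present bit is clear; if the VMA flags say the mapping is accessible, such a pte can only be a NUMA hinting entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, chosen for illustration only. */
#define MOCK_PAGE_PRESENT  (1ull << 0)
#define MOCK_PAGE_PROTNONE (1ull << 8)

/* Hypothetical VMA permission flags. */
#define MOCK_VM_READ  0x1ul
#define MOCK_VM_WRITE 0x2ul
#define MOCK_VM_EXEC  0x4ul

/* True when the pte is marked protnone and is not present. */
bool mock_pte_protnone(uint64_t pte)
{
    return (pte & (MOCK_PAGE_PRESENT | MOCK_PAGE_PROTNONE))
           == MOCK_PAGE_PROTNONE;
}

/* True when the VMA grants at least one kind of access,
 * i.e. the mapping itself is not a real PROT_NONE mapping. */
bool mock_vma_accessible(unsigned long vm_flags)
{
    return (vm_flags & (MOCK_VM_READ | MOCK_VM_WRITE | MOCK_VM_EXEC)) != 0;
}

/* A protnone pte inside an accessible VMA is taken to be a NUMA
 * hinting entry; inside an inaccessible VMA it is a real protnone. */
bool mock_is_numa_hint(uint64_t pte, unsigned long vm_flags)
{
    return mock_pte_protnone(pte) && mock_vma_accessible(vm_flags);
}
```

With this split, no separate NUMA bit is needed in the pte: the pte says "no access", and the VMA flags decide whether that is intentional or a hinting fault.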
On Wed, Oct 1, 2014 at 1:19 AM, Hugh Dickins wrote:
>
> Irrelevance follows...
Maybe not irrelevant.
> There *appears* to be a risk of hitting the VM_BUG_ON, or with no
> VM_BUG_ON (as in 3.17-rc) pte_mknuma proceeding to add _PAGE_NUMA
> to _PAGE_PROTNONE - making the pte then fail the pte_numa
On Tue, 30 Sep 2014, Linus Torvalds wrote:
> On Tue, Sep 30, 2014 at 11:20 AM, Dave Jones wrote:
> >
> > page_fault_kernel:address=__per_cpu_end ip=copy_page_to_iter
> > error_code=0x2
>
> Interesting. "error_code" in particular. The value "2" means that the
> CPU thinks that the page is not
On Tue, Sep 30, 2014 at 11:20 AM, Dave Jones wrote:
>
> page_fault_kernel:address=__per_cpu_end ip=copy_page_to_iter
> error_code=0x2
Interesting. "error_code" in particular. The value "2" means that the
CPU thinks that the page is not present (bit zero is clear).
(That "address" is useless
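For readers following the error_code discussion above, here is a minimal sketch of how the low bits of an x86 page-fault error code are usually read. The bit meanings follow the architectural layout (bit 0: protection violation vs. not-present page, bit 1: write vs. read, bit 2: user vs. kernel mode); the `PF_DEMO_` macros, `struct pf_info` and both function names are illustrative and are not taken from the thread or from kernel sources.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Low bits of the x86 page-fault error code (illustrative names). */
#define PF_DEMO_PROT  (1u << 0) /* 0: page not present, 1: protection violation */
#define PF_DEMO_WRITE (1u << 1) /* 0: read access,      1: write access */
#define PF_DEMO_USER  (1u << 2) /* 0: kernel mode,      1: user mode */

struct pf_info {
    bool present; /* the faulting page was present */
    bool write;   /* the access was a write */
    bool user;    /* the access came from user mode */
};

/* Split an error code into its three low flag bits. */
struct pf_info decode_pf_error(uint32_t error_code)
{
    struct pf_info info = {
        .present = (error_code & PF_DEMO_PROT) != 0,
        .write   = (error_code & PF_DEMO_WRITE) != 0,
        .user    = (error_code & PF_DEMO_USER) != 0,
    };
    return info;
}

/* Format a one-line, human-readable description into buf. */
void describe_pf_error(uint32_t error_code, char *buf, size_t len)
{
    struct pf_info info = decode_pf_error(error_code);

    snprintf(buf, len, "%s-mode %s to a %s page",
             info.user ? "user" : "kernel",
             info.write ? "write" : "read",
             info.present ? "present" : "not-present");
}
```

Under this reading, the value 0x2 from the trace decodes as a kernel-mode write to a not-present page, which matches the remark that bit zero is clear.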
On Tue, Sep 30, 2014 at 09:46:45AM -0700, Linus Torvalds wrote:
> On Tue, Sep 30, 2014 at 9:40 AM, Dave Jones wrote:
> >
> > ah, echo 0 > tracing_on isn't enough, I had to 0 out set_ftrace_pid too.
> > Never mind. Subsequent traces all looking the same, minus the tracing
> > junk though.
>
On Tue, Sep 30, 2014 at 9:40 AM, Dave Jones wrote:
>
> ah, echo 0 > tracing_on isn't enough, I had to 0 out set_ftrace_pid too.
> Never mind. Subsequent traces all looking the same, minus the tracing
> junk though.
Yeah, looks like that copy_page_to_iter+0x3b3 is always there.
> Is there some w
On Tue, Sep 30, 2014 at 12:22:01PM -0400, Dave Jones wrote:
> [] ? trace_graph_entry+0x123/0x250
> [] ? trace_buffer_lock_reserve+0x1e/0x60
> [] ? handle_mm_fault+0x3a7/0xcd0
> [] ? trace_hardirqs_on+0xd/0x10
> [] ? trace_graph_entry+0x108/0x250
> [] ? __do_page_fault+0x234/0x600
> [] ? pr
On Tue, Sep 30, 2014 at 9:03 AM, Rik van Riel wrote:
>
> On the other hand, do_wp_page does not seem to do a tlb flush when
> the old page is reused, so CPUs do get rid of inappropriate TLB
> entries. We would have noticed do_wp_page not working right :)
Hmm? do_wp_page() uses the same ptep_set_a
On Tue, Sep 30, 2014 at 09:10:26AM -0700, Linus Torvalds wrote:
> On Tue, Sep 30, 2014 at 9:05 AM, Dave Jones wrote:
> >
> > I left it spinning overnight in case someone wanted me to probe it
> > further, so I haven't tried reproducing it yet. It took ~12 hours
> > yesterday before it got in
On Tue, Sep 30, 2014 at 9:05 AM, Dave Jones wrote:
>
> I left it spinning overnight in case someone wanted me to probe it
> further, so I haven't tried reproducing it yet. It took ~12 hours
> yesterday before it got in that state. I'll restart it, and tell it
> to only use pipe fd's, which might
On Tue, Sep 30, 2014 at 12:03:06PM -0400, Rik van Riel wrote:
> > What kind of CPU is the problematic machine? There was some
> > question about just how architectural the whole "TLB entry causing
> > a page fault gets invalidated automatically" really is.
>
> Intel people told me at the tim
On Tue, Sep 30, 2014 at 08:52:08AM -0700, Linus Torvalds wrote:
> So if it's looping on that fault, what seems to happen is that the
> page fault keeps happening.
>
> Can you recreate this? Because if you can, please try to revert commit
> e4a1cc56e4d7 ("x86: mm: drop TLB flush from ptep_set
On 09/30/2014 11:52 AM, Linus Torvalds wrote:
> On Mon, Sep 29, 2014 at 9:54 PM, Linus Torvalds
> wrote:
>>
>> Odd. The 0x3b3 offset seems to be the single-byte write of zero,
>> which is just the initial probe (aka
>> "fault_in_pages_writeable()").
On Mon, Sep 29, 2014 at 9:54 PM, Linus Torvalds
wrote:
>
> Odd. The 0x3b3 offset seems to be the single-byte write of zero, which is
> just the initial probe (aka "fault_in_pages_writeable()").
>
> How *that* could loop, I have no idea. Unless the exception table is broken.
> I'll take another loo
On Mon, Sep 29, 2014 at 09:27:09PM -0700, Linus Torvalds wrote:
> On Mon, Sep 29, 2014 at 8:33 PM, Dave Jones wrote:
> >
> > Looking at the dump, there's only one running trinity child,
> > with all the others blocking on it.
> >
> > trinity-c49 R running task12856 19464 7633 0x0004
On Mon, Sep 29, 2014 at 09:27:09PM -0700, Linus Torvalds wrote:
> On Mon, Sep 29, 2014 at 8:33 PM, Dave Jones wrote:
> >
> > Looking at the dump, there's only one running trinity child,
> > with all the others blocking on it.
> >
> > trinity-c49 R running task12856 19464 7633 0x00
On Mon, Sep 29, 2014 at 8:33 PM, Dave Jones wrote:
>
> Looking at the dump, there's only one running trinity child,
> with all the others blocking on it.
>
> trinity-c49 R running task12856 19464 7633 0x0004
> 8800a09bf960 0002 8800a09bf9f8 88021965
> 000
My fuzz tester ground to a halt, with many child processes blocked
on pipe_lock. sysrq-t output: http://codemonkey.org.uk/junk/pipe-lock-wtf.txt
Looking at the dump, there's only one running trinity child,
with all the others blocking on it.
trinity-c49 R running task12856 19464 7633