Re: [PATCH 0/1] mm: restore full accuracy in COW page reuse

2021-01-09 Thread Andrea Arcangeli
On Sat, Jan 09, 2021 at 05:37:09PM -0800, Linus Torvalds wrote: > On Sat, Jan 9, 2021 at 5:19 PM Linus Torvalds > wrote: > > > > And no, I didn't make the UFFDIO_WRITEPROTECT code take the mmap_sem > > for writing. For whoever wants to look at that, it's > > mwriteprotect_range() in mm/userfaultfd

Re: [PATCH 1/1] mm: restore full accuracy in COW page reuse

2021-01-09 Thread Andrea Arcangeli
Hello, On Sat, Jan 09, 2021 at 07:44:35PM -0500, Andrea Arcangeli wrote: > allowing a child to corrupt memory in the parent. That's a problem > that could happen not-maliciously too. So the scenario described I updated the above partly quoted sentence since in the previous version it

Re: [PATCH 0/1] mm: restore full accuracy in COW page reuse

2021-01-09 Thread Andrea Arcangeli
Hello Linus, On Sat, Jan 09, 2021 at 05:19:51PM -0800, Linus Torvalds wrote: > +#define is_cow_mapping(flags) (((flags) & (VM_SHARED | VM_MAYWRITE)) == > VM_MAYWRITE) > + > +static inline bool pte_is_pinned(struct vm_area_struct *vma, unsigned long > addr, pte_t pte) > +{ > + struct page *pa

[PATCH 0/1] mm: restore full accuracy in COW page reuse

2021-01-09 Thread Andrea Arcangeli
ot;); } else printf("memory corruption detected\n"); } skip_memset = !skip_memset; if (!skip_memset) memset(mem, 0xff, HARDBLKSIZE); } return 0; } And

[PATCH 1/1] mm: restore full accuracy in COW page reuse

2021-01-09 Thread Andrea Arcangeli
splice itself, and only at a second stage in the COW code. Link: https://lkml.kernel.org/r/20210107200402.31095-1-aarca...@redhat.com Cc: sta...@kernel.org Fixes: 09854ba94c6a ("mm: do_wp_page() simplification") Signed-off-by: Andrea Arcangeli --- include/linux/ksm.h | 7 ++ mm/

Re: [PATCH 0/2] page_count can't be used to decide when wp_page_copy

2021-01-08 Thread Andrea Arcangeli
Hello Jason, On Fri, Jan 08, 2021 at 08:42:55PM -0400, Jason Gunthorpe wrote: > There is already a patch series floating about to do exactly that for > FOLL_LONGTERM pins based on the existing code in GUP for CMA migration Sounds great. > The ship sailed on this a decade ago, it is completely in

Re: [PATCH 2/2] mm: soft_dirty: userfaultfd: introduce wrprotect_tlb_flush_pending

2021-01-08 Thread Andrea Arcangeli
On Fri, Jan 08, 2021 at 11:25:21AM -0800, Linus Torvalds wrote: > On Fri, Jan 8, 2021 at 9:53 AM Andrea Arcangeli wrote: > > > > Do you intend to eventually fix the zygote vmsplice case or not? > > Because in current upstream it's not fixed currently using the >

Re: [PATCH] x86/vm86/32: Remove VM86_SCREEN_BITMAP support

2021-01-08 Thread Andrea Arcangeli
gt; writable. > > I can't find any users at all of this mechanism, so just remove it. Reviewed-by: Andrea Arcangeli

Re: [PATCH 0/2] page_count can't be used to decide when wp_page_copy

2021-01-08 Thread Andrea Arcangeli
On Fri, Jan 08, 2021 at 10:31:24AM -0800, Andy Lutomirski wrote: > Can we just remove vmsplice() support? We could make it do a normal The single case I've seen vmsplice used so far, that was really cool is localhost live migration of qemu. However despite really cool, it wasn't merged in the end

Re: [PATCH 0/2] page_count can't be used to decide when wp_page_copy

2021-01-08 Thread Andrea Arcangeli
On Fri, Jan 08, 2021 at 02:19:45PM -0400, Jason Gunthorpe wrote: > On Fri, Jan 08, 2021 at 12:00:36PM -0500, Andrea Arcangeli wrote: > > > The majority cannot be converted to notifiers because they are DMA > > > based. Every one of those is an ABI for something, and does n

Re: [PATCH 2/2] mm: soft_dirty: userfaultfd: introduce wrprotect_tlb_flush_pending

2021-01-08 Thread Andrea Arcangeli
On Fri, Jan 08, 2021 at 09:39:56AM -0800, Linus Torvalds wrote: > page_count() is simply the right and efficient thing to do. > > You talk about all these theoretical inefficiencies for cases like > zygote and page pinning, which have never ever been seen except as a > possible attack vector. Do

Re: [PATCH 0/2] page_count can't be used to decide when wp_page_copy

2021-01-08 Thread Andrea Arcangeli
On Fri, Jan 08, 2021 at 09:36:49AM -0400, Jason Gunthorpe wrote: > On Thu, Jan 07, 2021 at 04:45:33PM -0500, Andrea Arcangeli wrote: > > On Thu, Jan 07, 2021 at 04:25:25PM -0400, Jason Gunthorpe wrote: > > > On Thu, Jan 07, 2021 at 03:04:00PM -0500, Andrea Arcangeli wrote: >

Re: [PATCH 2/2] mm: soft_dirty: userfaultfd: introduce wrprotect_tlb_flush_pending

2021-01-08 Thread Andrea Arcangeli
Hello everyone, On Fri, Jan 08, 2021 at 12:48:16PM +, Will Deacon wrote: > On Thu, Jan 07, 2021 at 04:25:54PM -0800, Linus Torvalds wrote: > > Please. Why is the correct patch not the attached one (apart from the > > obvious fact that I haven't tested it and maybe just screwed up > > completel

Re: [PATCH 2/2] mm: soft_dirty: userfaultfd: introduce wrprotect_tlb_flush_pending

2021-01-07 Thread Andrea Arcangeli
On Thu, Jan 07, 2021 at 02:51:24PM -0800, Linus Torvalds wrote: > Ho humm. I had obviously not looked very much at that code. I had done > a quick git grep, but now that I look closer, it *does* get the > mmap_sem for writing, but only for that VM_SOFTDIRTY bit clearing, and > then it does a mmap_w

Re: [PATCH 2/2] mm: soft_dirty: userfaultfd: introduce wrprotect_tlb_flush_pending

2021-01-07 Thread Andrea Arcangeli
On Thu, Jan 07, 2021 at 02:42:17PM -0800, Linus Torvalds wrote: > On Thu, Jan 7, 2021 at 2:31 PM Andrea Arcangeli wrote: > > > > Random memory corruption will still silently materialize as result of > > the speculative lookups in the above scenario. > > Explain.

Re: [PATCH 0/2] page_count can't be used to decide when wp_page_copy

2021-01-07 Thread Andrea Arcangeli
On Thu, Jan 07, 2021 at 02:17:50PM -0800, Linus Torvalds wrote: > So I think we can agree that even that softdirty case we can just > handle by "don't do that then". Absolutely. The question is if somebody was happily running clear_refs with a RDMA attached to the process, by the time they update

Re: [PATCH 2/2] mm: soft_dirty: userfaultfd: introduce wrprotect_tlb_flush_pending

2021-01-07 Thread Andrea Arcangeli
On Thu, Jan 07, 2021 at 01:29:43PM -0800, Linus Torvalds wrote: > On Thu, Jan 7, 2021 at 12:59 PM Andrea Arcangeli wrote: > > > > The problem is it's not even possible to detect reliably if there's > > really a long term GUP pin because of speculative pagecache look

Re: [PATCH 0/2] page_count can't be used to decide when wp_page_copy

2021-01-07 Thread Andrea Arcangeli
On Thu, Jan 07, 2021 at 01:05:19PM -0800, Linus Torvalds wrote: > I think those would very much be worth fixing, so that if > UFFDIO_WRITEPROTECT taking the mmapo_sem for writing causes problems, > we can _fix_ those problems. > > But I think it's entirely wrong to treat UFFDIO_WRITEPROTECT as > s

Re: [PATCH 0/2] page_count can't be used to decide when wp_page_copy

2021-01-07 Thread Andrea Arcangeli
On Thu, Jan 07, 2021 at 12:32:09PM -0800, Linus Torvalds wrote: > I think Andrea is blinded by his own love for UFFDIO: when I do a > debian codesearch for UFFDIO_WRITEPROTECT, all it finds is the kernel > and strace (and the qemu copies of the kernel headers). For the record, I feel obliged to re

Re: [PATCH 0/2] page_count can't be used to decide when wp_page_copy

2021-01-07 Thread Andrea Arcangeli
On Thu, Jan 07, 2021 at 04:25:25PM -0400, Jason Gunthorpe wrote: > On Thu, Jan 07, 2021 at 03:04:00PM -0500, Andrea Arcangeli wrote: > > > vmsplice syscall API is insecure allowing long term GUP PINs without > > privilege. > > Lots of places are relying on pin_user_pages

Re: [PATCH 2/2] mm: soft_dirty: userfaultfd: introduce wrprotect_tlb_flush_pending

2021-01-07 Thread Andrea Arcangeli
Hi Linus, On Thu, Jan 07, 2021 at 12:17:40PM -0800, Linus Torvalds wrote: > On Thu, Jan 7, 2021 at 12:04 PM Andrea Arcangeli wrote: > > > > However there are two cases that could wrprotecting exclusive anon > > pages with only the mmap_read_lock: > > I still think the

[PATCH 0/2] page_count can't be used to decide when wp_page_copy

2021-01-07 Thread Andrea Arcangeli
lly. So I tried to fix even clear_refs to cope with it, but this is only the tip of the icerbeg of what really breaks. So in short I contextually self-NAK 2/2 of this patchset and we need to somehow reverse 09854ba94c6aad7886996bfbee2530b3d8a7f4f4 instead. Thanks, Andrea Andrea Arcangeli (1): mm

[PATCH 2/2] mm: soft_dirty: userfaultfd: introduce wrprotect_tlb_flush_pending

2021-01-07 Thread Andrea Arcangeli
+1104+1667+1101+1365+913+1108) bounces: 71, mode: rnd racing ver read, page_nr 25241 memory corruption 6 7 After the commit the userland memory corruption is gone as expected. Cc: sta...@kernel.org Reported-by: Nadav Amit Suggested-by: Yu Zhao Signed-off-by: Andrea Arcangeli --- fs/pr

[PATCH 1/2] mm: proc: Invalidate TLB after clearing soft-dirty page state

2021-01-07 Thread Andrea Arcangeli
g the mmu_gather API altogether: managing both the 'tlb_flush_pending' flag on the 'mm_struct' and explicit TLB invalidation for the sort-dirty path, much like mprotect() does already. Fixes: 0758cd830494 ("asm-generic/tlb: avoid potential double flush") Signed-off-by: Wil

Re: [PATCH] mm/mmap: replace if (cond) BUG() with BUG_ON()

2021-01-07 Thread Andrea Arcangeli
On Thu, Jan 07, 2021 at 06:28:29PM +0100, Vlastimil Babka wrote: > On 1/6/21 9:18 PM, Hugh Dickins wrote: > > On Wed, 6 Jan 2021, Andrea Arcangeli wrote: > >> > >> I'd be surprised if the kernel can boot with BUG_ON() defined as "do > >> {}while

Re: [PATCH] mm/mmap: replace if (cond) BUG() with BUG_ON()

2021-01-06 Thread Andrea Arcangeli
Hello, On Wed, Jan 06, 2021 at 11:46:20AM -0800, Andrew Morton wrote: > On Tue, 5 Jan 2021 20:28:27 -0800 (PST) Hugh Dickins wrote: > > > Alex, please consider why the authors of these lines (whom you > > did not Cc) chose to write them without BUG_ON(): it has always > > been preferred practice

Re: [RFC PATCH v2 2/2] fs/task_mmu: acquire mmap_lock for write on soft-dirty cleanup

2021-01-05 Thread Andrea Arcangeli
On Tue, Jan 05, 2021 at 10:16:29PM +, Will Deacon wrote: > On Tue, Jan 05, 2021 at 09:22:51PM +, Nadav Amit wrote: > > > On Jan 5, 2021, at 12:39 PM, Andrea Arcangeli wrote: > > > > > > On Tue, Jan 05, 2021 at 07:26:43PM +, Nadav Amit wrote: >

Re: [RFC PATCH v2 2/2] fs/task_mmu: acquire mmap_lock for write on soft-dirty cleanup

2021-01-05 Thread Andrea Arcangeli
On Tue, Jan 05, 2021 at 09:22:51PM +, Nadav Amit wrote: > It is also about performance due to unwarranted TLB flushes. If there will be a problem switching to the wait_flush_pending() model suggested by Peter may not even require changes to the common code in memory.c since I'm thinking it may

Re: [RFC PATCH v2 1/2] mm/userfaultfd: fix memory corruption due to writeprotect

2021-01-05 Thread Andrea Arcangeli
On Tue, Jan 05, 2021 at 08:06:22PM +, Nadav Amit wrote: > I just thought that there might be some insinuation, as you mentioned VMware > by name. My response was half-jokingly and should have had a smiley to > prevent you from wasting your time on the explanation. No problem, actually I apprec

Re: [RFC PATCH v2 2/2] fs/task_mmu: acquire mmap_lock for write on soft-dirty cleanup

2021-01-05 Thread Andrea Arcangeli
On Tue, Jan 05, 2021 at 07:26:43PM +, Nadav Amit wrote: > > On Jan 5, 2021, at 10:20 AM, Andrea Arcangeli wrote: > > > > On Fri, Dec 25, 2020 at 01:25:29AM -0800, Nadav Amit wrote: > >> Fixes: 0f8975ec4db2 ("mm: soft-dirty bits for user memory changes >

Re: [RFC PATCH v2 1/2] mm/userfaultfd: fix memory corruption due to writeprotect

2021-01-05 Thread Andrea Arcangeli
On Tue, Jan 05, 2021 at 07:05:22PM +, Nadav Amit wrote: > > On Jan 5, 2021, at 10:45 AM, Andrea Arcangeli wrote: > > I just don't like to slow down a feature required in the future for > > implementing postcopy live snapshotting or other snapshots to userland > &g

Re: [RFC PATCH v2 1/2] mm/userfaultfd: fix memory corruption due to writeprotect

2021-01-05 Thread Andrea Arcangeli
On Tue, Jan 05, 2021 at 01:41:34PM -0500, Peter Xu wrote: > Agreed. I didn't mention uffd_wp check (which I actually mentioned in the > reply > to v1 patchset) here only because the uffd_wp check is pure optimization; > while Agreed it's a pure optimization. Only if we used the group lock to fi

Re: [RFC PATCH v2 1/2] mm/userfaultfd: fix memory corruption due to writeprotect

2021-01-05 Thread Andrea Arcangeli
On Mon, Jan 04, 2021 at 09:26:33PM +, Nadav Amit wrote: > I would feel more comfortable if you provide patches for uffd-wp. If you > want, I will do it, but I restate that I do not feel comfortable with this > solution (worried as it seems a bit ad-hoc and might leave out a scenario > we all mi

Re: [RFC PATCH v2 2/2] fs/task_mmu: acquire mmap_lock for write on soft-dirty cleanup

2021-01-05 Thread Andrea Arcangeli
On Fri, Dec 25, 2020 at 01:25:29AM -0800, Nadav Amit wrote: > Fixes: 0f8975ec4db2 ("mm: soft-dirty bits for user memory changes tracking") Targeting a backport down to 2013 when nothing could wrong in practice with page_mapcount sounds backwards and unnecessarily risky. In theory it was already b

Re: [RFC PATCH v2 1/2] mm/userfaultfd: fix memory corruption due to writeprotect

2021-01-05 Thread Andrea Arcangeli
On Tue, Jan 05, 2021 at 10:08:13AM -0500, Peter Xu wrote: > On Fri, Dec 25, 2020 at 01:25:28AM -0800, Nadav Amit wrote: > > diff --git a/mm/mprotect.c b/mm/mprotect.c > > index ab709023e9aa..c08c4055b051 100644 > > --- a/mm/mprotect.c > > +++ b/mm/mprotect.c > > @@ -75,7 +75,8 @@ static unsigned lo

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2021-01-05 Thread Andrea Arcangeli
On Tue, Jan 05, 2021 at 04:37:27PM +0100, Peter Zijlstra wrote: > (your other email clarified this point; the COW needs to copy while > holding the PTL and we need TLBI under PTL if we're to change this) The COW doesn't need to hold the PT lock, the TLBI broadcast doesn't need to be delivered unde

Re: [RFC PATCH v2 1/2] mm/userfaultfd: fix memory corruption due to writeprotect

2021-01-05 Thread Andrea Arcangeli
On Tue, Jan 05, 2021 at 09:58:57AM +0100, Peter Zijlstra wrote: > On Mon, Jan 04, 2021 at 02:24:38PM -0500, Andrea Arcangeli wrote: > > On Mon, Jan 04, 2021 at 01:22:27PM +0100, Peter Zijlstra wrote: > > > On Fri, Dec 25, 2020 at 01:25:28AM -0800, Nadav Amit wrote: > > >

Re: [RFC PATCH v2 1/2] mm/userfaultfd: fix memory corruption due to writeprotect

2021-01-04 Thread Andrea Arcangeli
On Mon, Jan 04, 2021 at 08:39:37PM +, Nadav Amit wrote: > > On Jan 4, 2021, at 12:19 PM, Andrea Arcangeli wrote: > > > > On Mon, Jan 04, 2021 at 07:35:06PM +, Nadav Amit wrote: > >>> On Jan 4, 2021, at 11:24 AM, Andrea Arcangeli wrote: > >>>

Re: [RFC PATCH v2 1/2] mm/userfaultfd: fix memory corruption due to writeprotect

2021-01-04 Thread Andrea Arcangeli
On Mon, Jan 04, 2021 at 07:35:06PM +, Nadav Amit wrote: > > On Jan 4, 2021, at 11:24 AM, Andrea Arcangeli wrote: > > > > Hello, > > > > On Mon, Jan 04, 2021 at 01:22:27PM +0100, Peter Zijlstra wrote: > >> On Fri, Dec 25, 2020 at 01:25:28AM -0800, Nad

Re: [RFC PATCH v2 1/2] mm/userfaultfd: fix memory corruption due to writeprotect

2021-01-04 Thread Andrea Arcangeli
Hello, On Mon, Jan 04, 2021 at 01:22:27PM +0100, Peter Zijlstra wrote: > On Fri, Dec 25, 2020 at 01:25:28AM -0800, Nadav Amit wrote: > > > The scenario that happens in selftests/vm/userfaultfd is as follows: > > > > cpu0cpu1cpu2 > >

Re: kernelci/staging-next bisection: sleep.login on rk3288-rock2-square #2286-staging

2021-01-03 Thread Andrea Arcangeli
Hello Mike, On Sun, Jan 03, 2021 at 03:47:53PM +0200, Mike Rapoport wrote: > Thanks for the logs, it seems that implicitly adding reserved regions to > memblock.memory wasn't that bright idea :) Would it be possible to somehow clean up the hack then? The only difference between the clean solutio

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-24 Thread Andrea Arcangeli
On Thu, Dec 24, 2020 at 01:49:45PM -0500, Andrea Arcangeli wrote: > Without the above, can't the CPU decrement the tlb_flush_pending while > the IPI to ack didn't arrive yet in csd_lock_wait? Ehm: csd_lock_wait has smp_acquire__after_ctrl_dep() so the write side looks ok after all sorry.

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-24 Thread Andrea Arcangeli
On Wed, Dec 23, 2020 at 09:18:09PM -0800, Nadav Amit wrote: > I am not trying to be argumentative, and I did not think through about an > alternative solution. It sounds to me that your proposed solution is correct > and would probably be eventually (slightly) more efficient than anything > that I

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-23 Thread Andrea Arcangeli
> On Wed, Dec 23, 2020 at 07:09:10PM -0800, Nadav Amit wrote: > > I think there are other cases in which Andy’s concern is relevant > > (MADV_PAGEOUT). I didn't try to figure how it would help MADV_PAGEOUT. MADV_PAGEOUT sounds cool feature, but maybe it'd need a way to flush the invalidates out an

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-23 Thread Andrea Arcangeli
On Wed, Dec 23, 2020 at 09:00:26PM -0500, Andrea Arcangeli wrote: > One other near zero cost improvement easy to add if this would be "if > (vma->vm_flags & (VM_SOFTDIRTY|VM_UFFD_WP))" and it could be made The next worry then is if UFFDIO_WRITEPROTECT is very large then th

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-23 Thread Andrea Arcangeli
On Wed, Dec 23, 2020 at 05:21:43PM -0800, Andy Lutomirski wrote: > I don’t love this as a long term fix. AFAICT we can have mm_tlb_flush_pending > set for quite a while — mprotect seems like it can wait in IO while splitting > a huge page, for example. That gives us a window in which every write

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-23 Thread Andrea Arcangeli
Hello Linus, On Wed, Dec 23, 2020 at 03:39:53PM -0800, Linus Torvalds wrote: > On Wed, Dec 23, 2020 at 1:39 PM Andrea Arcangeli wrote: > > > > On Tue, Dec 22, 2020 at 08:36:04PM -0700, Yu Zhao wrote: > > > Thanks for the details. > > > > I hope we can find a

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-23 Thread Andrea Arcangeli
On Wed, Dec 23, 2020 at 02:45:59PM -0800, Nadav Amit wrote: > I think it may be reasonable. Whatever solution used, there will be 2 users of it: uffd-wp will use whatever technique used by clear_refs_write to avoid the mmap_write_lock. My favorite is Yu's patch and not the group lock anymore. The

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-23 Thread Andrea Arcangeli
On Wed, Dec 23, 2020 at 03:29:51PM -0700, Yu Zhao wrote: > I was hesitant to suggest the following because it isn't that straight > forward. But since you seem to be less concerned with the complexity, > I'll just bring it on the table -- it would take care of both ufd and > clear_refs_write, would

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-23 Thread Andrea Arcangeli
On Tue, Dec 22, 2020 at 04:40:32AM -0800, Nadav Amit wrote: > > On Dec 21, 2020, at 1:24 PM, Yu Zhao wrote: > > > > On Mon, Dec 21, 2020 at 12:26:22PM -0800, Linus Torvalds wrote: > >> On Mon, Dec 21, 2020 at 12:23 PM Nadav Amit wrote: > >>> Using mmap_write_lock() was my initial fix and there w

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-23 Thread Andrea Arcangeli
On Tue, Dec 22, 2020 at 08:36:04PM -0700, Yu Zhao wrote: > Thanks for the details. I hope we can find a way put the page_mapcount back where there's a page_count right now. If you're so worried about having to maintain a all defined well documented (or to be documented even better if you ACK it)

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-23 Thread Andrea Arcangeli
On Wed, Dec 23, 2020 at 10:52:35AM -0500, Peter Xu wrote: > On Tue, Dec 22, 2020 at 08:36:04PM -0700, Yu Zhao wrote: > > In your patch, do we need to take wrprotect_rwsem in > > handle_userfault() as well? Otherwise, it seems userspace would have > > to synchronize between its wrprotect ioctl and f

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-23 Thread Andrea Arcangeli
On Wed, Dec 23, 2020 at 01:51:59PM -0500, Andrea Arcangeli wrote: > NOTE: about the above comment, that mprotect takes > mmap_read_lock. Your above code change in the commit above, still has write Correction to avoid any confusion.

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-23 Thread Andrea Arcangeli
On Wed, Dec 23, 2020 at 11:24:16AM -0500, Peter Xu wrote: > I think this is not against Linus's example - where cpu2 does not have tlb > cached so it sees RO while cpu3 does have tlb cached so cpu3 can still modify > it. So IMHO there's no problem here. > > But I do think in step 2 here we overlo

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-22 Thread Andrea Arcangeli
On Tue, Dec 22, 2020 at 05:23:39PM -0700, Yu Zhao wrote: > and 2) people are spearheading multiple efforts to reduce the mmap_lock > contention, which hopefully would make ufd users suffer less soon. In my view UFFD is an already deployed working solution that eliminates the mmap_lock_write conten

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-22 Thread Andrea Arcangeli
On Tue, Dec 22, 2020 at 04:39:46PM -0700, Yu Zhao wrote: > We are talking about non-COW anon pages here -- they can't be mapped > more than once. So why not just identify them by checking > page_mapcount == 1 and then unconditionally reuse them? (This is > probably where I've missed things.) The p

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-22 Thread Andrea Arcangeli
On Tue, Dec 22, 2020 at 02:14:41PM -0700, Yu Zhao wrote: > This works but I don't prefer this option because 1) this is new > way of making pte_wrprotect safe and 2) it's specific to ufd and > can't be applied to clear_soft_dirty() which has no "catcher". No I didn't look into clear_soft_dirty iss

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-22 Thread Andrea Arcangeli
On Tue, Dec 22, 2020 at 12:58:18PM -0800, Nadav Amit wrote: > I had somewhat similar ideas - saving in each page-struct the generation, > which would allow to: (1) extend pte_same() to detect interim changes > that were reverted (RO->RW->RO) and (2) per-PTE pending flushes. What don't you feel saf

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-22 Thread Andrea Arcangeli
On Tue, Dec 22, 2020 at 12:19:49PM -0800, Nadav Amit wrote: > Perhaps any change to PTE in a page-table should increase a page-table > generation that we would save in the page-table page-struct (next to the The current rule is that by the time in the page fault we find a pte/hugepmd in certain st

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-22 Thread Andrea Arcangeli
On Tue, Dec 22, 2020 at 08:15:53PM +, Matthew Wilcox wrote: > On Tue, Dec 22, 2020 at 02:31:52PM -0500, Andrea Arcangeli wrote: > > My previous suggestion to use a mutex to serialize > > userfaultfd_writeprotect with a mutex will still work, but we can run > > as

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-22 Thread Andrea Arcangeli
On Mon, Dec 21, 2020 at 02:55:12PM -0800, Nadav Amit wrote: > wouldn’t mmap_write_downgrade() be executed before mprotect_fixup() (so I assume you mean "in" mprotect_fixup, after change_protection. If you would downgrade the mmap_lock to read there, then it'd severely slowdown the non contention

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-22 Thread Andrea Arcangeli
be more intrusive in the VM and it's overall unnecessary. The below is mostly untested... but it'd be good to hear some feedback before doing more work in this direction. >From 4ace4d1b53f5cb3b22a5c2dc33facc4150b112d6 Mon Sep 17 00:00:00 2001 From: Andrea Arcangeli Date: Tue, 22 Dec

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-21 Thread Andrea Arcangeli
Hello, On Sat, Dec 19, 2020 at 09:08:55PM -0800, Andy Lutomirski wrote: > On Sat, Dec 19, 2020 at 6:49 PM Andrea Arcangeli wrote: > > The ptes are changed always with the PT lock, in fact there's no > > problem with the PTE updates. The only difference with mprotect >

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-19 Thread Andrea Arcangeli
On Sat, Dec 19, 2020 at 06:01:39PM -0800, Andy Lutomirski wrote: > I missed the beginning of this thread, but it looks to me like > userfaultfd changes PTEs with not locking except mmap_read_lock(). It There's no mmap_read_lock, I assume you mean mmap_lock for reading. The ptes are changed alway

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-19 Thread Andrea Arcangeli
On Sat, Dec 19, 2020 at 02:06:02PM -0800, Nadav Amit wrote: > > On Dec 19, 2020, at 1:34 PM, Nadav Amit wrote: > > > > [ cc’ing some more people who have experience with similar problems ] > > > >> On Dec 19, 2020, at 11:15 AM, Andrea Arcangeli wrote: > &g

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-19 Thread Andrea Arcangeli
Hello, On Fri, Dec 18, 2020 at 08:30:06PM -0800, Nadav Amit wrote: > Analyzing this problem indicates that there is a real bug since > mmap_lock is only taken for read in mwriteprotect_range(). This might Never having to take the mmap_sem for writing, and in turn never blocking, in order to modif

Re: [PATCH v2 1/2] mm: memblock: enforce overlap of memory.memblock and memory.reserved

2020-12-14 Thread Andrea Arcangeli
; >>> memblock.memory > >>> before calculating node and zone boundaries. > >>> > >>> Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions > >>> rather that check each PFN") > >>> Reported-by: Andrea Arcangel

Re: [PATCH v2 2/2] mm: fix initialization of struct page for holes in memory layout

2020-12-09 Thread Andrea Arcangeli
Hello, On Wed, Dec 09, 2020 at 11:43:04PM +0200, Mike Rapoport wrote: > +void __init __weak memmap_init(unsigned long size, int nid, > +unsigned long zone, > +unsigned long range_start_pfn) > +{ > + unsigned long start_pfn, end_pfn, hole_

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-12-05 Thread Andrea Arcangeli
Hi Mel, On Thu, Nov 26, 2020 at 10:47:20AM +, Mel Gorman wrote: > Agreed. This thread has a lot of different directions in it at this > point so what I'd hope for is first, a patch that initialises holes with > zone/node linkages within a 1<<(MAX_ORDER-1) alignment. If there is a > hole, it wo

[PATCH 0/1] mm: initialize struct pages in reserved regions outside of the zone ranges

2020-12-04 Thread Andrea Arcangeli
at is not part of the zone in order to boot with the very fix that was meant to prevent such invariant to be broken in the first place. I don't think pfn 0 deserves a magic exception and a pass to break the invariant (even ignoring it may happen that the first pfn in nid > 0 might then also g

[PATCH 1/1] mm: initialize struct pages in reserved regions outside of the zone ranges

2020-12-04 Thread Andrea Arcangeli
ed to be extended to include all memblock.reserved ranges with struct pages too or they'll be left uninitialized with PagePoison as it happened to pfn 0. Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN") Signed-off-by: Andrea Arcangeli

Re: [PATCH v2] mm: Don't fault around userfaultfd-registered regions on reads

2020-12-04 Thread Andrea Arcangeli
On Fri, Dec 04, 2020 at 02:23:29PM -0500, Peter Xu wrote: > If we see [1]: > > if (!pte_present && !pte_none && pte_swp_uffd_wp && not_anonymous_vma && > !is_migration_entry) > > Then it's fundamentally the same as: > > swp_entry(0, _UFFD_SWP_UFFD_WP) && !vma_is_anonymous(vma) Yes conceptu

Re: [PATCH v2] mm: Don't fault around userfaultfd-registered regions on reads

2020-12-04 Thread Andrea Arcangeli
On Thu, Dec 03, 2020 at 11:10:18PM -0500, Andrea Arcangeli wrote: > from the pte, one that cannot ever be set in any swp entry today. I > assume it can't be _PAGE_SWP_UFFD_WP since that already can be set but > you may want to verify it... I thought more about the above, and I thi

Re: [PATCH v2] mm: Don't fault around userfaultfd-registered regions on reads

2020-12-03 Thread Andrea Arcangeli
Hi Peter, On Thu, Dec 03, 2020 at 09:30:51PM -0500, Peter Xu wrote: > I'm just afraid there's no space left for a migration entry, because migration > entries fills in the pfn information into swp offset field rather than a real > offset (please refer to make_migration_entry())? I assume PFN can

Re: [PATCH v2] mm: Don't fault around userfaultfd-registered regions on reads

2020-12-03 Thread Andrea Arcangeli
On Thu, Dec 03, 2020 at 01:02:34PM -0500, Peter Xu wrote: > On Wed, Dec 02, 2020 at 09:36:45PM -0800, Hugh Dickins wrote: > > On Wed, 2 Dec 2020, Peter Xu wrote: > > > On Wed, Dec 02, 2020 at 02:37:33PM -0800, Hugh Dickins wrote: > > > > On Tue, 1 De

Re: [PATCH] mm: refactor initialization of stuct page for holes in memory layout

2020-12-03 Thread Andrea Arcangeli
Hello, On Thu, Dec 03, 2020 at 08:25:49AM +0200, Mike Rapoport wrote: > On Wed, Dec 02, 2020 at 03:47:36PM -0800, Andrew Morton wrote: > > On Tue, 1 Dec 2020 20:15:02 +0200 Mike Rapoport wrote: > > > > > From: Mike Rapoport > > > > > > There could be struct pages that are not backed by actual

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-12-03 Thread Andrea Arcangeli
On Thu, Dec 03, 2020 at 12:51:07PM +0200, Mike Rapoport wrote: > On Thu, Dec 03, 2020 at 01:23:02AM -0500, Andrea Arcangeli wrote: > > 5) pfn 0 is the classical case where pfn 0 is in a reserved zone in > >memblock.reserve that doesn't overlap any memblock.memory zone. >

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-12-02 Thread Andrea Arcangeli
zone_end_pfn() 0contiguous 0 Movable zone_start_pfn 0zone_end_pfn() 0contiguous 0 500246 0x7a216000 0xfff1000 reserved True 500247 0x7a217000 0x1fff reserved False 500248 0x7a218000 0x1fff00010200 reserved False ] quote from previou

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-12-01 Thread Andrea Arcangeli
Hello Mike, On Sun, Nov 29, 2020 at 02:32:57PM +0200, Mike Rapoport wrote: > Hello Andrea, > > On Thu, Nov 26, 2020 at 07:46:05PM +0200, Mike Rapoport wrote: > > On Thu, Nov 26, 2020 at 11:05:14AM +0100, David Hildenbrand wrote: > > > > Let's try to merge init_unavailable_memory() into memmap_in

Re: [PATCH v2] mm: Don't fault around userfaultfd-registered regions on reads

2020-12-01 Thread Andrea Arcangeli
On Tue, Dec 01, 2020 at 05:30:33PM -0500, Peter Xu wrote: > On Tue, Dec 01, 2020 at 12:59:27PM +, Matthew Wilcox wrote: > > On Mon, Nov 30, 2020 at 06:06:03PM -0500, Peter Xu wrote: > > > Faulting around for reads are in most cases helpful for the performance > > > so that > > > continuous mem

Re: [PATCH] mm: Don't fault around userfaultfd-registered regions on reads

2020-12-01 Thread Andrea Arcangeli
Hi Hugh, On Tue, Dec 01, 2020 at 01:31:21PM -0800, Hugh Dickins wrote: > Please don't ever rely on that i_private business for correctness: as The out of order and lockless "if (inode->i_private)" branch didn't inspire much confidence in terms of being able to rely on it for locking correctness i

Re: [PATCH] mm: Don't fault around userfaultfd-registered regions on reads

2020-12-01 Thread Andrea Arcangeli
Hi Peter, On Sat, Nov 28, 2020 at 10:29:03AM -0500, Peter Xu wrote: > Yes. A trivial supplementary detail is that filemap_map_pages() will only set > it read-only since alloc_set_pte() will only set write bit if it's a write. > In > our case it's read fault-around so without it. However it'll

Re: [PATCH] mm: Don't fault around userfaultfd-registered regions on reads

2020-11-27 Thread Andrea Arcangeli
stop faulting in pages without checking inode->i_private Per shmem_fault comment shmem need to "refrain from faulting pages into the hole while it's being punched" and to do so it must check inode->i_private, which filemap_map_pages won't so it's unsafe to use in shmem becau

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-11-26 Thread Andrea Arcangeli
On Thu, Nov 26, 2020 at 09:44:26PM +0200, Mike Rapoport wrote: > TBH, the whole interaction between e820 and memblock keeps me puzzled > and I can only make educated guesses why some ranges here are > memblock_reserve()'d and some memblock_add()ed. The mixed usage in that interaction between membl

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-11-26 Thread Andrea Arcangeli
On Thu, Nov 26, 2020 at 11:36:02AM +0200, Mike Rapoport wrote: > I think it's inveneted by your BIOS vendor :) BTW, all systems I use on a daily basis have that type 20... Only two of them are reproducing the VM_BUG_ON on a weekly basis on v5.9. If you search 'E820 "type 20"' you'll get plenty of

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-11-26 Thread Andrea Arcangeli
On Thu, Nov 26, 2020 at 11:36:02AM +0200, Mike Rapoport wrote: > memory.reserved cannot be calculated automatically. It represents all > the memory allocations made before page allocator is up. And as > memblock_reserve() is the most basic to allocate memory early at boot we > cannot really delete

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-11-26 Thread Andrea Arcangeli
On Thu, Nov 26, 2020 at 11:05:14AM +0100, David Hildenbrand wrote: > I agree that this is sub-optimal, as such pages are impossible to detect > (PageReserved is just not clear as discussed with Andrea). The basic > question is how we want to proceed: > > a) Make sure any online struct page has a v

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-11-25 Thread Andrea Arcangeli
On Wed, Nov 25, 2020 at 12:34:41AM -0500, Andrea Arcangeli wrote: > pfnphysaddr page->flags > 500224 0x7a20 0x1fff1000 reserved True > 500225 0x7a201000 0x1fff1000 reserved True > *snip* > 500245 0x7a215000 0x1fff1000 reserved True &

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-11-25 Thread Andrea Arcangeli
On Wed, Nov 25, 2020 at 11:04:14PM +0200, Mike Rapoport wrote: > I think the very root cause is how e820__memblock_setup() registers > memory with memblock: > > if (entry->type == E820_TYPE_SOFT_RESERVED) > memblock_reserve(entry->addr, entry->size); > >

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-11-25 Thread Andrea Arcangeli
On Wed, Nov 25, 2020 at 08:27:21PM +0100, David Hildenbrand wrote: > On 25.11.20 19:28, Andrea Arcangeli wrote: > > On Wed, Nov 25, 2020 at 07:45:30AM +0100, David Hildenbrand wrote: > >> Before that change, the memmap of memory holes were only zeroed > >> out. So t

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-11-25 Thread Andrea Arcangeli
On Wed, Nov 25, 2020 at 04:13:25PM +0200, Mike Rapoport wrote: > I suspect that memmap for the reserved pages is not properly initialized > after recent changes in free_area_init(). They are cleared at > init_unavailable_mem() to have zone=0 and node=0, but they seem to be I'd really like if we wo

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-11-25 Thread Andrea Arcangeli
On Wed, Nov 25, 2020 at 01:08:54PM +0100, Vlastimil Babka wrote: > Yeah I guess it would be simpler if zoneid/nid was correct for > pfn_valid() pfns within a zone's range, even if they are reserved due > not not being really usable memory. > > I don't think we want to introduce CONFIG_HOLES_IN_Z

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-11-25 Thread Andrea Arcangeli
On Wed, Nov 25, 2020 at 12:41:55PM +0100, David Hildenbrand wrote: > On 25.11.20 12:04, David Hildenbrand wrote: > > On 25.11.20 11:39, Mel Gorman wrote: > >> On Wed, Nov 25, 2020 at 07:45:30AM +0100, David Hildenbrand wrote: > Something must have changed more recently than v5.1 that caused th

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-11-25 Thread Andrea Arcangeli
On Wed, Nov 25, 2020 at 07:45:30AM +0100, David Hildenbrand wrote: > Before that change, the memmap of memory holes were only zeroed > out. So the zones/nid was 0, however, pages were not reserved and > had a refcount of zero - resulting in other issues. So maybe that "0,0" zoneid/nid was not actu

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-11-25 Thread Andrea Arcangeli
On Wed, Nov 25, 2020 at 10:30:53AM +, Mel Gorman wrote: > On Tue, Nov 24, 2020 at 03:56:22PM -0500, Andrea Arcangeli wrote: > > Hello, > > > > On Tue, Nov 24, 2020 at 01:32:05PM +, Mel Gorman wrote: > > > I would hope that is not the case because

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-11-24 Thread Andrea Arcangeli
Hello, On Mon, Nov 23, 2020 at 02:01:16PM +0100, Vlastimil Babka wrote: > On 11/21/20 8:45 PM, Andrea Arcangeli wrote: > > A corollary issue was fixed in > > 39639000-39814fff : Unknown E820 type > > > > pfn 0x7a200 -> 0x7a20 min_pfn hit non-RAM: > >

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-11-24 Thread Andrea Arcangeli
262671e88723b3074251189004ceae39dcd1689d Mon Sep 17 00:00:00 2001 From: Andrea Arcangeli Date: Sat, 21 Nov 2020 12:55:58 -0500 Subject: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages A corollary issue was fixed in e577c8b64d58fe307ea4d

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-11-21 Thread Andrea Arcangeli
On Sat, Nov 21, 2020 at 02:45:06PM -0500, Andrea Arcangeli wrote: > + if (likely(!PageReserved(page))) NOTE: this line will have to become "likely(page && !PageReserved(page))" to handle the case of non contiguous zones, since pageblock_pfn

[PATCH 0/1] VM_BUG_ON_PAGE(!zone_spans_pfn) in set_pfnblock_flags_mask

2020-11-21 Thread Andrea Arcangeli
gt; PAGE_SHIFT; I didn't try to inject the bug to validate the fix and it'd be great if someone can try that to validate this or any other fix. Andrea Arcangeli (1): mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages mm/compaction.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-)

[PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-11-21 Thread Andrea Arcangeli
e PageBuddy check, except in the new fast_isolate_around() path). Fixes: 5a811889de10 ("mm, compaction: use free lists to quickly locate a migration target") Signed-off-by: Andrea Arcangeli --- mm/compaction.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/mm/com

  1   2   3   4   5   6   7   8   9   10   >