Re: kvm-85 sometimes not starting on 2.6.30-rc5

2009-05-23 Thread Andrea Arcangeli
On Sun, May 17, 2009 at 11:27:42PM +0300, Avi Kivity wrote: > Andrea, looks like the mother of all locks below. eheh yes that really is the mother of all locks ;). So the thing is, like BUG says MAX_LOCK_DEPTH is too low, to fix you should rebuild after increasing it in include/linux/sched.h to s
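The failure mode above can be shown with a minimal user-space sketch. It assumes that lockdep records each held lock in a fixed-size per-task array bounded by MAX_LOCK_DEPTH, and it uses 48, the value believed to be current at the time. The `held_locks` struct and both helpers are hypothetical. They only show why taking one lock per anon_vma and per file mapping, as mm_take_all_locks() does, overflows that array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 48 matches MAX_LOCK_DEPTH in include/linux/sched.h of that era. */
#define MAX_LOCK_DEPTH 48

/* Hypothetical per-task record of held locks, modelled on lockdep's
 * fixed-size held_locks[] array. */
struct held_locks {
    const void *locks[MAX_LOCK_DEPTH];
    size_t depth;
};

/* Record an acquired lock; returns false once the fixed-size array is
 * full, which is the condition lockdep reports as a BUG. */
static bool lock_acquire_track(struct held_locks *h, const void *lock)
{
    if (h->depth >= MAX_LOCK_DEPTH)
        return false;
    h->locks[h->depth++] = lock;
    return true;
}

/* Take one lock per object without releasing any, the way
 * mm_take_all_locks() nests them; returns how many could be tracked. */
static size_t take_all(struct held_locks *h, const int *objs, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (!lock_acquire_track(h, &objs[i]))
            break;
    return i;
}
```

With more than 48 mappings in the mm, tracking stops at the limit, which is why the reply suggests rebuilding with a larger MAX_LOCK_DEPTH.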

Re: KVM: protect concurrent make_all_cpus_request

2009-06-17 Thread Andrea Arcangeli
On Wed, Jun 17, 2009 at 10:53:47AM -0300, Marcelo Tosatti wrote: > > > make_all_cpus_request contains a race condition which can > trigger false request completed status, as follows: > > CPU0 CPU1 > > if (test_and_set_bit(req,&vcpu->requests)) >

enable sysenter on 32bit guests

2009-06-18 Thread Andrea Arcangeli
From: Andrea Arcangeli model=2 is not existent when vendor is intel and an errata of P6 says that any model <= 2 when family is 6 lack sep feature, so windows and linux 32bit guests disable sep in software and slow down for no good reason when running inside kvm on intel CPU. Fix is to set mo
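The P6 erratum check can be sketched as follows. This assumes the check has the form used in the 32-bit Linux Intel CPU setup code, which packs family, model and stepping into one signature and clears SEP when it is below 0x633 (family 6, model 3, stepping 3). `sep_trusted()` is a hypothetical name. The sketch shows why a guest that sees family 6, model 2 stops using SYSENTER even when the host CPU supports it.

```c
#include <stdbool.h>

/* Sketch of the Pentium Pro SEP erratum test: the CPUID SEP bit is
 * only trusted at or above family 6, model 3, stepping 3.
 * Assumes model and stepping each fit in 4 bits. */
static bool sep_trusted(unsigned family, unsigned model, unsigned stepping)
{
    unsigned sig = family << 8 | model << 4 | stepping;

    return sig >= 0x633;
}
```

For example, `sep_trusted(6, 2, 3)` is false while `sep_trusted(6, 3, 3)` is true, so exposing a higher model number to the guest is enough to keep SYSENTER enabled.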

Re: [Qemu-devel] KVMs default CPU type (was: allow sysenter on 32bit guests running on vmx host)

2009-06-25 Thread Andrea Arcangeli
Hi everyone, On Thu, Jun 25, 2009 at 10:11:58AM +0200, Andre Przywara wrote: > common denominator. This should be a family 15 CPU (AMD K8 or Intel P4) > with a constant vendor ID (in my experiments Intel showed less problems > with guests). Since 64bit Windows has a whitelist of known vendor IDs

Re: [Qemu-devel] KVMs default CPU type (was: allow sysenter on 32bit guests running on vmx host)

2009-06-25 Thread Andrea Arcangeli
On Fri, Jun 26, 2009 at 02:42:17AM +0200, Andrea Arcangeli wrote: > that purely asks for troubles I think. At the same time I doubt we > want to deviate much from qemu code here, this seems messy enough > already without big changes to maintain in this area, and I guess this > explain

Re: mmu_notifiers: turn off lockdep around mm_take_all_locks

2009-07-07 Thread Andrea Arcangeli
On Tue, Jul 07, 2009 at 10:00:25PM +0200, Peter Zijlstra wrote: > It does feel slightly weird to explicitly overflow that preempt count > though. That is actually fixable there without adding more bits to preempt count, just call a global preempt_disable after lockdep_off and call a spinlock versi
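Here is a simplified sketch of the preempt count overflow being discussed. It assumes the classic preempt_count layout, where the low 8 bits count preempt_disable() nesting and the next 8 bits count softirq nesting. The helpers are hypothetical. On a preemptible kernel each spin_lock() adds one to the low field, so the 256th nested lock carries into the softirq bits.

```c
#include <stdint.h>

/* Simplified layout (assumption): 8 bits of preempt depth, then
 * 8 bits of softirq depth. */
#define PREEMPT_BITS  8
#define PREEMPT_MASK  ((1u << PREEMPT_BITS) - 1)
#define SOFTIRQ_SHIFT PREEMPT_BITS
#define SOFTIRQ_MASK  (0xffu << SOFTIRQ_SHIFT)

/* Model nr nested spin_lock() calls, each adding one to the count. */
static uint32_t take_spinlocks(uint32_t count, unsigned nr)
{
    return count + nr;
}

static unsigned preempt_depth(uint32_t c)
{
    return c & PREEMPT_MASK;
}

static unsigned softirq_depth(uint32_t c)
{
    return (c & SOFTIRQ_MASK) >> SOFTIRQ_SHIFT;
}
```

After 256 nested locks, the count reads as preempt depth 0 and softirq depth 1. That is the "explicit overflow" Peter finds weird.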

Re: [PATCH] kvm: Drop obsolete cpu_get/put in make_all_cpus_request

2009-07-21 Thread Andrea Arcangeli
Hi, I suggested this too first time around when I've seen the patch but they reminded it's needed to make life easier to preempt-rt... On Mon, Jul 20, 2009 at 11:30:12AM +0200, Jan Kiszka wrote: > spin_lock disables preemption, so we can simply read the current cpu. > > Signed-off-by: Jan Kiszka

Re: [PATCH] mmu_notifier, kvm: Introduce dirty bit tracking in spte and mmu notifier to help KSM dirty bit tracking

2011-06-22 Thread Andrea Arcangeli
On Tue, Jun 21, 2011 at 09:32:39PM +0800, Nai Xia wrote: > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index d48ec60..b407a69 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -4674,6 +4674,7 @@ static int __init vmx_init(void) > kvm_mmu_set_mask_ptes(0ull,

Re: [PATCH] mmu_notifier, kvm: Introduce dirty bit tracking in spte and mmu notifier to help KSM dirty bit tracking

2011-06-22 Thread Andrea Arcangeli
On Wed, Jun 22, 2011 at 11:39:40AM -0400, Rik van Riel wrote: > On 06/22/2011 07:19 AM, Izik Eidus wrote: > > > So what we say here is: it is better to have little junk in the unstable > > tree that get flushed eventualy anyway, instead of make the guest > > slower > > this race is something t

Re: [PATCH] mmu_notifier, kvm: Introduce dirty bit tracking in spte and mmu notifier to help KSM dirty bit tracking

2011-06-22 Thread Andrea Arcangeli
On Thu, Jun 23, 2011 at 07:13:54AM +0800, Nai Xia wrote: > I agree on this point. Dirty bit , young bit, is by no means accurate. Even > on 4kB pages, there is always a chance that the pte are dirty but the contents > are actually the same. Yeah, the whole optimization contains trade-offs and Just

Re: [PATCH] mmu_notifier, kvm: Introduce dirty bit tracking in spte and mmu notifier to help KSM dirty bit tracking

2011-06-22 Thread Andrea Arcangeli
On Thu, Jun 23, 2011 at 07:19:06AM +0800, Nai Xia wrote: > OK, I'll have a try over other workarounds. > I am not feeling good about need_pte_unmap myself. :-) The usual way is to check VM_HUGETLB in the caller and to call another function that doesn't kmap. Casting pmd_t to pte_t isn't really nic

Re: [PATCH] mmu_notifier, kvm: Introduce dirty bit tracking in spte and mmu notifier to help KSM dirty bit tracking

2011-06-22 Thread Andrea Arcangeli
On Thu, Jun 23, 2011 at 07:37:47AM +0800, Nai Xia wrote: > On 2MB pages, I'd like to remind you and Rik that ksmd currently splits > huge pages before their sub pages gets really merged to stable tree. > So when there are many 2MB pages each having a 4kB subpage > changed for all time, this is alre

Re: [PATCH] mmu_notifier, kvm: Introduce dirty bit tracking in spte and mmu notifier to help KSM dirty bit tracking

2011-06-22 Thread Andrea Arcangeli
On Thu, Jun 23, 2011 at 08:31:56AM +0800, Nai Xia wrote: > On Thu, Jun 23, 2011 at 7:59 AM, Andrea Arcangeli wrote: > > On Thu, Jun 23, 2011 at 07:37:47AM +0800, Nai Xia wrote: > >> On 2MB pages, I'd like to remind you and Rik that ksmd currently splits > >> huge p

Re: [RFC] postcopy livemigration proposal

2011-08-11 Thread Andrea Arcangeli
Hello everyone, so basically this is a tradeoff between not having a long latency for the migration to succeed and reducing the total network traffic (and CPU load) in the migration source and destination and reducing the memory footprint a bit, by adding an initial latency to the memory accesses

Re: [PATCH] mm: mmu_notifier: fix inconsistent memory between secondary MMU and host

2012-08-21 Thread Andrea Arcangeli
pte/spte established by set_pte_at_notify/change_pte is readonly we don't need to do the ptep_clear_flush_notify instead because when the host will write to the page that will fault and serialize against the PT lock (set_pte_at_notify must always run under the PT lock of course).

Re: [PATCH] mm: mmu_notifier: fix inconsistent memory between secondary MMU and host

2012-08-22 Thread Andrea Arcangeli
On Wed, Aug 22, 2012 at 02:03:41PM +0800, Xiao Guangrong wrote: > On 08/21/2012 11:06 PM, Andrea Arcangeli wrote: > > CPU0 CPU1 > > oldpage[1] == 0 (both guest & host) > > oldpage[0] = 1 > > trigg

Re: [PATCH] mm: mmu_notifier: fix inconsistent memory between secondary MMU and host

2012-08-22 Thread Andrea Arcangeli
On Wed, Aug 22, 2012 at 11:51:17AM +0800, Xiao Guangrong wrote: > Hmm, in KSM code, i found this code in replace_page: > > set_pte_at_notify(mm, addr, ptep, mk_pte(kpage, vma->vm_page_prot)); > > It is possible to establish a writable pte, no? Hugh already answered this thanks. Further details o

Re: [PATCH] mm: mmu_notifier: fix inconsistent memory between secondary MMU and host

2012-08-22 Thread Andrea Arcangeli
Hi Andrew, On Wed, Aug 22, 2012 at 12:15:35PM -0700, Andrew Morton wrote: > On Wed, 22 Aug 2012 18:29:55 +0200 > Andrea Arcangeli wrote: > > > On Wed, Aug 22, 2012 at 02:03:41PM +0800, Xiao Guangrong wrote: > > > On 08/21/2012 11:06 PM, Andrea Arcan

Re: [PATCH] mm: mmu_notifier: fix inconsistent memory between secondary MMU and host

2012-08-22 Thread Andrea Arcangeli
On Wed, Aug 22, 2012 at 12:58:05PM -0700, Andrew Morton wrote: > If you can suggest some text I'll type it in right now. Ok ;), I tried below: This is safe to start by updating the secondary MMUs, because the relevant primary MMU pte invalidate must have already happened with a ptep_clear_flush b

Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding

2011-11-23 Thread Andrea Arcangeli
Hi! On Mon, Nov 21, 2011 at 07:51:21PM -0600, Anthony Liguori wrote: > Fundamentally, the entity that should be deciding what memory should be > present > and where it should located is the kernel. I'm fundamentally opposed to > trying > to make QEMU override the scheduler/mm by using cpu or

Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding

2011-11-23 Thread Andrea Arcangeli
On Wed, Nov 23, 2011 at 07:34:37PM +0100, Alexander Graf wrote: > So if you define "-numa node,mem=1G,cpus=0" then QEMU should be able to > tell the kernel that this GB of RAM actually is close to that vCPU thread. > Of course the admin still needs to decide how to split up memory. That's > the d

Re: 2.6.38.1 general protection fault

2011-03-28 Thread Andrea Arcangeli
Hello everyone, On Mon, Mar 28, 2011 at 11:19:51AM +0200, Avi Kivity wrote: > On 03/28/2011 08:24 AM, Tomasz Chmielewski wrote: > > On 27.03.2011 11:42, Avi Kivity wrote: > > > > (...) > > > >> Okay, the fork came from the ,script=. > >> > >> The issue with %rsi looks like a use-after-free, howeve

Re: 2.6.38.1 general protection fault

2011-03-28 Thread Andrea Arcangeli
On Mon, Mar 28, 2011 at 08:02:47PM +0200, Avi Kivity wrote: > On 03/28/2011 07:54 PM, Andrea Arcangeli wrote: > > BTW, is it genuine that a protection fault is generated instead of a page > > fault while dereferencing address 0x8805d6b087f8? I would normally > > excep

Re: [ANNOUNCE] Native Linux KVM tool

2011-04-08 Thread Andrea Arcangeli
Hi Anthony, On Fri, Apr 08, 2011 at 09:00:43AM -0500, Anthony Liguori wrote: > An example is ioport_ops. This maps directly to > ioport_{read,write}_table in QEMU. Then you use ioport__register() to > register entries in this table similar register_ioport_{read,write}() in > QEMU. > > The us

Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding

2011-11-30 Thread Andrea Arcangeli
On Wed, Nov 30, 2011 at 09:52:37PM +0530, Dipankar Sarma wrote: > create the guest topology correctly and optimize for NUMA. This > would work for us. Even on the case of 1 guest that fits in one node, you're not going to max out the full bandwidth of all memory channels with this. qemu all can d

Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding

2011-12-01 Thread Andrea Arcangeli
On Thu, Dec 01, 2011 at 10:55:20PM +0530, Dipankar Sarma wrote: > On Wed, Nov 30, 2011 at 06:41:13PM +0100, Andrea Arcangeli wrote: > > On Wed, Nov 30, 2011 at 09:52:37PM +0530, Dipankar Sarma wrote: > > > create the guest topology correctly and optimize for NUMA. This >

Re: [PATCH 0/2][RFC] postcopy migration: Linux char device for postcopy

2012-01-02 Thread Andrea Arcangeli
On Thu, Dec 29, 2011 at 06:01:45PM +0200, Avi Kivity wrote: > On 12/29/2011 06:00 PM, Avi Kivity wrote: > > The NFS client has exactly the same issue, if you mount it with the intr > > option. In fact you could use the NFS client as a trivial umem/cuse > > prototype. > > Actually, NFS can return

Re: [PATCH 0/2][RFC] postcopy migration: Linux char device for postcopy

2012-01-03 Thread Andrea Arcangeli
On Mon, Jan 02, 2012 at 06:55:18PM +0100, Paolo Bonzini wrote: > On 01/02/2012 06:05 PM, Andrea Arcangeli wrote: > > On Thu, Dec 29, 2011 at 06:01:45PM +0200, Avi Kivity wrote: > >> On 12/29/2011 06:00 PM, Avi Kivity wrote: > >>> The NFS client has exactly the same i

Re: [PATCH 0/2][RFC] postcopy migration: Linux char device for postcopy

2012-01-12 Thread Andrea Arcangeli
On Thu, Jan 12, 2012 at 03:57:47PM +0200, Avi Kivity wrote: > On 01/03/2012 04:25 PM, Andrea Arcangeli wrote: > > > > > > So the problem is if we do it in > > > > userland with the current functionality you'll run out of VMAs and > > > > slowdown

Re: [PATCH 0/2][RFC] postcopy migration: Linux char device for postcopy

2012-01-12 Thread Andrea Arcangeli
On Thu, Jan 12, 2012 at 03:59:59PM +0200, Avi Kivity wrote: > On 01/04/2012 05:03 AM, Isaku Yamahata wrote: > > Yes, it's quite doable in user space(qemu) with a kernel-enhancement. > > And it would be easy to convert a separated daemon process into a thread > > in qemu. > > > > I think it should b

Re: [PATCH] KVM: MMU: fix huge page adapted on non-PAE host

2012-05-28 Thread Andrea Arcangeli
Hi, On Mon, May 28, 2012 at 04:53:38PM +0300, Avi Kivity wrote: > As far as I can tell __get_user_pages_fast() will take the reference > count in the page head in the first place. mask = KVM_PAGES_PER_HPAGE(level) - 1; The BUG would trigger if the above KVM mask is 2M (that is the NPT/EPT pm
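The mask arithmetic can be sketched as below. It assumes 4 KiB base pages, and the helpers are hypothetical stand-ins for `KVM_PAGES_PER_HPAGE(level) - 1`. The sketch shows how a frame aligned with a 2 MiB (NPT/EPT) mask can differ from the head of a 4 MiB huge page, which is the huge page size on a non-PAE x86 host.

```c
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB base pages */

/* Number of base pages in a huge page of size 1 << shift. */
static uint64_t pages_per_hpage(unsigned shift)
{
    return 1ull << (shift - PAGE_SHIFT);
}

/* Round a frame number down to the first frame of its huge page:
 * frame & ~(pages_per_hpage - 1). */
static uint64_t hpage_head(uint64_t frame, unsigned shift)
{
    uint64_t mask = pages_per_hpage(shift) - 1;

    return frame & ~mask;
}
```

For frame 0x12345, the 2 MiB mask gives 0x12200 but the 4 MiB head is 0x12000. In that case the adjusted pfn points into the middle of the host compound page rather than at its head.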

Re: [PATCH] KVM: MMU: fix huge page adapted on non-PAE host

2012-05-28 Thread Andrea Arcangeli
On Mon, May 28, 2012 at 05:20:02PM +0300, Avi Kivity wrote: > The "right thing" we should be doing is running get_page() on every > small page within the frame (we asked for a small page but are > opportunistrically using the pages around it, without a proper ref). > That's a bit slow though, so we

Re: [PATCH] KVM: MMU: fix huge page adapted on non-PAE host

2012-05-28 Thread Andrea Arcangeli
On Mon, May 28, 2012 at 05:40:08PM +0300, Avi Kivity wrote: > Yes, I see it now. Adjusting mask is incorrect since we won't have the > same adjustment on release. I'll apply the patch for 3.5. Sounds great to me. One thing I'm not sure about is about the real need of the mmio check vs a stright

Re: [PATCH v8 13/15] ARM: KVM: Handle guest faults in KVM

2012-06-19 Thread Andrea Arcangeli
On Tue, Jun 19, 2012 at 12:32:06PM +0300, Avi Kivity wrote: > On 06/19/2012 01:20 AM, Christoffer Dall wrote: > > On Mon, Jun 18, 2012 at 9:45 AM, Avi Kivity wrote: > >> On 06/15/2012 10:09 PM, Christoffer Dall wrote: > >>> From: Christoffer Dall > >>> > >>> Handles the guest faults in KVM by map

Re: [PATCH v8 13/15] ARM: KVM: Handle guest faults in KVM

2012-06-20 Thread Andrea Arcangeli
On Wed, Jun 20, 2012 at 11:13:36AM -0400, Christoffer Dall wrote: > ah, we don't do things right, we use gfn_to_pfn() flat out and will > always break the COW :) > > I guess now, when change_pte is a nop, it's outright incorrect if > anyone runs KSM. > > This has just been added to my todo-list.

Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock

2012-02-14 Thread Andrea Arcangeli
On Fri, Feb 10, 2012 at 03:28:31PM +0900, Takuya Yoshikawa wrote: > Other threads may process the same page in that small window and skip > TLB flush and then return before these functions do flush. It's correct to flush the shadow MMU TLB without the mmu_lock only in the context of mmu notifier m

Re: [PATCH 2/2] KVM: MMU: Flush TLBs only once in invlpg() before releasing mmu_lock

2012-02-14 Thread Andrea Arcangeli
On Tue, Feb 14, 2012 at 01:56:17PM +0900, Takuya Yoshikawa wrote: > (2012/02/14 13:36), Takuya Yoshikawa wrote: > > > BTW, do you think that "kvm_mmu_flush_tlb()" should be moved inside of the > > mmu_lock critical section? > > > > Ah, forget about this. Trivially no. Yes the reason is that it'

Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock

2012-02-14 Thread Andrea Arcangeli
On Fri, Feb 10, 2012 at 03:52:49PM +0800, Xiao Guangrong wrote: > On 02/10/2012 02:28 PM, Takuya Yoshikawa wrote: > > > Other threads may process the same page in that small window and skip > > TLB flush and then return before these functions do flush. > > > > > It is possible that flush tlb in

Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock

2012-02-14 Thread Andrea Arcangeli
On Tue, Feb 14, 2012 at 03:29:47PM -0200, Marcelo Tosatti wrote: > The problem the patch is fixing is not related to page freeing, but > rmap_write_protect. From 8bf3f6f06fcdfd097b6c6ec51531d8292fa0d81d Can't find the commit on kvm.git. > (replace "A (get_dirty_log)" with "mmu_notifier_invalidate

Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock

2012-02-15 Thread Andrea Arcangeli
On Wed, Feb 15, 2012 at 04:07:49PM +0200, Avi Kivity wrote: > Well, it still has flushes inside the lock. And it seems to be more > complicated, but maybe that's because I thought of my idea and didn't > fully grok yours yet. If we go more complicated I prefer Avi's suggestion to move them all ou

Re: qemu-kvm defunct due to THP [was: mmotm 2011-01-06-15-41 uploaded]

2011-01-10 Thread Andrea Arcangeli
ed. Thanks a lot, Andrea Subject: thp: fix for KVM THP support From: Andrea Arcangeli There were several bugs: dirty_bitmap ignored (migration shutoff largepages), has_wrprotect_page(directory_level) ignored, refcount taken on tail page and refcount released on pfn head page post-adjustment (n

Re: qemu-kvm defunct due to THP [was: mmotm 2011-01-06-15-41 uploaded]

2011-01-12 Thread Andrea Arcangeli
On Mon, Jan 10, 2011 at 10:02:50PM +0100, Jiri Slaby wrote: > Yup, this works for me. If you point me to the other 2, I will test them > too... Sure, and they're already included in -mm. http://marc.info/?l=linux-mm&m=129442647907831&q=raw http://marc.info/?l=linux-mm&m=129442718808733&q=raw http

Re: [ANNOUNCE] Native Linux KVM tool

2011-04-11 Thread Andrea Arcangeli
On Sat, Apr 09, 2011 at 09:40:09AM +0200, Ingo Molnar wrote: > > * Andrea Arcangeli wrote: > > > [...] I thought the whole point of a native kvm tool was to go all the > > paravirt way to provide max performance and maybe also depend on vhost as > > much as possibl

Re: KVM induced panic on 2.6.38[2367] & 2.6.39

2011-05-31 Thread Andrea Arcangeli
Hello, On Wed, Jun 01, 2011 at 08:37:25AM +0800, Brad Campbell wrote: > On 01/06/11 06:31, Hugh Dickins wrote: > > Brad, my suspicion is that in each case the top 16 bits of RDX have been > > mysteriously corrupted from to , causing the general protection > > faults. I don't understand w

Re: [PATCH 1/3] virt: Add Transparent Hugepages setup v2

2011-06-16 Thread Andrea Arcangeli
Hi Lucas, On Wed, Jun 15, 2011 at 09:18:34PM -0300, Lucas Meneghel Rodrigues wrote: > +class THPNotSupportedError(THPError): > +""" > +Thrown when host does not support tansparent hugepages. > +""" > +pass s/tansparent/transparent/ > +class THPWriteConfigError(THPError): > +"

Re: [PATCH 1/3] virt: Add Transparent Hugepages setup v2

2011-06-17 Thread Andrea Arcangeli
On Thu, Jun 16, 2011 at 01:34:54PM -0300, Lucas Meneghel Rodrigues wrote: > On Thu, 2011-06-16 at 17:56 +0200, Andrea Arcangeli wrote: > > Hi Lucas, > > Hi Andrea, thanks for the review! Yiqiao is working on the patchset, v3 > or v4 will contain the fixes you have pointed out.

Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Andrea Arcangeli
ifferent mechanism for VM_MIXEDMAP that does the refcounting and doesn't require in turn the driver to mark the page PageReserved). The above explains why KVM needs to skip the refcounting on PageReserved == true && pfn_valid() == true, and it must skip the refcounting for pfn_valid == false
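The refcounting rule stated above reduces to a small predicate. This is a sketch only. The two booleans stand in for pfn_valid() and PageReserved(), and `kvm_should_refcount()` is a hypothetical name, not KVM's helper.

```c
#include <stdbool.h>

/* Take or drop a page reference only for pfns that have a struct page
 * (pfn_valid) which is not marked PageReserved. */
static bool kvm_should_refcount(bool pfn_valid, bool page_reserved)
{
    if (!pfn_valid)
        return false;      /* no struct page to refcount */
    return !page_reserved; /* reserved pages are skipped too */
}
```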

Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with ETP enabled

2013-07-29 Thread Andrea Arcangeli
Hi, On Sat, Jul 27, 2013 at 07:47:49AM +, Zhanghaoyu (A) wrote: > >> hi all, > >> > >> I met similar problem to these, while performing live migration or > >> save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, > >> guest:suse11sp2), running tele-communication software suite

Re: mmapping physical memory

2013-08-26 Thread Andrea Arcangeli
Hi Anatoly, On Mon, Aug 26, 2013 at 12:58:25PM +0100, Anatoly Burakov wrote: > Hi all > > I am using IVSHMEM to mmap /dev/mem into guest. The mmap works fine on > QEMU without KVM support enabled, but with KVM i get kernel errors: > > * (with EPT enabled) > > [ 746.

Re: [PATCH 08/10] userfaultfd: add new syscall to provide memory externalization

2014-07-03 Thread Andrea Arcangeli
Hi Andy, thanks for CC'ing linux-api. On Wed, Jul 02, 2014 at 06:56:03PM -0700, Andy Lutomirski wrote: > On 07/02/2014 09:50 AM, Andrea Arcangeli wrote: > > Once an userfaultfd is created MADV_USERFAULT regions talks through > > the userfaultfd protocol with the thread respo

Re: [PATCH v2] kvm: Faults which trigger IO release the mmap_sem

2014-09-25 Thread Andrea Arcangeli
Hi Andres, On Wed, Sep 17, 2014 at 10:51:48AM -0700, Andres Lagar-Cavilla wrote: > + if (!locked) { > + VM_BUG_ON(npages != -EBUSY); > + Shouldn't this be VM_BUG_ON(npages)? Alternatively we could patch gup to do: case -EHWPOISON: +

Re: [PATCH] kvm: Fix kvm_get_page_retry_io __gup retval check

2014-09-25 Thread Andrea Arcangeli
On Thu, Sep 25, 2014 at 03:26:50PM -0700, Andres Lagar-Cavilla wrote: > Confusion around -EBUSY and zero (inside a BUG_ON no less). > > Reported-by: AndreA Arcangeli > Signed-off-by: Andres Lagar-Cavilla > --- > virt/kvm/kvm_main.c | 2 +- > 1 file changed, 1 ins

RFC: get_user_pages_locked|unlocked to leverage VM_FAULT_RETRY

2014-09-26 Thread Andrea Arcangeli
ocked parameter) will not invoke the userfaultfd protocol. But I need gup_fast to use FAULT_FLAG_ALLOW_RETRY because core places like O_DIRECT uses it. I tried to do a RFC patch below that goes into this direction and should be enough for a start to solve all my issues with the mmap_sem holding

Re: RFC: get_user_pages_locked|unlocked to leverage VM_FAULT_RETRY

2014-09-28 Thread Andrea Arcangeli
On Fri, Sep 26, 2014 at 12:54:46PM -0700, Andres Lagar-Cavilla wrote: > On Fri, Sep 26, 2014 at 10:25 AM, Andrea Arcangeli > wrote: > > On Thu, Sep 25, 2014 at 02:50:29PM -0700, Andres Lagar-Cavilla wrote: > >> It's nearly impossible to name it right beca

[PATCH 2/4] mm: gup: add get_user_pages_locked and get_user_pages_unlocked

2014-10-01 Thread Andrea Arcangeli
ent->mm. get_user_pages_unlocked varies from get_user_pages_fast only if mm is not current->mm (like when get_user_pages works on some other process mm). Whenever tsk and mm matches current and current->mm get_user_pages_fast must always be used to increase performance and get the page
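The contract of the `locked` parameter can be modelled in user space. This is a hypothetical simulation, not the gup code. `fake_sem` stands in for mmap_sem. `gup_locked_sim()` assumes the rule described in these threads: when a fault must wait for I/O (VM_FAULT_RETRY), the semaphore is released and `*locked` is cleared, while a NULL `locked` means retry is not allowed and the lock stays held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for mmap_sem: counts readers holding it. */
struct fake_sem { int readers; };

static void down_read_sim(struct fake_sem *s) { s->readers++; }

static void up_read_sim(struct fake_sem *s)
{
    assert(s->readers > 0); /* catches a double release */
    s->readers--;
}

/* Hypothetical gup-like call. With a non-NULL locked, a fault needing
 * I/O drops the semaphore and reports it by clearing *locked. */
static long gup_locked_sim(struct fake_sem *s, bool fault_needs_io, int *locked)
{
    if (fault_needs_io && locked) {
        up_read_sim(s); /* released while waiting for I/O */
        *locked = 0;
    }
    return 1; /* one page pinned in this model */
}

/* Caller pattern: release the semaphore only if it is still held. */
static long caller(struct fake_sem *s, bool fault_needs_io)
{
    int locked = 1;
    long ret;

    down_read_sim(s);
    ret = gup_locked_sim(s, fault_needs_io, &locked);
    if (locked)
        up_read_sim(s);
    return ret;
}
```

The caller never unlocks twice, whichever path the fault takes. That is the property that lets the fault path drop mmap_sem during I/O.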

[PATCH 1/4] mm: gup: add FOLL_TRIED

2014-10-01 Thread Andrea Arcangeli
From: Andres Lagar-Cavilla Reviewed-by: Radim Krčmář Signed-off-by: Andres Lagar-Cavilla Signed-off-by: Andrea Arcangeli --- include/linux/mm.h | 1 + mm/gup.c | 4 2 files changed, 5 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index 8981cc8..0f4196a

[PATCH 0/4] leverage FAULT_FOLL_ALLOW_RETRY in get_user_pages

2014-10-01 Thread Andrea Arcangeli
serfaultfd backed memory. Reviews would be welcome, thanks, Andrea Andrea Arcangeli (3): mm: gup: add get_user_pages_locked and get_user_pages_unlocked mm: gup: use get_user_pages_fast and get_user_pages_unlocked mm: gup: use get_user_pages_unlocked within get_user_pages_fast Andres Laga

[PATCH 4/4] mm: gup: use get_user_pages_unlocked within get_user_pages_fast

2014-10-01 Thread Andrea Arcangeli
Signed-off-by: Andrea Arcangeli --- arch/mips/mm/gup.c | 8 +++- arch/powerpc/mm/gup.c| 6 ++ arch/s390/kvm/kvm-s390.c | 4 +--- arch/s390/mm/gup.c | 6 ++ arch/sh/mm/gup.c | 6 ++ arch/sparc/mm/gup.c | 6 ++ arch/x86/mm/gup.c| 7

[PATCH 3/4] mm: gup: use get_user_pages_fast and get_user_pages_unlocked

2014-10-01 Thread Andrea Arcangeli
Just an optimization. Signed-off-by: Andrea Arcangeli --- drivers/dma/iovlock.c | 10 ++ drivers/iommu/amd_iommu_v2.c | 6 ++ drivers/media/pci/ivtv/ivtv-udma.c | 6 ++ drivers/misc/sgi-gru/grufault.c| 3 +-- drivers/scsi/st.c | 10

Re: [PATCH 3/4] mm: gup: use get_user_pages_fast and get_user_pages_unlocked

2014-10-01 Thread Andrea Arcangeli
On Wed, Oct 01, 2014 at 10:56:36AM +0200, Andrea Arcangeli wrote: > diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c > index f74fc0c..cd20669 100644 > --- a/drivers/misc/sgi-gru/grufault.c > +++ b/drivers/misc/sgi-gru/grufault.c > @@ -198,8 +198,

Re: RFC: get_user_pages_locked|unlocked to leverage VM_FAULT_RETRY

2014-10-02 Thread Andrea Arcangeli
On Wed, Oct 01, 2014 at 05:36:11PM +0200, Peter Zijlstra wrote: > For all these and the other _fast() users, is there an actual limit to > the nr_pages passed in? Because we used to have the 64 pages limit from > DIO, but without that we get rather long IRQ-off latencies. Ok, I would tend to think

Re: [PATCH 2/4] mm: gup: add get_user_pages_locked and get_user_pages_unlocked

2014-10-02 Thread Andrea Arcangeli
On Wed, Oct 01, 2014 at 10:06:27AM -0700, Andres Lagar-Cavilla wrote: > On Wed, Oct 1, 2014 at 8:51 AM, Peter Feiner wrote: > > On Wed, Oct 01, 2014 at 10:56:35AM +0200, Andrea Arcangeli wrote: > >> + /* VM_FAULT_RETRY cannot return errors */ > >>

Re: RFC: get_user_pages_locked|unlocked to leverage VM_FAULT_RETRY

2014-10-02 Thread Andrea Arcangeli
On Thu, Oct 02, 2014 at 02:56:38PM +0200, Peter Zijlstra wrote: > On Thu, Oct 02, 2014 at 02:50:52PM +0200, Peter Zijlstra wrote: > > On Thu, Oct 02, 2014 at 02:31:17PM +0200, Andrea Arcangeli wrote: > > > On Wed, Oct 01, 2014 at 05:36:11PM +0200, Peter Zijlstra wrote: > >

[PATCH 11/17] mm: swp_entry_swapcount

2014-10-03 Thread Andrea Arcangeli
in some anon_vma. Signed-off-by: Andrea Arcangeli --- include/linux/swap.h | 6 ++ mm/swapfile.c| 13 + 2 files changed, 19 insertions(+) diff --git a/include/linux/swap.h b/include/linux/swap.h index 8197452..af9977c 100644 --- a/include/linux/swap.h +++ b/include/linux

[PATCH 16/17] powerpc: add remap_anon_pages and userfaultfd

2014-10-03 Thread Andrea Arcangeli
Add the syscall numbers. Signed-off-by: Andrea Arcangeli --- arch/powerpc/include/asm/systbl.h | 2 ++ arch/powerpc/include/asm/unistd.h | 2 +- arch/powerpc/include/uapi/asm/unistd.h | 2 ++ 3 files changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm

[PATCH 15/17] userfaultfd: make userfaultfd_write non blocking

2014-10-03 Thread Andrea Arcangeli
same address. But we should still return an error so if the application thinks this occurrence can never happen it will know it hit a bug. So just return -ENOENT instead of blocking. Signed-off-by: Andrea Arcangeli --- fs/userfaultfd.c | 34 +- 1 file changed, 5 inser

[PATCH 13/17] waitqueue: add nr wake parameter to __wake_up_locked_key

2014-10-03 Thread Andrea Arcangeli
Userfaultfd needs to wake all waitqueues (pass 0 as nr parameter), instead of the current hardcoded 1 (that would wake just the first waitqueue in the head list). Signed-off-by: Andrea Arcangeli --- include/linux/wait.h | 5 +++-- kernel/sched/wait.c | 7 --- net/sunrpc/sched.c | 2 +- 3
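A simplified model of the wakeup loop shows why 0 means "wake everyone". This sketch assumes the loop stops only when a pre-decremented exclusive counter reaches zero, as in __wake_up_common(). The `waiter` struct and the `wake_up_common()` helper are hypothetical. Starting from 0, the counter goes negative and never reaches zero, so every exclusive waiter is woken.

```c
#include <stdbool.h>
#include <stddef.h>

struct waiter {
    bool exclusive; /* models WQ_FLAG_EXCLUSIVE */
    bool woken;
};

/* Non-exclusive waiters are always woken; the loop stops after
 * nr_exclusive exclusive waiters. Returns the number woken. */
static size_t wake_up_common(struct waiter *q, size_t n, int nr_exclusive)
{
    size_t woken = 0;

    for (size_t i = 0; i < n; i++) {
        q[i].woken = true;
        woken++;
        if (q[i].exclusive && !--nr_exclusive)
            break;
    }
    return woken;
}
```

With three exclusive waiters, passing 1 wakes only the first, and passing 0 wakes all three.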

[PATCH 03/17] mm: gup: use get_user_pages_unlocked within get_user_pages_fast

2014-10-03 Thread Andrea Arcangeli
Signed-off-by: Andrea Arcangeli --- arch/mips/mm/gup.c | 8 +++- arch/powerpc/mm/gup.c| 6 ++ arch/s390/kvm/kvm-s390.c | 4 +--- arch/s390/mm/gup.c | 6 ++ arch/sh/mm/gup.c | 6 ++ arch/sparc/mm/gup.c | 6 ++ arch/x86/mm/gup.c| 7

[PATCH 14/17] userfaultfd: add new syscall to provide memory externalization

2014-10-03 Thread Andrea Arcangeli
userfaults to read (POLLIN) and when there are threads waiting a wakeup through a range write (POLLOUT). Signed-off-by: Andrea Arcangeli --- arch/x86/syscalls/syscall_32.tbl | 1 + arch/x86/syscalls/syscall_64.tbl | 1 + fs/Makefile | 1 + fs/userfaultfd.c

[PATCH 01/17] mm: gup: add FOLL_TRIED

2014-10-03 Thread Andrea Arcangeli
From: Andres Lagar-Cavilla Reviewed-by: Radim Krčmář Signed-off-by: Andres Lagar-Cavilla Signed-off-by: Andrea Arcangeli --- include/linux/mm.h | 1 + mm/gup.c | 4 2 files changed, 5 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index 8981cc8..0f4196a

[PATCH 06/17] kvm: Faults which trigger IO release the mmap_sem

2014-10-03 Thread Andrea Arcangeli
, as other mmap semaphore users now stall as a function of swap or filemap latency. This patch ensures both the regular and async PF path re-enter the fault allowing for the mmap semaphore to be relinquished in the case of IO wait. Reviewed-by: Radim Krčmář Signed-off-by: Andres Lagar-Cavilla Signed

[PATCH 04/17] mm: gup: make get_user_pages_fast and __get_user_pages_fast latency conscious

2014-10-03 Thread Andrea Arcangeli
using get_user_pages_unlocked which would be slower). Signed-off-by: Andrea Arcangeli --- arch/x86/mm/gup.c | 234 ++ 1 file changed, 149 insertions(+), 85 deletions(-) diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c index 2ab183b..917d8c1 100644 --- a/arch/x

[PATCH 00/17] RFC: userfault v2

2014-10-03 Thread Andrea Arcangeli
ges should do it fine too, but it would create rmap nonlinearity which isn't optimal. The code can be found here: git clone --reference linux git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git -b userfault The branch is rebased so you can get updates for example with: git fetch &

[PATCH 09/17] mm: PT lock: export double_pt_lock/unlock

2014-10-03 Thread Andrea Arcangeli
Those two helpers are needed by remap_anon_pages. Signed-off-by: Andrea Arcangeli --- include/linux/mm.h | 4 mm/fremap.c| 29 + 2 files changed, 33 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index bf3df07..71dbe03 100644 --- a

[PATCH 07/17] mm: madvise MADV_USERFAULT: prepare vm_flags to allow more than 32bits

2014-10-03 Thread Andrea Arcangeli
We run out of 32bits in vm_flags, noop change for 64bit archs. Signed-off-by: Andrea Arcangeli --- fs/proc/task_mmu.c | 4 ++-- include/linux/huge_mm.h | 4 ++-- include/linux/ksm.h | 4 ++-- include/linux/mm_types.h | 2 +- mm/huge_memory.c | 2 +- mm/ksm.c
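Fixed-width integers show why flags at bit 32 and above need a wider type. The `vmflags64_t` typedef and `VM_FLAG_BIT32` are hypothetical and do not reproduce the patch. The point is only that a flag above bit 31 is lost when flags are stored in a 32-bit `unsigned long`, as on 32-bit archs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags type that stays 64 bits on every arch. */
typedef uint64_t vmflags64_t;

/* Hypothetical flag placed just above bit 31. */
#define VM_FLAG_BIT32 ((vmflags64_t)1 << 32)

static bool has_flag(vmflags64_t flags, vmflags64_t flag)
{
    return (flags & flag) != 0;
}

/* What a 32-bit flags field would keep of the same value. */
static uint32_t truncate_to_32(vmflags64_t flags)
{
    return (uint32_t)flags;
}
```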

[PATCH 10/17] mm: rmap preparation for remap_anon_pages

2014-10-03 Thread Andrea Arcangeli
p_anon_pages runs. Signed-off-by: Andrea Arcangeli --- mm/huge_memory.c | 24 mm/rmap.c| 9 + 2 files changed, 29 insertions(+), 4 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index b402d60..4277ed7 100644 --- a/mm/huge_memory.c +++ b/mm

[PATCH 05/17] mm: gup: use get_user_pages_fast and get_user_pages_unlocked

2014-10-03 Thread Andrea Arcangeli
Just an optimization. Signed-off-by: Andrea Arcangeli --- drivers/dma/iovlock.c | 10 ++ drivers/iommu/amd_iommu_v2.c | 6 ++ drivers/media/pci/ivtv/ivtv-udma.c | 6 ++ drivers/scsi/st.c | 10 ++ drivers/video/fbdev/pvr2fb.c

[PATCH 02/17] mm: gup: add get_user_pages_locked and get_user_pages_unlocked

2014-10-03 Thread Andrea Arcangeli
ent->mm. get_user_pages_unlocked varies from get_user_pages_fast only if mm is not current->mm (like when get_user_pages works on some other process mm). Whenever tsk and mm matches current and current->mm get_user_pages_fast must always be used to increase performance and get the page loc

[PATCH 17/17] userfaultfd: implement USERFAULTFD_RANGE_REGISTER|UNREGISTER

2014-10-03 Thread Andrea Arcangeli
er process that is calling ptrace). We could also decide to retain the current -EFAULT behavior of ptrace using get_user_pages_locked with a NULL locked parameter so the FAULT_FLAG_ALLOW_RETRY flag will not be set. Either ways would be safe. Signed-off-by: Andrea Arcangeli ---

[PATCH 12/17] mm: sys_remap_anon_pages

2014-10-03 Thread Andrea Arcangeli
write MADV_USERFAULT */ c[i+1] = 0xbb; } if (c[i] != 0xaa) printf("error %x offset %lu\n", c[i], i), exit(1); } printf("remap_anon_pages functions correctly\n"); return 0; } === Signe

[PATCH 08/17] mm: madvise MADV_USERFAULT

2014-10-03 Thread Andrea Arcangeli
exclusive if set. Signed-off-by: Andrea Arcangeli --- arch/alpha/include/uapi/asm/mman.h | 3 ++ arch/mips/include/uapi/asm/mman.h | 3 ++ arch/parisc/include/uapi/asm/mman.h| 3 ++ arch/xtensa/include/uapi/asm/mman.h| 3 ++ fs/proc/task_mmu.c | 1

Re: [PATCH 04/17] mm: gup: make get_user_pages_fast and __get_user_pages_fast latency conscious

2014-10-06 Thread Andrea Arcangeli
Hello, On Fri, Oct 03, 2014 at 11:23:53AM -0700, Linus Torvalds wrote: > On Fri, Oct 3, 2014 at 10:07 AM, Andrea Arcangeli wrote: > > This teaches gup_fast and __gup_fast to re-enable irqs and > > cond_resched() if possible every BATCH_PAGES. > > This is disgusting. > &
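The batching approach quoted above can be sketched as a loop that never walks more than a fixed number of pages with interrupts disabled. BATCH_PAGES = 64 is an assumption, chosen to match the 64-page DIO limit Peter Zijlstra mentions in his 2014-10-02 reply. The real page-table walk is replaced by a comment, and `count_irq_windows()` is hypothetical.

```c
#include <stddef.h>

#define BATCH_PAGES 64 /* assumption, not the patch's value */

/* Walk nr_pages in chunks of at most BATCH_PAGES; returns the number
 * of irq-off windows and stores the longest one in *longest. */
static size_t count_irq_windows(size_t nr_pages, size_t *longest)
{
    size_t windows = 0;

    *longest = 0;
    for (size_t done = 0; done < nr_pages; ) {
        size_t chunk = nr_pages - done;

        if (chunk > BATCH_PAGES)
            chunk = BATCH_PAGES;
        /* local_irq_disable(); walk chunk pages;
         * local_irq_enable(); cond_resched(); */
        if (chunk > *longest)
            *longest = chunk;
        done += chunk;
        windows++;
    }
    return windows;
}
```

For 200 pages this gives four windows, none longer than 64 pages. The irq-off latency is bounded by the batch size rather than by nr_pages, at the cost of the extra complexity Linus objects to.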

Re: [PATCH 10/17] mm: rmap preparation for remap_anon_pages

2014-10-06 Thread Andrea Arcangeli
Hello, On Mon, Oct 06, 2014 at 09:55:41AM +0100, Dr. David Alan Gilbert wrote: > * Linus Torvalds (torva...@linux-foundation.org) wrote: > > On Fri, Oct 3, 2014 at 10:08 AM, Andrea Arcangeli > > wrote: > > > > > > Overall this looks a fairly small change to

Re: [PATCH 08/17] mm: madvise MADV_USERFAULT

2014-10-06 Thread Andrea Arcangeli
Hi, On Sat, Oct 04, 2014 at 08:13:36AM +0900, Mike Hommey wrote: > On Fri, Oct 03, 2014 at 07:07:58PM +0200, Andrea Arcangeli wrote: > > MADV_USERFAULT is a new madvise flag that will set VM_USERFAULT in the > > vma flags. Whenever VM_USERFAULT is set in an anonymous vma, if >

Re: [PATCH 08/17] mm: madvise MADV_USERFAULT

2014-10-07 Thread Andrea Arcangeli
Hi Kirill, On Tue, Oct 07, 2014 at 01:36:45PM +0300, Kirill A. Shutemov wrote: > On Fri, Oct 03, 2014 at 07:07:58PM +0200, Andrea Arcangeli wrote: > > MADV_USERFAULT is a new madvise flag that will set VM_USERFAULT in the > > vma flags. Whenever VM_USERFAULT is set in an an

Re: [Qemu-devel] [PATCH 10/17] mm: rmap preparation for remap_anon_pages

2014-10-07 Thread Andrea Arcangeli
Hi Kirill, On Tue, Oct 07, 2014 at 02:10:26PM +0300, Kirill A. Shutemov wrote: > On Fri, Oct 03, 2014 at 07:08:00PM +0200, Andrea Arcangeli wrote: > > There's one constraint enforced to allow this simplification: the > > source pages passed to remap_anon_pages must be mapped

Re: [PATCH 10/17] mm: rmap preparation for remap_anon_pages

2014-10-07 Thread Andrea Arcangeli
Hello, On Tue, Oct 07, 2014 at 08:47:59AM -0400, Linus Torvalds wrote: > On Mon, Oct 6, 2014 at 12:41 PM, Andrea Arcangeli wrote: > > > > Of course if somebody has better ideas on how to resolve an anonymous > > userfault they're welcome. > > So I'd

Re: [PATCH 10/17] mm: rmap preparation for remap_anon_pages

2014-10-07 Thread Andrea Arcangeli
On Tue, Oct 07, 2014 at 04:19:13PM +0200, Andrea Arcangeli wrote: > mremap like interface, or file+commands protocol interface. I tend to > like mremap more, that's why I opted for a remap_anon_pages syscall > kept orthogonal to the userfaultfd functionality (remap_anon_pages > c

Re: [PATCH 3/4] mm: gup: use get_user_pages_fast and get_user_pages_unlocked

2014-10-12 Thread Andrea Arcangeli
On Thu, Oct 09, 2014 at 12:52:45PM +0200, Peter Zijlstra wrote: > On Wed, Oct 01, 2014 at 10:56:36AM +0200, Andrea Arcangeli wrote: > > Just an optimization. > > Does it make sense to split the thing in two? One where you apply > _unlocked and then one where you apply _fast?

Re: [PATCH 2/4] mm: gup: add get_user_pages_locked and get_user_pages_unlocked

2014-10-29 Thread Andrea Arcangeli
On Thu, Oct 09, 2014 at 12:47:23PM +0200, Peter Zijlstra wrote: > On Wed, Oct 01, 2014 at 10:56:35AM +0200, Andrea Arcangeli wrote: > > +static inline long __get_user_pages_locked(struct task_struct *tsk, > > + struc

Re: [PATCH 2/4] mm: gup: add get_user_pages_locked and get_user_pages_unlocked

2014-10-29 Thread Andrea Arcangeli
On Thu, Oct 09, 2014 at 12:50:37PM +0200, Peter Zijlstra wrote: > On Wed, Oct 01, 2014 at 10:56:35AM +0200, Andrea Arcangeli wrote: > > > +static inline long __get_user_pages_locked(struct task_struct *tsk, > > + st

Re: [PATCH 00/17] RFC: userfault v2

2014-10-29 Thread Andrea Arcangeli
Hi Zhanghailiang, On Mon, Oct 27, 2014 at 05:32:51PM +0800, zhanghailiang wrote: > Hi Andrea, > > Thanks for your hard work on userfault;) > > This is really a useful API. > > I want to confirm a question: > Can we support distinguishing between writing and reading memory for > userfault? > Th

Re: [PATCH 00/17] RFC: userfault v2

2014-11-19 Thread Andrea Arcangeli
Hi Zhang, On Fri, Oct 31, 2014 at 09:26:09AM +0800, zhanghailiang wrote: > On 2014/10/30 20:49, Dr. David Alan Gilbert wrote: > > * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: > >> On 2014/10/30 1:46, Andrea Arcangeli wrote: > >>> Hi Zhanghailiang, > >

Re: [PATCH 00/17] RFC: userfault v2

2014-11-20 Thread Andrea Arcangeli
Hi, On Fri, Oct 31, 2014 at 12:39:32PM -0700, Peter Feiner wrote: > On Fri, Oct 31, 2014 at 11:29:49AM +0800, zhanghailiang wrote: > > Agreed, but for doing live memory snapshot (VM is running when do > > snapshot), > > we have to do this (block the write action), because we have to save the >

Re: [PATCH 00/17] RFC: userfault v2

2014-11-20 Thread Andrea Arcangeli
Hi, On Thu, Nov 20, 2014 at 10:54:29AM +0800, zhanghailiang wrote: > Yes, you are right. This is what I really want, bypass all non-present faults > and only track strict wrprotect faults. ;) > > So, do you plan to support that in the userfault API? Yes, I think it's a good idea to support wrprotec

Re: [Qemu-devel] [PATCH 00/17] RFC: userfault v2

2014-11-21 Thread Andrea Arcangeli
Hi Peter, On Wed, Oct 29, 2014 at 05:56:59PM +, Peter Maydell wrote: > On 29 October 2014 17:46, Andrea Arcangeli wrote: > > After some chat during the KVMForum I've been already thinking it > > could be beneficial for some usage to give userland the information >

Re: [Qemu-devel] [PATCH 00/17] RFC: userfault v2

2014-11-25 Thread Andrea Arcangeli
On Fri, Nov 21, 2014 at 11:05:45PM +, Peter Maydell wrote: > If it's mapped and readable-but-not-writable then it should still > fault on write accesses, though? These are cases we currently get > SEGV for, anyway. Yes then it'll work just fine. > Ah, I guess we have a terminology difference.

Re: problems with 1G hugepages and linux 3.12-rc3

2013-10-09 Thread Andrea Arcangeli
lized. I put this just after the other __SetPage... so that we load the cacheline just once, so it should be zero cost to initialize PG_reserved properly. == From 952d474fae6dc42ece4b05ce1f1489c86da2a268 Mon Sep 17 00:00:00 2001 From: Andrea Arcangeli Date: Thu, 10 Oc

[PATCH] mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pages

2013-10-10 Thread Andrea Arcangeli
as already modified in order to set PG_tail so this won't affect the boot time of large memory systems. Reported-by: andy123 Signed-off-by: Andrea Arcangeli --- mm/hugetlb.c | 18 +- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.

[PATCH] initialize PG_reserved for tail pages of gigantic compound pages

2013-10-10 Thread Andrea Arcangeli
patch 11feeb498086a3a5907b8148bdf1786a9b18fc55. Enforcing PG_reserved not set for tail pages of hugetlbfs gigantic compound pages sounds safer regardless of commit 11feeb498086a3a5907b8148bdf1786a9b18fc55 to be consistent with the other hugetlbfs page sizes (i.e. hugetlbfs page order < MAX_ORDER). Thanks, Andrea Andrea Arca
