[PATCH v8 14/14] powerpc/vas: Free send window in VAS instance after credits returned

2020-03-18 Thread Haren Myneni
NX may be processing requests while trying to close window. Wait until all credits are returned and then free send window from VAS instance. Signed-off-by: Haren Myneni --- arch/powerpc/platforms/powernv/vas-window.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/
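The ordering this patch describes (free the send window only after every credit has come back) can be sketched with a small user-space model. This is an illustration under stated assumptions, not the kernel code: `struct send_window_model` and the `model_*` helpers are hypothetical names, and the real driver waits on hardware state rather than a plain counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a send window; not the kernel's struct vas_window. */
struct send_window_model {
    unsigned int credits_max;   /* credits granted when the window opened */
    unsigned int credits_avail; /* credits not held by in-flight requests */
    bool freed;
};

/* Issuing a request consumes one credit; fails when none are left. */
static bool model_take_credit(struct send_window_model *w)
{
    if (w->freed || w->credits_avail == 0)
        return false;
    w->credits_avail--;
    return true;
}

/* The accelerator finishing a request hands the credit back. */
static void model_return_credit(struct send_window_model *w)
{
    if (w->credits_avail < w->credits_max)
        w->credits_avail++;
}

/* Free only once every credit is back; otherwise the caller retries. */
static bool model_try_free(struct send_window_model *w)
{
    if (w->credits_avail != w->credits_max)
        return false;
    w->freed = true;
    return true;
}
```

A caller closing the window would keep retrying `model_try_free()` until it succeeds, which mirrors the "wait until all credits are returned" step.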

[PATCH v8 13/14] powerpc/vas: Display process stuck message

2020-03-18 Thread Haren Myneni
A process cannot close a send window until all requests are processed, which means waiting until the window state is not busy and the send credits are returned. Display debug messages if closing the window takes too long. Signed-off-by: Haren Myneni --- arch/powerpc/platforms/powernv/vas-window.c | 28 ++

[PATCH v8 12/14] powerpc/vas: Return credits after handling fault

2020-03-18 Thread Haren Myneni
NX expects OS to return credit for send window after processing each fault. Also credit has to be returned even for fault window. Signed-off-by: Sukadev Bhattiprolu Signed-off-by: Haren Myneni --- arch/powerpc/platforms/powernv/vas-fault.c | 9 + arch/powerpc/platforms/powernv/vas-w

[PATCH v8 11/14] powerpc/vas: Do not use default credits for receive window

2020-03-18 Thread Haren Myneni
The system checkstops if the RxFIFO overruns with more requests than the maximum number of CRBs allowed in the FIFO at any time. So the max credits value (rxattr.wcreds_max) is set by the driver and passed to vas_rx_win_open(). Signed-off-by: Haren Myneni --- arch/powerpc/platforms/powernv/vas
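The sizing rule behind this patch can be sketched as below. It is a minimal illustration, not the driver code: `rx_max_credits()` is a hypothetical helper, and the 128-byte CRB size is an assumption rather than something stated in the message.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed size of one coprocessor request block (CRB), in bytes. */
#define CRB_SIZE 128u

/*
 * Hypothetical helper: the largest number of CRBs that fit in a receive
 * FIFO of fifo_size bytes. Capping the window credits at this value keeps
 * senders from queueing more requests than the FIFO can hold.
 */
static unsigned int rx_max_credits(size_t fifo_size)
{
    return (unsigned int)(fifo_size / CRB_SIZE);
}
```

Under these assumptions a 32 KiB FIFO would allow 256 credits; the result would be the value a driver stores in rxattr.wcreds_max before opening the receive window.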

[PATCH v8 10/14] powerpc/vas: Print CRB and FIFO values

2020-03-18 Thread Haren Myneni
Dump FIFO entries if could not find send window and print CRB for debugging. Signed-off-by: Sukadev Bhattiprolu Signed-off-by: Haren Myneni --- arch/powerpc/platforms/powernv/vas-fault.c | 41 ++ 1 file changed, 41 insertions(+) diff --git a/arch/powerpc/platforms

[PATCH v8 09/14] powerpc/vas: Update CSB and notify process for fault CRBs

2020-03-18 Thread Haren Myneni
For each fault CRB, update fault address in CRB (fault_storage_addr) and translation error status in CSB so that user space can touch the fault address and resend the request. If the user space passed invalid CSB address send signal to process with SIGSEGV. Signed-off-by: Sukadev Bhattiprolu Si

[PATCH v8 08/14] powerpc/vas: Take reference to PID and mm for user space windows

2020-03-18 Thread Haren Myneni
When process opens a window, its pid and tgid will be saved in vas_window struct. This window will be closed when the process exits. Kernel handles NX faults by updating CSB or send SEGV signal to pid if user space csb_addr is invalid. In multi-thread applications, a window can be opened by chil

[PATCH v8 07/14] powerpc/vas: Register NX with fault window ID and IRQ port value

2020-03-18 Thread Haren Myneni
For each user space send window, register NX with the fault window ID and port value so that NX pastes CRBs into this fault FIFO when it sees a fault on the request buffer. Signed-off-by: Sukadev Bhattiprolu Signed-off-by: Haren Myneni --- arch/powerpc/platforms/powernv/vas-window.c | 15 +

[PATCH v8 06/14] powerpc/vas: Setup thread IRQ handler per VAS instance

2020-03-18 Thread Haren Myneni
Setup thread IRQ handler per each VAS instance. When NX sees a fault on CRB, kernel gets an interrupt and vas_fault_handler will be executed to process fault CRBs. Read all valid CRBs from fault FIFO, determine the corresponding send window from CRB and process fault requests. Signed-off-by: Suk

[PATCH v8 05/14] powerpc/vas: Setup fault window per VAS instance

2020-03-18 Thread Haren Myneni
Set up a fault window for each VAS instance. When NX gets a fault on a request buffer, it writes fault CRBs to the corresponding fault FIFO and then sends an interrupt to the OS. Signed-off-by: Sukadev Bhattiprolu Signed-off-by: Haren Myneni --- arch/powerpc/platforms/powernv/Makefile | 2 +- arc

[PATCH v8 04/14] powerpc/vas: Alloc and setup IRQ and trigger port address

2020-03-18 Thread Haren Myneni
Allocate an IRQ and get the trigger port address for each VAS instance. The kernel registers this IRQ per VAS instance and sets this port for each send window. NX interrupts the kernel when it sees a page fault. Signed-off-by: Haren Myneni --- arch/powerpc/platforms/powernv/vas.c | 34 +

[PATCH v8 03/14] powerpc/vas: Define nx_fault_stamp in coprocessor_request_block

2020-03-18 Thread Haren Myneni
Kernel sets fault address and status in CRB for NX page fault on user space address after processing page fault. User space gets the signal and handles the fault mentioned in CRB by bringing the page in to memory and send NX request again. Signed-off-by: Sukadev Bhattiprolu Signed-off-by: Haren

[PATCH v8 02/14] powerpc/xive: Define xive_native_alloc_get_irq_info()

2020-03-18 Thread Haren Myneni
pnv_ocxl_alloc_xive_irq() in ocxl.c allocates IRQ and gets trigger port address. VAS also needs this function, but based on chip ID. So moved this common function to xive/native.c. Signed-off-by: Haren Myneni --- arch/powerpc/include/asm/xive.h | 2 ++ arch/powerpc/platforms/powernv/ocx

[PATCH v8 01/14] powerpc/xive: Define xive_native_alloc_irq_on_chip()

2020-03-18 Thread Haren Myneni
This function allocates IRQ on a specific chip. VAS needs per chip IRQ allocation and will have IRQ handler per VAS instance. Signed-off-by: Haren Myneni --- arch/powerpc/include/asm/xive.h | 9 - arch/powerpc/sysdev/xive/native.c | 6 +++--- 2 files changed, 11 insertions(+), 4 dele

[PATCH v8 00/14] powerpc/vas: Page fault handling for user space NX requests

2020-03-18 Thread Haren Myneni
On POWER9, the Virtual Accelerator Switchboard (VAS) allows user space or the kernel to communicate with the Nest Accelerator (NX) directly using COPY/PASTE instructions. NX provides functions such as compression and encryption, but only compression (842 and GZIP formats) is supported in L

Re: [PATCH v4] powerpc: setup_64: set up PACA before parsing device tree

2020-03-18 Thread Michael Ellerman
Daniel Axtens writes: > Currently, we set up the PACA after parsing the device tree for CPU > features. Before that, r13 contains random data, which means there is > random data in r13 while we're running the generic dt parsing code. > > This random data varies depending on whether we boot through

Re: [RFC PATCH] powerpc/64s: CONFIG_PPC_HASH_MMU

2020-03-18 Thread Michael Ellerman
Nicholas Piggin writes: > This allows the 64s hash MMU code to be compiled out if radix is > selected. This saves about 128kB kernel image size (90kB text) on > powernv_defconfig minus KVM, 40kB on a tiny config. TBH my feelings are: - the size savings don't excite me much, given our kernels can

Re: [PATCHv2 26/50] powerpc: Add show_stack_loglvl()

2020-03-18 Thread Michael Ellerman
Dmitry Safonov writes: > Currently, the log-level of show_stack() depends on a platform > realization. It creates situations where the headers are printed with > lower log level or higher than the stacktrace (depending on > a platform or user). > > Furthermore, it forces the logic decision from us

Re: [PATCH v5 2/2] powerpc/64: Prevent stack protection in early boot

2020-03-18 Thread Michael Ellerman
Daniel Axtens writes: > Michael Ellerman writes: > >> The previous commit reduced the amount of code that is run before we >> setup a paca. However there are still a few remaining functions that >> run with no paca, or worse, with an arbitrary value in r13 that will >> be used as a paca pointer.

[PATCH v2 22/22] powerpc/mm/book3s64: Fix MADV_DONTNEED and parallel page fault race

2020-03-18 Thread Aneesh Kumar K.V
MADV_DONTNEED holds mmap_sem in read mode and that implies a parallel page fault is possible and the kernel can end up with a level 1 PTE entry (THP entry) converted to a level 0 PTE entry without flushing the THP TLB entry. Most architectures including POWER have issues with kernel instantiating

[PATCH v2 21/22] mm: change pmdp_huge_get_and_clear_full take vm_area_struct as arg

2020-03-18 Thread Aneesh Kumar K.V
We will use this in later patch to do tlb flush when clearing pmd entries. Signed-off-by: Aneesh Kumar K.V --- arch/s390/include/asm/pgtable.h | 4 ++-- include/asm-generic/pgtable.h | 4 ++-- mm/huge_memory.c| 4 ++-- 3 files changed, 6 insertions(+), 6 deletions(-) diff --gi

[PATCH v2 20/22] powerpc/mm/book3s64: Avoid sending IPI on clearing PMD

2020-03-18 Thread Aneesh Kumar K.V
Now that all the lockless page table walk is careful w.r.t the PTE address returned, we can now revert commit: 13bd817bb884 ("powerpc/thp: Serialize pmd clear against a linux page table walk.") We also drop the equivalent IPI from other pte updates routines. We still keep IPI in hash pmdp collaps

[PATCH v2 19/22] powerpc/kvm/book3s: Use pte_present instead of opencoding _PAGE_PRESENT check

2020-03-18 Thread Aneesh Kumar K.V
This adds _PAGE_PTE check and makes sure we validate the pte value returned via find_kvm_host_pte. NOTE: this also considers _PAGE_INVALID to the software valid bit. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/kvm_book3s_64.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-

[PATCH v2 18/22] powerpc/kvm/book3s: Use find_kvm_host_pte in kvmppc_get_hpa

2020-03-18 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 32 ++--- 1 file changed, 11 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index 83e987fecf97..3b168c69d503 100644 --- a/arch

[PATCH v2 17/22] powerpc/kvm/book3s: use find_kvm_host_pte in kvmppc_book3s_instantiate_page

2020-03-18 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/kvm/book3s_64_mmu_radix.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index f0b021052e33..fae89e3dbee0 100644 --- a/arch/powerpc/kvm

[PATCH v2 16/22] powerpc/kvm/book3s: Avoid using rmap to protect parallel page table update.

2020-03-18 Thread Aneesh Kumar K.V
We now depend on kvm->mmu_lock Cc: Alexey Kardashevskiy Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/kvm/book3s_64_vio_hv.c | 38 +++-- 1 file changed, 9 insertions(+), 29 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vi

[PATCH v2 14/22] powerpc/kvm/book3s: Use find_kvm_host_pte in h_enter

2020-03-18 Thread Aneesh Kumar K.V
Since kvmppc_do_h_enter can get called in realmode use low level arch_spin_lock which is safe to be called in realmode. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/kvm/book3s_64_mmu_hv.c | 5 ++--- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 22 ++ 2 files changed, 8 insertio

[PATCH v2 15/22] powerpc/kvm/book3s: use find_kvm_host_pte in pute_tce functions

2020-03-18 Thread Aneesh Kumar K.V
The current code only holds the rmap lock to prevent parallel page table updates. That is not sufficient: the kernel should also check whether an mmu_notifier callback was running in parallel. Cc: Alexey Kardashevskiy Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/kvm/book3s_64_vio_hv.c | 30

[PATCH v2 12/22] powerpc/kvm/book3s: Add helper for host page table walk

2020-03-18 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/kvm_book3s_64.h | 16 1 file changed, 16 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 2860521992b6..1ca1f6495012 100644 --- a/arch/powerpc/includ

[PATCH v2 13/22] powerpc/kvm/book3s: Use find_kvm_host_pte in page fault handler

2020-03-18 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/kvm/book3s_64_mmu_hv.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 6c372f5c61b6..fbabdcf24c86 100644 --- a/arch/powerpc/kvm/book3s_64_mm

[PATCH v2 11/22] powerpc/kvm/book3s: Use kvm helpers to walk shadow or secondary table

2020-03-18 Thread Aneesh Kumar K.V
update kvmppc_hv_handle_set_rc to use find_kvm_nested_guest_pte and find_kvm_secondary_pte Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/kvm_book3s.h| 2 +- arch/powerpc/include/asm/kvm_book3s_64.h | 3 +++ arch/powerpc/kvm/book3s_64_mmu_radix.c | 18 +- ar

[PATCH v2 10/22] powerpc/kvm/nested: Add helper to walk nested shadow linux page table.

2020-03-18 Thread Aneesh Kumar K.V
The locking rules for walking the nested shadow Linux page table are different from those for the process-scoped table. Hence add a helper for the nested page table walk, and add a check that we are holding the right locks. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/kvm/book3s_hv_nested.c | 28

[PATCH v2 09/22] powerpc/kvm/book3s: Add helper to walk partition scoped linux page table.

2020-03-18 Thread Aneesh Kumar K.V
The locking rules for walking the partition-scoped table are different from those for the process-scoped table. Hence add a helper for the secondary Linux page table walk, and add a check that we are holding the right locks. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/kvm_book3s_64.h | 13 +++

[PATCH v2 08/22] powerpc/kvm/book3s: switch from raw_spin_*lock to arch_spin_lock.

2020-03-18 Thread Aneesh Kumar K.V
These functions can get called in realmode. Hence use low level arch_spin_lock which is safe to be called in realmode. Cc: Suraj Jitindar Singh Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arc

[PATCH v2 07/22] powerpc/perf/callchain: Use __get_user_pages_fast in read_user_stack_slow

2020-03-18 Thread Aneesh Kumar K.V
read_user_stack_slow is called with interrupts soft disabled and it copies contents from the page which we find mapped to a specific address. To convert userspace address to pfn, the kernel now uses lockless page table walk. The kernel needs to make sure the pfn value read remains stable and is n

[PATCH v2 06/22] powerpc/mce: Don't reload pte val in addr_to_pfn

2020-03-18 Thread Aneesh Kumar K.V
A lockless page table walk should be safe against parallel THP collapse, THP split and madvise(MADV_DONTNEED)/parallel fault. This patch makes sure kernel won't reload the pteval when checking for different conditions. The patch also added a check for pte_present to make sure the kernel is indeed

[PATCH v2 05/22] powerpc/book3s64/hash: Use the pte_t address from the caller

2020-03-18 Thread Aneesh Kumar K.V
Don't fetch the pte value using lockless page table walk. Instead use the value from the caller. hash_preload is called with ptl lock held. So it is safe to use the pte_t address directly. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/mm/book3s64/hash_utils.c | 27 +--

[PATCH v2 04/22] powerpc/hash64: Restrict page table lookup using init_mm with __flush_hash_table_range

2020-03-18 Thread Aneesh Kumar K.V
This is only used with init_mm currently. Walking init_mm is much simpler because we don't need to handle concurrent page table like other mm_context Signed-off-by: Aneesh Kumar K.V --- .../include/asm/book3s/64/tlbflush-hash.h| 3 +-- arch/powerpc/kernel/pci_64.c |

[PATCH v2 03/22] powerpc/mm/hash64: use _PAGE_PTE when checking for pte_present

2020-03-18 Thread Aneesh Kumar K.V
This makes the pte_present check stricter by checking for additional _PAGE_PTE bit. A level 1 pte pointer (THP pte) can be switched to a pointer to level 0 pte page table page by following two operations. 1) THP split. 2) madvise(MADV_DONTNEED) in parallel to page fault. A lockless page table wal

[PATCH v2 02/22] powerpc/pkeys: Check vma before returning key fault error to the user

2020-03-18 Thread Aneesh Kumar K.V
If multiple threads in userspace keep changing the protection keys mapping a range, there can be a scenario where kernel takes a key fault but the pkey value found in the siginfo struct is a permissive one. This can confuse the userspace as shown in the below test case. /* use this to control the

[PATCH v2 01/22] powerpc/pkeys: Avoid using lockless page table walk

2020-03-18 Thread Aneesh Kumar K.V
Fetch pkey from vma instead of linux page table. Also document the fact that in some cases the pkey returned in siginfo won't be the same as the one we took keyfault on. Even with linux page table walk, we can end up in a similar scenario. Cc: Ram Pai Signed-off-by: Aneesh Kumar K.V --- arch/p

[PATCH v2 00/22] Avoid IPI while updating page table entries.

2020-03-18 Thread Aneesh Kumar K.V
Problem Summary: Slow termination of KVM guest with large guest RAM config due to a large number of IPIs that were caused by clearing level 1 PTE entries (THP) entries. This is shown in the stack trace below. - qemu-system-ppc [kernel.vmlinux][k] smp_call_function_many - smp_call_

Re: [PATCH 4/4] hugetlbfs: clean up command line processing

2020-03-18 Thread Mike Kravetz
On 3/18/20 5:20 PM, Randy Dunlap wrote: > Hi Mike, > > On 3/18/20 3:06 PM, Mike Kravetz wrote: >> With all hugetlb page processing done in a single file clean up code. >> - Make code match desired semantics >> - Update documentation with semantics >> - Make all warnings and errors messages start

Re: [PATCH v3 3/9] powerpc/vas: Add VAS user space API

2020-03-18 Thread Haren Myneni
On Thu, 2020-03-19 at 12:16 +1100, Daniel Axtens wrote: > Haren Myneni writes: > > > On power9, userspace can send GZIP compression requests directly to NX > > once kernel establishes NX channel / window with VAS. This patch provides > > user space API which allows user space to establish channel

Re: [PATCH 1/4] hugetlbfs: add arch_hugetlb_valid_size

2020-03-18 Thread kbuild test robot
Hi Mike, I love your patch! Yet something to improve: [auto build test ERROR on next-20200318] [also build test ERROR on v5.6-rc6] [cannot apply to arm64/for-next/core powerpc/next sparc/master linus/master sparc-next/master v5.6-rc6 v5.6-rc5 v5.6-rc4] [if your patch is applied to the wrong git

Re: [PATCH -next 016/491] KERNEL VIRTUAL MACHINE FOR POWERPC (KVM/powerpc): Use fallthrough;

2020-03-18 Thread Joe Perches
On Thu, 2020-03-19 at 12:18 +1100, Paul Mackerras wrote: > On Tue, Mar 10, 2020 at 09:51:30PM -0700, Joe Perches wrote: > > Convert the various uses of fallthrough comments to fallthrough; > > > > Done via script > > Link: > > https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.

Re: [PATCH -next 016/491] KERNEL VIRTUAL MACHINE FOR POWERPC (KVM/powerpc): Use fallthrough;

2020-03-18 Thread Paul Mackerras
On Tue, Mar 10, 2020 at 09:51:30PM -0700, Joe Perches wrote: > Convert the various uses of fallthrough comments to fallthrough; > > Done via script > Link: > https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe.com/ > > Signed-off-by: Joe Perches The subject

Re: [PATCH v3 3/9] powerpc/vas: Add VAS user space API

2020-03-18 Thread Daniel Axtens
Haren Myneni writes: > On power9, userspace can send GZIP compression requests directly to NX > once kernel establishes NX channel / window with VAS. This patch provides > user space API which allows user space to establish channel using open > VAS_TX_WIN_OPEN ioctl, mmap and close operations. >

Re: [PATCH v2 1/4] mm: Check for node_online in node_present_pages

2020-03-18 Thread Michael Ellerman
Michael Ellerman writes: > Vlastimil Babka writes: >> On 3/18/20 11:02 AM, Michal Hocko wrote: >>> On Wed 18-03-20 12:58:07, Srikar Dronamraju wrote: Calling a kmalloc_node on a possible node which is not yet onlined can lead to panic. Currently node_present_pages() doesn't verify the n

Re: [PATCH] powerpc/vdso: Fix multiple issues with sys_call_table

2020-03-18 Thread Michael Ellerman
Anton Blanchard writes: > The VDSO exports a bitmap of valid syscalls. vdso_setup_syscall_map() > sets this up, but there are both little and big endian bugs. The issue > is with: > >if (sys_call_table[i] != sys_ni_syscall) > > On little endian, instead of comparing pointers to the two fun

[PATCH v2] qtpm2: Export tpm2_get_cc_attrs_tbl for ibmvtpm driver as module

2020-03-18 Thread Stefan Berger
From: Stefan Berger This patch fixes the following problem when the ibmvtpm driver is built as a module: ERROR: modpost: "tpm2_get_cc_attrs_tbl" [drivers/char/tpm/tpm_ibmvtpm.ko] undefined! make[1]: *** [scripts/Makefile.modpost:94: __modpost] Error 1 make: *** [Makefile:1298: modules] Error 2

Re: [PATCH 1/4] hugetlbfs: add arch_hugetlb_valid_size

2020-03-18 Thread kbuild test robot
Hi Mike, I love your patch! Yet something to improve: [auto build test ERROR on next-20200318] [also build test ERROR on v5.6-rc6] [cannot apply to arm64/for-next/core powerpc/next sparc/master linus/master sparc-next/master v5.6-rc6 v5.6-rc5 v5.6-rc4] [if your patch is applied to the wrong git

Re: [patch V2 11/15] completion: Use simple wait queues

2020-03-18 Thread Thomas Gleixner
Joel, Joel Fernandes writes: > On Wed, Mar 18, 2020 at 09:43:13PM +0100, Thomas Gleixner wrote: >> The spinlock in the wait queue head cannot be replaced by a raw_spinlock >> because: >> >> - wait queues can have custom wakeup callbacks, which acquire other >> spinlock_t locks and have pot

Re: [patch V2 11/15] completion: Use simple wait queues

2020-03-18 Thread Joel Fernandes
Hi Thomas, On Wed, Mar 18, 2020 at 09:43:13PM +0100, Thomas Gleixner wrote: > From: Thomas Gleixner > > completion uses a wait_queue_head_t to enqueue waiters. > > wait_queue_head_t contains a spinlock_t to protect the list of waiters > which excludes it from being used in truly atomic context

Re: [PATCH v2 1/4] mm: Check for node_online in node_present_pages

2020-03-18 Thread Michael Ellerman
Vlastimil Babka writes: > On 3/18/20 11:02 AM, Michal Hocko wrote: >> On Wed 18-03-20 12:58:07, Srikar Dronamraju wrote: >>> Calling a kmalloc_node on a possible node which is not yet onlined can >>> lead to panic. Currently node_present_pages() doesn't verify the node is >>> online before accessi

Re: [PATCH 4/4] hugetlbfs: clean up command line processing

2020-03-18 Thread Randy Dunlap
Hi Mike, On 3/18/20 3:06 PM, Mike Kravetz wrote: > With all hugetlb page processing done in a single file clean up code. > - Make code match desired semantics > - Update documentation with semantics > - Make all warnings and errors messages start with 'HugeTLB:'. > - Consistently name command li

Re: [PATCH 1/4] hugetlbfs: add arch_hugetlb_valid_size

2020-03-18 Thread Dave Hansen
On 3/18/20 3:52 PM, Mike Kravetz wrote: > Sounds good. I'll incorporate those changes into a v2, unless someone > else has a different opinion. > > BTW, this patch should not really change the way the code works today. > It is mostly a movement of code. Unless I am missing something, the >

Re: [PATCH 12/15] powerpc/watchpoint: Prepare handler to handle more than one watcnhpoint

2020-03-18 Thread Michael Ellerman
Segher Boessenkool writes: > On Wed, Mar 18, 2020 at 12:44:52PM +0100, Christophe Leroy wrote: >> Le 18/03/2020 à 12:35, Michael Ellerman a écrit : >> >Christophe Leroy writes: >> >>Le 09/03/2020 à 09:58, Ravi Bangoria a écrit : >> >>>Currently we assume that we have only one watchpoint supported

Re: [PATCH 1/4] hugetlbfs: add arch_hugetlb_valid_size

2020-03-18 Thread Mike Kravetz
On 3/18/20 3:15 PM, Dave Hansen wrote: > Hi Mike, > > The series looks like a great idea to me. One nit on the x86 bits, > though... > >> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c >> index 5bfd5aef5378..51e6208fdeec 100644 >> --- a/arch/x86/mm/hugetlbpage.c >> +++ b/arch

Re: [PATCH 1/4] hugetlbfs: add arch_hugetlb_valid_size

2020-03-18 Thread Mike Kravetz
On 3/18/20 3:09 PM, Will Deacon wrote: > On Wed, Mar 18, 2020 at 03:06:31PM -0700, Mike Kravetz wrote: >> The architecture independent routine hugetlb_default_setup sets up >> the default huge pages size. It has no way to verify if the passed >> value is valid, so it accepts it and attempts to val

Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation

2020-03-18 Thread Paul E. McKenney
On Wed, Mar 18, 2020 at 09:43:10PM +0100, Thomas Gleixner wrote: > From: Thomas Gleixner > > The kernel provides a variety of locking primitives. The nesting of these > lock types and the implications of them on RT enabled kernels is nowhere > documented. > > Add initial documentation. > > Sign

Re: [PATCH 2/5] selftests/powerpc: Add header files for NX compresion/decompression

2020-03-18 Thread Daniel Axtens
Raphael Moreira Zinsly writes: > Add files to be able to compress and decompress files using the > powerpc NX-GZIP engine. > > Signed-off-by: Bulent Abali > Signed-off-by: Raphael Moreira Zinsly > --- > .../powerpc/nx-gzip/inc/copy-paste.h | 54 ++ > .../selftests/powerpc/nx-gzip/inc

Re: [patch V2 11/15] completion: Use simple wait queues

2020-03-18 Thread Logan Gunthorpe
On 2020-03-18 2:43 p.m., Thomas Gleixner wrote: > There is no semantical or functional change: > > - completions use the exclusive wait mode which is what swait provides > > - complete() wakes one exclusive waiter > > - complete_all() wakes all waiters while holding the lock which prote
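The complete() and complete_all() semantics listed in this reply can be modeled in user space by tracking only a "done" counter. This is a simplified sketch, not the kernel implementation: the `completion_model` type and `model_*` functions are hypothetical, and the wait queue, locking, and sleeping are left out entirely.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Simplified model of a completion: only the counter, no wait queue. */
struct completion_model {
    unsigned int done;
};

/* Models complete(): let exactly one waiter proceed. */
static void model_complete(struct completion_model *c)
{
    if (c->done != UINT_MAX)
        c->done++;
}

/* Models complete_all(): mark permanently done so every waiter proceeds. */
static void model_complete_all(struct completion_model *c)
{
    c->done = UINT_MAX;
}

/* Non-blocking wait: consume one pending completion if there is one. */
static bool model_try_wait(struct completion_model *c)
{
    if (c->done == 0)
        return false;
    if (c->done != UINT_MAX)
        c->done--;
    return true;
}
```

In this model one `model_complete()` call satisfies a single `model_try_wait()`, while `model_complete_all()` satisfies any number of them, which is the distinction the quoted list draws between waking one exclusive waiter and waking all waiters.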

Re: [PATCH 4/5] selftests/powerpc: Add NX-GZIP engine decompress testcase

2020-03-18 Thread Daniel Axtens
Raphael M Zinsly writes: > Thanks for the reviews Daniel, I'll use your testcases and address the > issues you found, I still have some questions below: > > On 18/03/2020 03:18, Daniel Axtens wrote: >> Raphael Moreira Zinsly writes: >> >>> Include a decompression testcase for the powerpc NX-G

Re: [PATCH 1/4] hugetlbfs: add arch_hugetlb_valid_size

2020-03-18 Thread Dave Hansen
Hi Mike, The series looks like a great idea to me. One nit on the x86 bits, though... > diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c > index 5bfd5aef5378..51e6208fdeec 100644 > --- a/arch/x86/mm/hugetlbpage.c > +++ b/arch/x86/mm/hugetlbpage.c > @@ -181,16 +181,25 @@ hugetlb

Re: [patch V2 02/15] pci/switchtec: Replace completion wait queue usage for poll

2020-03-18 Thread Logan Gunthorpe
On 2020-03-18 2:43 p.m., Thomas Gleixner wrote: > From: Sebastian Andrzej Siewior > > The poll callback is using the completion wait queue and sticks it into > poll_wait() to wake up pollers after a command has completed. > > This works to some extent, but cannot provide EPOLLEXCLUSIVE suppor

Re: [PATCH 1/4] hugetlbfs: add arch_hugetlb_valid_size

2020-03-18 Thread Will Deacon
On Wed, Mar 18, 2020 at 03:06:31PM -0700, Mike Kravetz wrote: > The architecture independent routine hugetlb_default_setup sets up > the default huge pages size. It has no way to verify if the passed > value is valid, so it accepts it and attempts to validate at a later > time. This requires undo

[PATCH 4/4] hugetlbfs: clean up command line processing

2020-03-18 Thread Mike Kravetz
With all hugetlb page processing done in a single file clean up code. - Make code match desired semantics - Update documentation with semantics - Make all warnings and errors messages start with 'HugeTLB:'. - Consistently name command line parsing routines. - Add comments to code - Describe som

[PATCH 0/4] Clean up hugetlb boot command line processing

2020-03-18 Thread Mike Kravetz
Longpeng(Mike) reported a weird message from hugetlb command line processing and proposed a solution [1]. While the proposed patch does address the specific issue, there are other related issues in command line processing. As hugetlbfs evolved, updates to command line processing have been made to

[PATCH 1/4] hugetlbfs: add arch_hugetlb_valid_size

2020-03-18 Thread Mike Kravetz
The architecture independent routine hugetlb_default_setup sets up the default huge pages size. It has no way to verify if the passed value is valid, so it accepts it and attempts to validate at a later time. This requires undocumented cooperation between the arch specific and arch independent co

[PATCH 2/4] hugetlbfs: move hugepagesz= parsing to arch independent code

2020-03-18 Thread Mike Kravetz
Now that architectures provide arch_hugetlb_valid_size(), parsing of "hugepagesz=" can be done in architecture independent code. Create a single routine to handle hugepagesz= parsing and remove all arch specific routines. We can also remove the interface hugetlb_bad_size() as this is no longer use

[PATCH 3/4] hugetlbfs: remove hugetlb_add_hstate() warning for existing hstate

2020-03-18 Thread Mike Kravetz
The routine hugetlb_add_hstate prints a warning if the hstate already exists. This was originally done as part of kernel command line parsing. If 'hugepagesz=' was specified more than once, the warning pr_warn("hugepagesz= specified twice, ignoring\n"); would be printed. Some architectur

Re: [PATCH 12/15] powerpc/watchpoint: Prepare handler to handle more than one watcnhpoint

2020-03-18 Thread Segher Boessenkool
On Wed, Mar 18, 2020 at 12:44:52PM +0100, Christophe Leroy wrote: > Le 18/03/2020 à 12:35, Michael Ellerman a écrit : > >Christophe Leroy writes: > >>Le 09/03/2020 à 09:58, Ravi Bangoria a écrit : > >>>Currently we assume that we have only one watchpoint supported by hw. > >>>Get rid of that assum

Re: [patch V2 02/15] pci/switchtec: Replace completion wait queue usage for poll

2020-03-18 Thread Bjorn Helgaas
On Wed, Mar 18, 2020 at 09:43:04PM +0100, Thomas Gleixner wrote: > From: Sebastian Andrzej Siewior > > The poll callback is using the completion wait queue and sticks it into > poll_wait() to wake up pollers after a command has completed. > > This works to some extent, but cannot provide EPOLLEX

Re: [patch V2 01/15] PCI/switchtec: Fix init_completion race condition with poll_wait()

2020-03-18 Thread Bjorn Helgaas
On Wed, Mar 18, 2020 at 09:43:03PM +0100, Thomas Gleixner wrote: > From: Logan Gunthorpe > > The call to init_completion() in mrpc_queue_cmd() can theoretically > race with the call to poll_wait() in switchtec_dev_poll(). > > poll() write() > switchtec_dev_poll()

[PATCH AUTOSEL 4.4 07/12] dt-bindings: net: FMan erratum A050385

2020-03-18 Thread Sasha Levin
From: Madalin Bucur [ Upstream commit 26d5bb9e4c4b541c475751e015072eb2cbf70d15 ] FMAN DMA read or writes under heavy traffic load may cause FMAN internal resource leak; thus stopping further packet processing. The FMAN internal queue can overflow when FMAN splits single read or write transactio

[PATCH AUTOSEL 4.9 09/15] dt-bindings: net: FMan erratum A050385

2020-03-18 Thread Sasha Levin
From: Madalin Bucur [ Upstream commit 26d5bb9e4c4b541c475751e015072eb2cbf70d15 ] FMAN DMA read or writes under heavy traffic load may cause FMAN internal resource leak; thus stopping further packet processing. The FMAN internal queue can overflow when FMAN splits single read or write transactio

[patch V2 11/15] completion: Use simple wait queues

2020-03-18 Thread Thomas Gleixner
From: Thomas Gleixner completion uses a wait_queue_head_t to enqueue waiters. wait_queue_head_t contains a spinlock_t to protect the list of waiters which excludes it from being used in truly atomic context on a PREEMPT_RT enabled kernel. The spinlock in the wait queue head cannot be replaced b

[patch V2 10/15] sched/swait: Prepare usage in completions

2020-03-18 Thread Thomas Gleixner
From: Thomas Gleixner As a preparation to use simple wait queues for completions: - Provide swake_up_all_locked() to support complete_all() - Make __prepare_to_swait() public available This is done to enable the usage of complete() within truly atomic contexts on a PREEMPT_RT enabled kernel

[patch V2 08/15] Documentation: Add lock ordering and nesting documentation

2020-03-18 Thread Thomas Gleixner
From: Thomas Gleixner The kernel provides a variety of locking primitives. The nesting of these lock types and the implications of them on RT enabled kernels is nowhere documented. Add initial documentation. Signed-off-by: Thomas Gleixner --- V2: Addressed review comments from Randy --- Docum

[patch V2 03/15] usb: gadget: Use completion interface instead of open coding it

2020-03-18 Thread Thomas Gleixner
ep_io() uses a completion on stack and open codes the waiting with: wait_event_interruptible (done.wait, done.done); and wait_event (done.wait, done.done); This waits in non-exclusive mode for complete(), but there is no reason to do so because the completion can only be waited for by the tas

[patch V2 05/15] acpi: Remove header dependency

2020-03-18 Thread Thomas Gleixner
In order to avoid future header hell, remove the inclusion of proc_fs.h from acpi_bus.h. All it needs is a forward declaration of a struct. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner --- drivers/platform/x86/dell-smo8800.c |1 + drivers/platfor

[patch V2 02/15] pci/switchtec: Replace completion wait queue usage for poll

2020-03-18 Thread Thomas Gleixner
From: Sebastian Andrzej Siewior The poll callback is using the completion wait queue and sticks it into poll_wait() to wake up pollers after a command has completed. This works to some extent, but cannot provide EPOLLEXCLUSIVE support because the waker side uses complete_all() which unconditiona

[patch V2 04/15] orinoco_usb: Use the regular completion interfaces

2020-03-18 Thread Thomas Gleixner
From: Thomas Gleixner The completion usage in this driver is interesting: - it uses a magic complete function which according to the comment was implemented by invoking complete() four times in a row because complete_all() was not exported at that time. - it uses an open coded wait/

[patch V2 09/15] timekeeping: Split jiffies seqlock

2020-03-18 Thread Thomas Gleixner
From: Thomas Gleixner seqlock consists of a sequence counter and a spinlock_t which is used to serialize the writers. spinlock_t is substituted by a "sleeping" spinlock on PREEMPT_RT enabled kernels which breaks the usage in the timekeeping code as the writers are executed in hard interrupt and t

[patch V2 06/15] rcuwait: Add @state argument to rcuwait_wait_event()

2020-03-18 Thread Thomas Gleixner
Extend rcuwait_wait_event() with a state variable so that it is not restricted to UNINTERRUPTIBLE waits. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: Oleg Nesterov Cc: Davidlohr Bueso --- include/linux/rcuwait.h | 12 ++-- kernel/locking/percpu-rwse

[patch V2 00/15] Lock ordering documentation and annotation for lockdep

2020-03-18 Thread Thomas Gleixner
This is the second version of this work. The first one can be found here: https://lore.kernel.org/r/20200313174701.148376-1-bige...@linutronix.de Changes since V1: - Split the PCI/switchtec patch (picked up the fix from Logan) and reworked the change log. - Addressed Linus feedback v

[patch V2 07/15] powerpc/ps3: Convert half completion to rcuwait

2020-03-18 Thread Thomas Gleixner
The PS3 notification interrupt and kthread use a hacked up completion to communicate. Since we're wanting to change the completion implementation and this is abuse anyway, replace it with a simple rcuwait since there is only ever the one waiter. AFAICT the kthread uses TASK_INTERRUPTIBLE to not in

[patch V2 01/15] PCI/switchtec: Fix init_completion race condition with poll_wait()

2020-03-18 Thread Thomas Gleixner
From: Logan Gunthorpe The call to init_completion() in mrpc_queue_cmd() can theoretically race with the call to poll_wait() in switchtec_dev_poll(). poll() write() switchtec_dev_poll() switchtec_dev_write() poll_wait(&s->comp.wait); mrpc_queue_cmd

Re: [PATCH] tpm2: Export tpm2_get_cc_attrs_tbl for ibmvtpm driver as module

2020-03-18 Thread Stefan Berger
On 3/18/20 3:42 PM, Jarkko Sakkinen wrote: On Tue, Mar 17, 2020 at 09:08:19AM -0400, Stefan Berger wrote: From: Stefan Berger This patch fixes the following problem when the ibmvtpm driver is built as a module: ERROR: modpost: "tpm2_get_cc_attrs_tbl" [drivers/char/tpm/tpm_ibmvtpm.ko] undefin

Re: [PATCH] tpm2: Export tpm2_get_cc_attrs_tbl for ibmvtpm driver as module

2020-03-18 Thread Jarkko Sakkinen
On Tue, Mar 17, 2020 at 09:08:19AM -0400, Stefan Berger wrote: > From: Stefan Berger > > This patch fixes the following problem when the ibmvtpm driver > is built as a module: > > ERROR: modpost: "tpm2_get_cc_attrs_tbl" [drivers/char/tpm/tpm_ibmvtpm.ko] > undefined! > make[1]: *** [scripts/Make

Re: [PATCH v2 3/4] mm: Implement reset_numa_mem

2020-03-18 Thread Christopher Lameter
On Wed, 18 Mar 2020, Srikar Dronamraju wrote: > For a memoryless or offline nodes, node_numa_mem refers to a N_MEMORY > fallback node. Currently kernel has an API set_numa_mem that sets > node_numa_mem for memoryless node. However this API cannot be used for > offline nodes. Hence all offline node

Re: [PATCH 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline

2020-03-18 Thread Christopher Lameter
On Mon, 16 Mar 2020, Michal Hocko wrote: > > We can dynamically number the nodes right? So just make sure that the > > firmware properly creates memory on node 0? > > Are you suggesting that the OS would renumber NUMA nodes coming > from FW just to satisfy node 0 existence? If yes then I believe t

Re: [PATCH 02/15] powerpc/watchpoint: Add SPRN macros for second DAWR

2020-03-18 Thread Segher Boessenkool
On Tue, Mar 17, 2020 at 11:16:34AM +0100, Christophe Leroy wrote: > > > Le 09/03/2020 à 09:57, Ravi Bangoria a écrit : > >Future Power architecture is introducing second DAWR. Add SPRN_ macros > >for the same. > > I'm not sure this is called 'macros'. For me a macro is something more > complex.

Re: [PATCH 1/3] KVM: PPC: Fix kernel crash with PR KVM

2020-03-18 Thread Sean Christopherson
On Wed, Mar 18, 2020 at 06:43:30PM +0100, Greg Kurz wrote: > It turns out that this is only relevant to PR KVM actually. And both > 32 and 64 backends need vcpu->arch.book3s to be valid when calling > kvmppc_mmu_destroy_pr(). So instead of calling kvmppc_mmu_destroy() > from kvm_arch_vcpu_destroy()

[PATCH 3/3] KVM: PPC: Kill kvmppc_ops::mmu_destroy() and kvmppc_mmu_destroy()

2020-03-18 Thread Greg Kurz
These are only used by HV KVM and BookE, and in both cases they are nops. Signed-off-by: Greg Kurz --- arch/powerpc/include/asm/kvm_ppc.h |2 -- arch/powerpc/kvm/book3s.c |5 - arch/powerpc/kvm/book3s_hv.c |6 -- arch/powerpc/kvm/book3s_pr.c |1 - arc

[PATCH 2/3] KVM: PPC: Move kvmppc_mmu_init() PR KVM

2020-03-18 Thread Greg Kurz
This is only relevant to PR KVM. Make it obvious by moving the function declaration to the Book3s header and rename it with a _pr suffix. Signed-off-by: Greg Kurz --- arch/powerpc/include/asm/kvm_ppc.h|1 - arch/powerpc/kvm/book3s.h |1 + arch/powerpc/kvm/book3s_32_mmu_ho

[PATCH 1/3] KVM: PPC: Fix kernel crash with PR KVM

2020-03-18 Thread Greg Kurz
With PR KVM, shutting down a VM causes the host kernel to crash: [ 314.219284] BUG: Unable to handle kernel data access on read at 0xc0080176c638 [ 314.219299] Faulting instruction address: 0xc00800d4ddb0 cpu 0x0: Vector: 300 (Data Access) at [c0036da077a0] pc: c00800d4ddb0:

[PATCH 0/3] KVM: PPC: Fix host kernel crash with PR KVM

2020-03-18 Thread Greg Kurz
Recent cleanup from Sean Christopherson introduced a use-after-free condition that crashes the kernel when shutting down the VM with PR KVM. It went unnoticed so far because PR isn't tested/used much these days (mostly used for nested on POWER8, not supported on POWER9 where HV should be used for n

Re: [RFC 00/11] perf: Enhancing perf to export processor hazard information

2020-03-18 Thread Kim Phillips
Hi Maddy, On 3/17/20 1:50 AM, maddy wrote: > On 3/13/20 4:08 AM, Kim Phillips wrote: >> On 3/11/20 11:00 AM, Ravi Bangoria wrote: >>> On 3/6/20 3:36 AM, Kim Phillips wrote: > On 3/3/20 3:55 AM, Kim Phillips wrote: >> On 3/2/20 2:21 PM, Stephane Eranian wrote: >>> On Mon, Mar 2, 2020 at
