Excerpts from Fabiano Rosas's message of April 16, 2021 9:09 am:
> As one of the arguments of the H_ENTER_NESTED hypercall, the nested
> hypervisor (L1) prepares a structure containing the values of various
> hypervisor-privileged registers with which it wants the nested guest
> (L2) to run. Since
Oh sorry, I didn't skim this one before replying to the first.
Excerpts from Fabiano Rosas's message of April 16, 2021 9:09 am:
> Since commit 73937deb4b2d ("KVM: PPC: Book3S HV: Sanitise hv_regs on
> nested guest entry") we have been disabling for the nested guest the
> hypervisor facility bits t
Excerpts from Fabiano Rosas's message of April 16, 2021 9:09 am:
> As one of the arguments of the H_ENTER_NESTED hypercall, the nested
> hypervisor (L1) prepares a structure containing the values of various
> hypervisor-privileged registers with which it wants the nested guest
> (L2) to run. Since
On Tue, Apr 27, 2021 at 1:42 PM Nick Desaulniers
wrote:
>
> On Mon, Apr 26, 2021 at 11:39 PM Christophe Leroy
> wrote:
> >
> > As you can see, CLANG doesn't save/restore 'lr' although 'lr' is
> > explicitly listed in the
> > registers clobbered by the inline assembly:
>
> Ah, thanks for debug
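For reference, a minimal sketch of the construct under discussion (the wrapper and call target below are hypothetical, not the actual kernel code): powerpc inline assembly that performs a call and therefore lists 'lr' in its clobbers, which the compiler is expected to save and restore around the block.

/* Hypothetical illustration only: an asm block that branches-and-links
 * out and declares "lr" as clobbered.  The report above is that clang
 * does not save/restore LR around such a block even though "lr" is
 * explicitly listed. */
static inline unsigned long call_via_asm(unsigned long arg)
{
	register unsigned long r3 asm("r3") = arg;

	asm volatile("bl some_helper"		/* hypothetical call target */
		     : "+r" (r3)
		     :
		     : "lr", "memory");		/* "lr" clobbered by the bl */
	return r3;
}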
Excerpts from Naveen N. Rao's message of April 27, 2021 11:59 pm:
> Nicholas Piggin wrote:
>> ---
>> arch/powerpc/platforms/pseries/lpar.c | 11 +++
>> 1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pse
Excerpts from Naveen N. Rao's message of April 27, 2021 11:43 pm:
> Nicholas Piggin wrote:
>> The paravirt queued spinlock slow path adds itself to the queue then
>> calls pv_wait to wait for the lock to become free. This is implemented
>> by calling H_CONFER to donate cycles.
>>
>> When hcall trac
On 4/30/21 9:13 AM, Laurent Dufour wrote:
> On 29/04/2021 at 21:12, Tyrel Datwyler wrote:
>> On 4/29/21 3:27 AM, Aneesh Kumar K.V wrote:
>>> Laurent Dufour writes:
>>>
After an LPM, the device tree node ibm,dynamic-reconfiguration-memory may be
updated by the hypervisor in the case the
The openat2(2) syscall was added in kernel v5.6 with commit fddb5d430ad9
("open: introduce openat2(2) syscall")
Add the openat2(2) syscall to the audit syscall classifier.
See the github issue
https://github.com/linux-audit/audit-kernel/issues/67
Signed-off-by: Richard Guy Briggs
---
arch/alph
Replace audit syscall class magic numbers with macros.
This required putting the macros into a new header file
include/linux/auditscm.h since the syscall macros were included for both
64-bit and 32-bit in any compat code, causing redefinition warnings.
Signed-off-by: Richard Guy Briggs
---
MAINTA
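Taken together with the openat2 classifier patch above, the shape of the change is roughly the following sketch (an approximation using the series' AUDITSC_*/__NR_* naming convention; the real per-arch files differ in detail):

#include <linux/audit.h>
#include <asm/unistd.h>

/* Sketch of an arch audit_classify_syscall() after the series: magic
 * return values replaced with AUDITSC_* macros, plus a case for
 * openat2(2). */
int audit_classify_syscall(int abi, unsigned int syscall)
{
	switch (syscall) {
	case __NR_open:
		return AUDITSC_OPEN;
	case __NR_openat:
		return AUDITSC_OPENAT;
	case __NR_openat2:
		return AUDITSC_OPENAT2;
	case __NR_execve:
		return AUDITSC_EXECVE;
	default:
		return AUDITSC_NATIVE;
	}
}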
The openat2(2) syscall was added in v5.6. Add support for openat2 to the
audit syscall classifier and for recording openat2 parameters that cannot
be captured in the syscall parameters of the SYSCALL record.
Supporting userspace code can be found in
https://github.com/rgbriggs/audit-userspace/tre
The pull request you sent on Fri, 30 Apr 2021 14:02:32 +1000:
> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git
> tags/powerpc-5.13-1
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/c70a4be130de333ea079c59da41cc959712bb01c
Thank you!
--
Deet-doot-d
On 2021-04-30 13:29, Richard Guy Briggs wrote:
> The openat2(2) syscall was added in v5.6. Add support for openat2 to the
> audit syscall classifier and for recording openat2 parameters that cannot
> be captured in the syscall parameters of the SYSCALL record.
Well, that was a bit premature... C
Signed-off-by: Richard Guy Briggs
---
arch/alpha/kernel/audit.c | 2 ++
arch/ia64/kernel/audit.c | 2 ++
arch/parisc/kernel/audit.c | 2 ++
arch/parisc/kernel/compat_audit.c | 2 ++
arch/powerpc/kernel/audit.c | 2 ++
arch/powerpc/kernel/compat_audit.c | 2 ++
a
Replace the magic numbers used to indicate audit syscall classes with macros.
Signed-off-by: Richard Guy Briggs
---
arch/alpha/kernel/audit.c | 8
arch/ia64/kernel/audit.c | 8
arch/parisc/kernel/audit.c | 8
arch/parisc/kernel/compat_audi
The openat2(2) syscall was added in v5.6. Add support for openat2 to the
audit syscall classifier and for recording openat2 parameters that cannot
be captured in the syscall parameters of the SYSCALL record.
Supporting userspace code can be found in
https://github.com/rgbriggs/audit-userspace/tre
A previous change introduced using DDW as a bigger indirect DMA
mapping when the available DDW size does not map the whole partition.
As most of the code that manipulates direct mappings was reused for
indirect mappings, it's necessary to rename those identifiers and debug/info
messages to reflect
So far it's assumed possible to map the guest RAM 1:1 to the bus, which
works with a small number of devices. SRIOV changes it as the user can
configure hundreds of VFs and, since phyp preallocates TCEs and does not
allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
per PE to limit
At the moment pseries stores information about the created directly mapped
DDW window in DIRECT64_PROPNAME.
With the objective of implementing indirect DMA mapping with DDW, it's
necessary to have another property name to make sure kexec'ing into older
kernels does not break, as it would if we reuse
Update remove_dma_window() so it can be used to remove DDW with a given
property name.
This enables the creation of new property names for DDW, so we can
use it for different purposes, like indirect mapping.
Signed-off-by: Leonardo Bras
---
arch/powerpc/platforms/pseries/iommu.c | 21 +++-
Add a new helper _iommu_table_setparms(), and use it in
iommu_table_setparms() and iommu_table_setparms_lpar() to avoid duplicated
code.
Also, setting tbl->it_ops was happening outside iommu_table_setparms*(),
so move it to the new helper. Since we need the iommu_table_ops to be
declared before us
The code used to create a ddw property, previously scattered in
enable_ddw(), is now gathered in ddw_property_create(), which deals with
allocating and filling the property, leaving it ready for
of_property_add(), which now occurs in sequence.
This created an opportunity to reorganize the secon
enable_ddw() currently returns the address of the DMA window, which is
considered invalid if it has the value 0x00.
Also, it only considers an address returned from find_existing_ddw valid
if it's not 0x00.
Changing this behavior makes sense, given the users of enable_ddw() only
need to know if dire
There are two functions creating direct_window_list entries in a
similar way, so create a ddw_list_new_entry() to avoid duplication and
simplify those functions.
Signed-off-by: Leonardo Bras
Reviewed-by: Alexey Kardashevskiy
---
arch/powerpc/platforms/pseries/iommu.c | 32 +---
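A sketch of what such a helper can look like (types and field names as they appear in arch/powerpc/platforms/pseries/iommu.c; treat the details as an approximation rather than the exact patch):

#include <linux/of.h>
#include <linux/slab.h>

/* Approximate shape of the helper: allocate a direct_window list entry
 * and fill it from the device node and the DDW property. */
static struct direct_window *ddw_list_new_entry(struct device_node *pdn,
			const struct dynamic_dma_window_prop *dma64)
{
	struct direct_window *window;

	window = kzalloc(sizeof(*window), GFP_KERNEL);
	if (!window)
		return NULL;

	window->device = pdn;
	window->prop = dma64;

	return window;
}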
Creates a helper to allow allocating a new iommu_table without the need
to reallocate the iommu_group.
This will be helpful for replacing the iommu_table for the new DMA window,
after we remove the old one with iommu_tce_table_put().
Signed-off-by: Leonardo Bras
Reviewed-by: Alexey Kardashevskiy
Having a function to check if the iommu table has any allocation helps
decide if a tbl can be reset to use a new DMA window.
It should be enough to replace all instances of !bitmap_empty(tbl...).
iommu_table_in_use() skips reserved memory, so we don't need to worry about
releasing it before
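A sketch of the idea (the reserved-range fields and exact bounds are assumed and may differ from the actual patch): scan the allocation bitmap while skipping the reserved region, instead of calling bitmap_empty() on the whole table.

#include <linux/bitops.h>
#include <asm/iommu.h>

/* Sketch: true if any non-reserved TCE in the table is allocated. */
static bool iommu_table_in_use(struct iommu_table *tbl)
{
	unsigned long start = 0, end;

	/* bitmap section before the reserved range */
	end = tbl->it_reserved_start - tbl->it_offset;
	if (find_next_bit(tbl->it_map, end, start) != end)
		return true;

	/* bitmap section after the reserved range */
	start = tbl->it_reserved_end - tbl->it_offset;
	end = tbl->it_size;

	return find_next_bit(tbl->it_map, end, start) != end;
}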
Some functions assume IOMMU page size can only be 4K (pageshift == 12).
Update them to accept any page size passed, so we can use 64K pages.
In the process, some defines like TCE_SHIFT were made obsolete, and then
removed.
IODA3 Revision 3.0_prd1 (OpenPowerFoundation), Figures 3.4 and 3.5 show
a
So far it's assumed possible to map the guest RAM 1:1 to the bus, which
works with a small number of devices. SRIOV changes it as the user can
configure hundreds of VFs and, since phyp preallocates TCEs and does not
allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
per PE to limit
On 4/30/21 10:26, Deucher, Alexander wrote:
> [AMD Public Use]
>
> + Gustavo, amd-gfx
>
>> -----Original Message-----
>> From: Christian Zigotzky
>> Sent: Friday, April 30, 2021 8:00 AM
>> To: gustavo...@kernel.org; Deucher, Alexander
>>
>> Cc: R.T.Dickinson ; Darren Stevens > zone.net>; ma
On 29/04/2021 at 21:12, Tyrel Datwyler wrote:
On 4/29/21 3:27 AM, Aneesh Kumar K.V wrote:
Laurent Dufour writes:
After an LPM, the device tree node ibm,dynamic-reconfiguration-memory may be
updated by the hypervisor in case the NUMA topology of the LPAR's
memory is updated.
This is caug
On Fri, 2021-04-23 at 19:04 +1000, Alexey Kardashevskiy wrote:
>
> > + win64->name = kstrdup(propname, GFP_KERNEL);
> > + ddwprop = kzalloc(sizeof(*ddwprop), GFP_KERNEL);
> > + win64->value = ddwprop;
> > + win64->length = sizeof(*ddwprop);
> > + if (!win64->name || !win64->value) {
> >
Thanks Alexey!
On Fri, 2021-04-23 at 17:27 +1000, Alexey Kardashevskiy wrote:
>
> On 22/04/2021 17:07, Leonardo Bras wrote:
> > Some functions assume IOMMU page size can only be 4K (pageshift == 12).
> > Update them to accept any page size passed, so we can use 64K pages.
> >
> > In the process,
CC: David Gibson
http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=241574&state=%2A&archive=both
During memory hotunplug, after each LMB is removed, the HPT may be
resized-down if it would map a max of 4 times the current amount of memory.
(2 shifts, due to the introduced hysteresis)
It usually is not an issue, but it can take a lot of time if HPT
resizing-down fails. This happens because resize
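For reference, the resize-down rule described above boils down to a shift comparison along these lines (a simplified sketch with assumed names, not the actual hash-MMU code):

/* Sketch of the hysteresis: each step of the HPT shift doubles the
 * table, so only shrink when the target size is at least two steps
 * (4x) below the current one. */
static bool hpt_should_resize_down(unsigned int current_hpt_shift,
				   unsigned int target_hpt_shift)
{
	return target_hpt_shift <= current_hpt_shift - 2;
}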
Every time a memory hotplug happens, and the memory limit crosses a 2^n
value, it may be necessary to perform HPT resizing-up, which can take
some time (over 100ms in my tests).
It usually is not an issue, but it can take some time if a lot of memory
is added to a guest with little starting memory
Because hypervisors may need to create HPTs without knowing the guest
page size, the smallest used page-size (4k) may be chosen, resulting in
an HPT that is possibly bigger than needed.
On a guest with bigger page sizes, the number of entries for the HPT may be
too high, causing the guest to ask for a
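As a rough worked illustration of the over-sizing (numbers are illustrative only, not taken from the patch): an HPT sized assuming 4K pages keeps one entry per 4K of guest RAM, so a guest that actually uses 64K pages needs about 16 times fewer entries.

#include <stdio.h>

/* Illustrative arithmetic only (not kernel code): HPT entries needed to
 * back 16 GiB of guest RAM at 4K vs 64K page size. */
int main(void)
{
	unsigned long long ram = 16ULL << 30;	/* 16 GiB */

	printf("4K pages : %llu entries\n", ram >> 12);	/* 4194304 */
	printf("64K pages: %llu entries\n", ram >> 16);	/* 262144 */
	return 0;
}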
This patchset intends to reduce the time needed for processing memory
hotplug/hotunplug in hash guests.
The first patch makes sure guests with a page size over 4k don't need to
go through HPT resize-downs after memory hotplug.
The second and third patches make hotplug / hotunplug perform a single
HPT res
One difference between dlpar_memory_remove_by_count() and
dlpar_memory_remove_by_ic() is that the latter, added in commit
753843471cbb, removes the LMBs in a contiguous block. This was done
because QEMU works with DIMMs, which are nothing more than sets of LMBs
that must be added or removed togeth
As previously done in dlpar_cpu_remove() for CPUs, this patch changes
dlpar_memory_remove_by_ic() to unisolate the LMB DRC when the LMB
fails to be removed. The hypervisor, seeing an LMB DRC that was supposed
to be removed being unisolated instead, can do error recovery on its
side.
This change
dlpar_memory_remove_by_ic() validates the number of LMBs to be removed
by checking !DRCONF_MEM_RESERVED, and in the following loop before
dlpar_remove_lmb() a check for DRCONF_MEM_ASSIGNED is made before
removing it. This means that an LMB that is both !DRCONF_MEM_RESERVED and
!DRCONF_MEM_ASSIGNED w
Hi,
This is a follow-up of the work done in dlpar_cpu_remove() to
report CPU removal errors by unisolating the DRC. This time I'm
doing it for LMBs. Patch 01 handles this.
Patches 2 and 3 are cleanups I consider worth posting.
Daniel Henrique Barboza (3):
powerpc/pseries: Set UNISOLATE on dlpa
On 30/04/2021 at 06:22, Daniel Walker wrote:
This converts the prom_init string users to the early string functions,
which don't suffer from KASAN or any other debugging being enabled.
Cc: xe-linux-exter...@cisco.com
Signed-off-by: Daniel Walker
---
arch/powerpc/kernel/prom_init.c| 185 +
On 30/04/2021 at 10:47, Christophe Leroy wrote:
On 30/04/2021 at 06:22, Daniel Walker wrote:
This system allows some string functions to be moved into
lib/early_string.c, where they will be prefixed with "early_" and compiled
without debugging such as KASAN.
This is already done on x86 for
On 30/04/2021 at 06:22, Daniel Walker wrote:
This system allows some string functions to be moved into
lib/early_string.c, where they will be prefixed with "early_" and compiled
without debugging such as KASAN.
This is already done on x86 for,
"AMD Secure Memory Encryption (SME) support"
and o
On 30/04/2021 at 09:55, Sandipan Das wrote:
Trace memory is cleared and the corresponding dcache lines
are flushed after allocation. However, this should not be
done using the PFN. This adds the missing __va() conversion.
Fixes: 2ac02e5ecec0 ("powerpc/mm: Remove dcache flush from memory rem
On 29/04/2021 at 21:29, Tyrel Datwyler wrote:
On 4/29/21 11:19 AM, Laurent Dufour wrote:
When an LPAR is migratable, we should consider the maximum possible NUMA
node instead of the number of NUMA nodes on the actual system.
The DT property 'ibm,current-associativity-domains' defines the max
It really helps to know how the HW is configured when tweaking the IRQ
subsystem.
Signed-off-by: Cédric Le Goater
---
arch/powerpc/sysdev/xics/ics-opal.c | 2 +-
arch/powerpc/sysdev/xics/ics-rtas.c | 3 +++
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/sysdev/xics/ic
Before MSI domains, the default IRQ chip of PHB3 MSIs was patched by
pnv_set_msi_irq_chip() with the custom EOI handler pnv_ioda2_msi_eoi()
and the owning PHB was deduced from the 'ioda.irq_chip' field. This
path has been deprecated by the MSI domains but it is still in use by
the P8 CAPI 'cxl' dri
Passthrough PCI MSI interrupts are detected in KVM with a check on a
specific EOI handler (P8) or on XIVE (P9). We can now check the
PCI-MSI IRQ chip which is cleaner.
Cc: Paul Mackerras
Signed-off-by: Cédric Le Goater
---
arch/powerpc/kvm/book3s_hv.c | 2 +-
arch/powerpc/platforms
The MSI affinity is automanaged and it can be set before starting the
associated IRQ.
(Should we simply remove the irqd_is_started() test?)
Cc: Thomas Gleixner
Signed-off-by: Cédric Le Goater
---
arch/powerpc/sysdev/xive/common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -
pnv_opal_pci_msi_eoi() is called from KVM to EOI passthrough interrupts
when in real mode. Adding MSI domain broke the hack using the
'ioda.irq_chip' field to deduce the owning PHB. Fix that by using the
IRQ chip data in the MSI domain.
The 'ioda.irq_chip' field is now unused and could be removed
The PowerNV and pSeries platforms now have support for both the XICS
and XIVE IRQ domains.
Signed-off-by: Cédric Le Goater
---
arch/powerpc/platforms/powernv/pci-ioda.c | 4 +---
arch/powerpc/platforms/pseries/msi.c | 4
2 files changed, 1 insertion(+), 7 deletions(-)
diff --git a/arc
XICS doesn't have any state associated with the IRQ. The support is
straightforward and simpler than for XIVE.
Signed-off-by: Cédric Le Goater
---
arch/powerpc/sysdev/xics/xics-common.c | 37 ++
1 file changed, 37 insertions(+)
diff --git a/arch/powerpc/sysdev/xics/xics-
That was a workaround in the XICS domain because of the lack of an MSI
domain. This is now handled.
Signed-off-by: Cédric Le Goater
---
arch/powerpc/sysdev/xics/ics-opal.c | 11 ---
arch/powerpc/sysdev/xics/ics-rtas.c | 9 -
2 files changed, 20 deletions(-)
diff --git a/arch/power
MSIs should be fully managed by the PCI and IRQ subsystems now.
Signed-off-by: Cédric Le Goater
---
arch/powerpc/platforms/powernv/pci.h | 6 --
arch/powerpc/platforms/powernv/pci-ioda.c | 29 --
arch/powerpc/platforms/powernv/pci.c | 67 ---
3 files change
The HW IRQ numbers generated by the PCI MSI layer can be quite large
on a pSeries machine when running under the IBM Hypervisor and they
appear as negative when printed. Use '%u' to show them correctly.
Cc: Thomas Gleixner
Signed-off-by: Cédric Le Goater
---
kernel/irq/irqdesc.c | 2 +-
kernel/irq/proc.c
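A standalone illustration of the symptom (userspace, not kernel code): a HW IRQ number with the top bit set prints as negative under a signed conversion and correctly under '%u'.

#include <stdio.h>

int main(void)
{
	unsigned int hwirq = 0x80000005u;	/* hypothetical large hwirq */

	printf("%d\n", hwirq);	/* signed conversion: prints a negative number */
	printf("%u\n", hwirq);	/* unsigned conversion: prints the real value */
	return 0;
}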
The RTAS firmware cannot disable one MSI at a time. It's all or
nothing. We need a custom free IRQ handler for that.
Cc: Thomas Gleixner
Signed-off-by: Cédric Le Goater
---
arch/powerpc/platforms/pseries/msi.c | 16
1 file changed, 16 insertions(+)
diff --git a/arch/powerpc/p
The MSI domain clears the IRQ with msi_domain_free(), which calls
irq_domain_free_irqs_top(), which clears the handler data. This is a
problem for the XIVE controller since we need to unmap MMIO pages and
free a specific XIVE structure.
The 'msi_free()' handler is called before irq_domain_free_irq
PCI MSI interrupt numbers are now mapped in a PCI-MSI domain but the
underlying calls handling the passthrough of the interrupt in the
guest need a number in the XIVE IRQ domain.
Use the IRQ data mapped in the XIVE IRQ domain and not the one in the
PCI-MSI domain.
Exporting irq_get_default_host()
Two IRQ domains are added on top of the default machine IRQ domain.
First, the top-level "PCI-MSI" domain deals with the MSI specificities.
In this domain, the HW IRQ numbers are generated by the PCI MSI layer;
they compose a unique ID for an MSI source from the PCI device
identifier and the MSI vecto
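For context, stacking a PCI-MSI domain on top of an existing controller domain with the generic MSI helpers looks roughly like the sketch below (every name here, such as phb_msi_irq_chip and the parent domain, is a placeholder, not an actual powerpc symbol):

#include <linux/irq.h>
#include <linux/irqdomain.h>
#include <linux/msi.h>

/* Sketch: a top-level PCI-MSI domain created over a parent IRQ domain.
 * The generic MSI core provides the default domain ops; mask/unmask go
 * through the PCI MSI helpers. */
static struct irq_chip phb_msi_irq_chip = {
	.name		= "PHB-MSI",		/* placeholder name */
	.irq_ack	= irq_chip_ack_parent,
	.irq_mask	= pci_msi_mask_irq,
	.irq_unmask	= pci_msi_unmask_irq,
};

static struct msi_domain_info phb_msi_domain_info = {
	.flags	= MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
		  MSI_FLAG_MULTI_PCI_MSI | MSI_FLAG_PCI_MSIX,
	.chip	= &phb_msi_irq_chip,
};

static struct irq_domain *phb_create_msi_domain(struct fwnode_handle *fwnode,
						struct irq_domain *parent)
{
	return pci_msi_create_irq_domain(fwnode, &phb_msi_domain_info, parent);
}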
desc->irq_data points to the top level IRQ data descriptor which is
not necessarily in the XICS IRQ domain. MSIs are in another domain for
instance. Fix that by looking for a mapping on the low level XICS IRQ
domain.
TODO: Why not use irq_migrate_all_off_this_cpu() instead?
Cc: Thomas Gleixner
PHB3s need an extra OPAL call to EOI the interrupt. The call takes an
OPAL HW IRQ number but it is translated into a vector number in OPAL.
Here, we directly use the vector number of the in-the-middle "MSI"
domain instead of grabbing the OPAL HW IRQ number in the XICS parent
domain.
Signed-off-by:
PCI MSIs now live in an MSI domain but the underlying calls, which
will EOI the interrupt in real mode, need an HW IRQ number mapped in
the XICS IRQ domain. Grab it there.
Cc: Paul Mackerras
Cc: Alexey Kardashevskiy
Signed-off-by: Cédric Le Goater
---
arch/powerpc/kvm/book3s_hv.c | 12
This moves the IRQ initialization done under the OPAL and RTAS backends
into the common part of XICS. The 'map' handler becomes a simple 'check'
on the HW IRQ at the FW level.
As we don't need an ICS anymore in xics_migrate_irqs_away(), the XICS
domain does not set a chip data for the IRQ.
Signed-o
The pnv_ioda2_msi_eoi chip handler is not used anymore for MSIs.
Simply use the check on the PCI-MSI chip.
Cc: Alexey Kardashevskiy
Cc: Paul Mackerras
Signed-off-by: Cédric Le Goater
---
arch/powerpc/platforms/powernv/pci-ioda.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git
MSIs should be fully managed by the PCI and IRQ subsystems now.
Signed-off-by: Cédric Le Goater
---
arch/powerpc/platforms/pseries/msi.c | 87
1 file changed, 87 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/msi.c
b/arch/powerpc/platforms/pseries/msi.c
i
Simply allocate or release the MSI domains when a PHB is inserted in
or removed from the machine.
Signed-off-by: Cédric Le Goater
---
arch/powerpc/platforms/pseries/pseries.h | 1 +
arch/powerpc/platforms/pseries/msi.c | 10 ++
arch/powerpc/platforms/pseries/pci_dlpar.c | 4 +++
and clean up the error path.
Signed-off-by: Cédric Le Goater
---
arch/powerpc/sysdev/xics/xics-common.c | 21 +
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/sysdev/xics/xics-common.c
b/arch/powerpc/sysdev/xics/xics-common.c
index 2fa45cd12a82..
This splits the routine setting up the MSIs into two parts: allocation of
MSIs for the PCI device at the FW level (RTAS) and the actual mapping
and activation of the IRQs.
rtas_prepare_msi_irqs() will serve as a handler for the MSI domain.
Signed-off-by: Cédric Le Goater
---
arch/powerpc/platforms/p
It will help to size the PCI MSI domain.
Signed-off-by: Cédric Le Goater
---
arch/powerpc/platforms/pseries/msi.c | 9 +++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/msi.c
b/arch/powerpc/platforms/pseries/msi.c
index 637300330507..d2d090e0
We always had only one ICS per machine. Simplify the XICS driver by
removing the ICS list.
The ICS stored in the chip data of the XICS domain becomes useless and
we don't need it anymore to migrate away IRQs from a CPU. This will be
removed in a subsequent patch.
Signed-off-by: Cédric Le Goater
The routine kvmppc_set_passthru_irq() calls kvmppc_xive_set_mapped()
and kvmppc_xive_clr_mapped() with an IRQ descriptor. Use the host IRQ
number directly to remove a useless conversion.
Add some debug.
Cc: Paul Mackerras
Signed-off-by: Cédric Le Goater
---
arch/powerpc/include/asm/kvm_ppc.h |
This is very similar to the MSI domains of the pSeries platform. The
MSI allocator is directly handled under the Linux PHB in the
in-the-middle "MSI" domain.
Only the XIVE (P9/P10) parent domain is supported for now. We still
need to add support for IRQ domain hierarchy under XICS.
Signed-off-by:
It will be used as a 'compose_msg' handler of the MSI domain
introduced later.
Signed-off-by: Cédric Le Goater
---
arch/powerpc/platforms/powernv/pci-ioda.c | 28 +++
1 file changed, 23 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c
b/a
Hello,
This series adds support for MSI IRQ domains on top of the XICS (P8)
and XIVE (P9/P10) IRQ domains for the PowerNV (baremetal) and pSeries
(VM) platforms. It should greatly improve IRQ affinity of PCI MSIs
on these PowerPC platforms. Data locality can still be improved
with a machine IRQ
pr_debug() is easier to activate and it helps to know how the HW is
configured when tweaking the IRQ subsystem.
Signed-off-by: Cédric Le Goater
---
arch/powerpc/sysdev/xive/common.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/common.c
b/arch/
This adds handlers to allocate/free IRQs in a domain hierarchy. We
could try to use xive_irq_domain_map() in xive_irq_domain_alloc() but
we rely on xive_irq_alloc_data() to set the IRQ handler data and
duplicating the code is simpler.
xive_irq_free_data() needs to be called when IRQs are freed to c
That was a workaround in the XIVE domain because of the lack of an MSI
domain. This is now handled.
Signed-off-by: Cédric Le Goater
---
arch/powerpc/sysdev/xive/common.c | 10 --
1 file changed, 10 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/common.c
b/arch/powerpc/sysdev/xive/comm
Sandipan Das writes:
> Trace memory is cleared and the corresponding dcache lines
> are flushed after allocation. However, this should not be
> done using the PFN. This adds the missing __va() conversion.
Reviewed-by: Aneesh Kumar K.V
>
> Fixes: 2ac02e5ecec0 ("powerpc/mm: Remove dcache flush f
Trace memory is cleared and the corresponding dcache lines
are flushed after allocation. However, this should not be
done using the PFN. This adds the missing __va() conversion.
Fixes: 2ac02e5ecec0 ("powerpc/mm: Remove dcache flush from memory remove.")
Signed-off-by: Sandipan Das
---
arch/power