On Wed, Dec 19, 2018 at 09:43:58AM -0700, Alex Williamson wrote:
> [cc +kvm, +lkml]
>
> Sorry, just noticed these are only visible on ppc lists or for those
> directly cc'd. vfio's official development list is the kvm list. I'll
> let spapr specific changes get away without copying this list, bu
This new memory does not have page structs as it is not plugged to
the host so gup() will fail anyway.
This adds 2 helpers:
- mm_iommu_newdev() to preregister the "memory device" memory so
the rest of the API can still be used;
- mm_iommu_is_devmem() to know if the physical address is one of these
new
We might have memory@ nodes with "linux,usable-memory" set to zero
(for example, to replicate powernv's behaviour for GPU coherent memory)
which means that the memory needs an extra initialization but since
it can be used afterwards, the pseries platform will try mapping it
for DMA so the DMA windo
On Thu, Dec 20, 2018 at 05:48:25AM +, Christophe Leroy wrote:
> Some debug setups like CONFIG_KASAN generate huge
> kernels with text size over the 8M limit.
>
> This patch maps a second 8M page when _einittext is over 8M.
Do we also need a check to generate a useful warning if we ever overflo
We already changed the NPU API for GPUs to not call OPAL, and the remaining
bit is initializing NPU structures.
This searches for POWER9 NVLinks attached to any device on a PHB and
initializes an NPU structure if any found.
Signed-off-by: Alexey Kardashevskiy
---
Changes:
v5:
* added WARN_ON_ONCE
The pci_dma_bus_setup_pSeries and pci_dma_dev_setup_pSeries hooks are
registered for the pseries platform which does not have FW_FEATURE_LPAR;
these would be pre-powernv platforms which we never supported PCI pass
through for anyway so remove it.
Signed-off-by: Alexey Kardashevskiy
Reviewed-by: D
My bad, I was not cc-ing everyone but now with v7 I am, sorry about that.
This is for passing through NVIDIA V100 GPUs on POWER9 systems.
20/20 has the details of hardware setup.
This implements support for NVIDIA V100 GPU with coherent memory and
NPU/ATS support available in the POWER9 CPU. The
The skiboot firmware has a hot reset handler which fences the NVIDIA V100
GPU RAM on Witherspoons and makes accesses no-op instead of throwing HMIs:
https://github.com/open-power/skiboot/commit/fca2b2b839a67
Now we are going to pass V100 via VFIO which most certainly involves
KVM guests which are
Registering new IOMMU groups and adding devices to them are separated in
code and the latter is dug in the DMA setup code which it does not
really belong to.
This moves IOMMU group setup to a separate helper which registers a group
and adds devices as before. This does not make a difference as IO
The iommu_table pointer stored in iommu_table_group may get stale
by accident; this adds referencing and removes a redundant comment
about this.
Signed-off-by: Alexey Kardashevskiy
Reviewed-by: David Gibson
---
arch/powerpc/platforms/powernv/pci-ioda-tce.c | 3 ++-
arch/powerpc/platforms/powern
Normally mm_iommu_get() should add a reference and mm_iommu_put() should
remove it. However historically mm_iommu_find() does the referencing and
mm_iommu_get() is doing allocation and referencing.
We are going to add another helper to preregister device memory so
instead of having mm_iommu_new()
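The get/put convention described in the message above can be illustrated with a minimal userspace sketch. It is not the kernel's mm_iommu code: struct region, region_new(), region_get() and region_put() are hypothetical names, and the sketch has no locking, which a real implementation would need.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a preregistered region. */
struct region {
	unsigned long refs;
};

/* Allocation hands back the object with one reference already held. */
static struct region *region_new(void)
{
	struct region *r = calloc(1, sizeof(*r));

	if (r)
		r->refs = 1;
	return r;
}

/* get() only adds a reference; it never allocates. */
static void region_get(struct region *r)
{
	assert(r && r->refs > 0);
	r->refs++;
}

/* put() drops one reference; returns 1 if that freed the object. */
static int region_put(struct region *r)
{
	assert(r && r->refs > 0);
	if (--r->refs == 0) {
		free(r);
		return 1;
	}
	return 0;
}
```

Keeping allocation in its own helper means get() and put() stay symmetric, which is the property the message says the historical naming lacked.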
Normal PCI PEs have 2 TVEs, one per DMA window; however NPU PE has only
one which points to one of two tables of the corresponding PCI PE.
So whenever a new DMA window is programmed to PEs, the NPU PE needs to
release old table in order to use the new one.
Commit d41ce7b1bcc3e ("powerpc/powernv
The powernv PCI code stores NPU data in the pnv_phb struct. The latter
is referenced by pci_controller::private_data. We are going to have NPU2
support in the pseries platform as well but it does not store any
private_data in the pci_controller struct; and even if it did,
it would be a different
At the moment NPU IOMMU is manipulated directly from the IODA2 PCI
PE code; PCI PE acts as a master to NPU PE. Soon we will have compound
IOMMU groups with several PEs from several different PHB (such as
interconnected GPUs and NPUs) so there will be no single master but
one big IOMMU group.
Thi
When introduced, the NPU context init/destroy helpers called OPAL which
enabled/disabled PID (a userspace memory context ID) filtering in an NPU
per GPU; this was a requirement for P9 DD1.0. However newer chip
revisions added PID wildcard support so there is no more need to
call OPAL every time
At the moment the powernv platform registers an IOMMU group for each PE.
There is an exception though: an NVLink bridge which is attached to
the corresponding GPU's IOMMU group making it a master.
Now we have POWER9 systems with GPUs connected to each other directly
bypassing PCI. At the moment we
In order to make ATS work and translate addresses for arbitrary
LPID and PID, we need to program an NPU with LPID and allow PID wildcard
matching with a specific MSR mask.
This implements a helper to assign a GPU to LPAR and program the NPU
with a wildcard for PID and a helper to do clean-up. The
A broken device tree might contain more than 8 values and introduce a
hard-to-debug memory corruption bug. This adds a boundary check.
Signed-off-by: Alexey Kardashevskiy
---
arch/powerpc/platforms/powernv/npu-dma.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/po
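The kind of boundary check this patch describes can be sketched as follows. This is a userspace illustration only, not the code in npu-dma.c: MAX_LINKS and copy_link_values() are hypothetical names, and the limit of 8 is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LINKS 8	/* capacity of the fixed-size destination array */

/*
 * Copy `count` values from a variable-length property into dst.
 * Returns the number of values copied, or -1 when the property holds
 * more values than dst can take (the "broken device tree" case).
 */
static int copy_link_values(uint32_t dst[MAX_LINKS],
			    const uint32_t *prop, size_t count)
{
	size_t i;

	assert(dst != NULL && prop != NULL);

	if (count > MAX_LINKS)
		return -1;	/* refuse instead of writing past dst */

	for (i = 0; i < count; i++)
		dst[i] = prop[i];

	return (int)count;
}
```

Rejecting the oversized input outright is one possible policy; truncating to the first MAX_LINKS values would be another, and the message does not say which one the patch picks.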
When a page fault happens in a GPU, the GPU signals the OS and the GPU
driver calls the fault handler which populates a page table; this allows
the GPU to complete an ATS request.
On the bare metal get_user_pages() is enough as it adds a pte to
the kernel page table but under KVM the partition sco
The powernv platform registers IOMMU groups and adds devices to them
from the pci_controller_ops::setup_bridge() hook except one case when
virtual functions (SRIOV VFs) are added from a bus notifier.
The pseries platform registers IOMMU groups from
the pci_controller_ops::dma_bus_setup() hook and
So far we only allowed mapping of MMIO BARs to the userspace. However
there are GPUs with on-board coherent RAM accessible via side
channels which we also want to map to the userspace. The first client
for this is NVIDIA V100 GPU with NVLink2 direct links to a POWER9
NPU-enabled CPU; such GPUs have
POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not
pluggable PCIe devices but still have PCIe links which are used
for config space and MMIO. In addition to that the GPUs have 6 NVLinks
which are connected to other GPUs and the POWER9 CPU. POWER9 chips
have a special unit on a die
VFIO regions already support region capabilities with a limited set of
fields. However the subdriver might have to report to the userspace
additional bits.
This adds an add_capability() hook to vfio_pci_regops.
Signed-off-by: Alexey Kardashevskiy
Acked-by: Alex Williamson
---
Changes:
v3:
* rem
On 20/12/2018 at 09:24, Christoph Hellwig wrote:
On Thu, Dec 20, 2018 at 05:48:25AM +, Christophe Leroy wrote:
Some debug setups like CONFIG_KASAN generate huge
kernels with text size over the 8M limit.
This patch maps a second 8M page when _einittext is over 8M.
Do we also need a che
Alexey Kardashevskiy writes:
> My bad, I was not cc-ing everyone but now with v7 I am, sorry about that.
I've already applied v6, I'll assume this is unchanged from that unless
you tell me otherwise.
cheers
> This is for passing through NVIDIA V100 GPUs on POWER9 systems.
> 20/20 has the detai
This bug was originally reported at
https://lore.kernel.org/patchwork/patch/1020838/
In short, this bug should affect all archs, where a machine with a
numa-node having no memory, if nr_cpus prevents the instance of nodeA, and the
device on nodeA tries to allocate memory with device->numa_node
The current build_zonelist_xx funcs rely on a pgdat instance to build the
zonelist; if a NUMA node is offline, there will be no pgdat instance for it.
But in some cases a zonelist is still required for an offline node,
especially with the nr_cpus option.
This patch change these funcs topo to ease the bu
I hit a bug on an AMD machine with the kexec -l nr_cpus=4 option. It happens
because some pgdat is not instantiated when nr_cpus is specified, e.g., on x86, not
initialized by init_cpu_to_node()->init_memory_less_node(). But
device->numa_node info is used as preferred_nid param for
__alloc_pages_nodemask(), which
This patch tries to resolve a bug rooted at mm when using nr_cpus. It was
reported at [1]. The root cause is: device->numa_node info is used as
preferred_nid param for __alloc_pages_nodemask(), which causes NULL
reference when ac->zonelist = node_zonelist(preferred_nid, gfp_mask), due to
the prefer
Joel Stanley writes:
> Building the ppc64 kernel with a modern binutils results in this
> warning:
>
> powerpc64le-linux-gnu-ld: warning: orphan section `.gnu.hash' from
> `linker stubs' being placed in section `.gnu.hash'
>
> Alan Modra explains:
>
> > .gnu.hash, like .hash, is used by glibc
Joel Stanley writes:
> Alan Modra explains:
>
> > Likely you could discard .interp and .dynstr too, and .dynsym when
> > !CONFIG_PPC32.
>
> Discarding of interp and dynstr happened in a previous patch. The dynsym
> cleanup was a bit less straightforward, so it gets its own patch.
>
> Signed
On 20/12/2018 20:38, Michael Ellerman wrote:
> Alexey Kardashevskiy writes:
>
>> My bad, I was not cc-ing everyone but now with v7 I am, sorry about that.
>
> I've already applied v6, I'll assume this is unchanged from that unless
> you tell me otherwise.
14/20 has fixed warning about uninit
On Thu 20-12-18 17:50:38, Pingfan Liu wrote:
[...]
> @@ -453,7 +456,12 @@ static inline int gfp_zonelist(gfp_t flags)
> */
> static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
> {
> - return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
> + if (unlikely(!possible
On Thu, Dec 20, 2018 at 7:35 PM Michal Hocko wrote:
>
> On Thu 20-12-18 17:50:38, Pingfan Liu wrote:
> [...]
> > @@ -453,7 +456,12 @@ static inline int gfp_zonelist(gfp_t flags)
> > */
> > static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
> > {
> > - return NODE_DATA(nid)-
On Thu 20-12-18 20:26:28, Pingfan Liu wrote:
> On Thu, Dec 20, 2018 at 7:35 PM Michal Hocko wrote:
> >
> > On Thu 20-12-18 17:50:38, Pingfan Liu wrote:
> > [...]
> > > @@ -453,7 +456,12 @@ static inline int gfp_zonelist(gfp_t flags)
> > > */
> > > static inline struct zonelist *node_zonelist(in
Breno Leitao writes:
> A new self test that forces MSR[TS] to be set without calling any TM
> instruction. This test also tries to cause a page fault at a signal
> handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing
> thread->texasr to be rewritten with TEXASR[FS] = 0, which will
Hi Ram,
Thanks for fixing this.
Ram Pai writes:
> diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
> index b271b28..5d65c47 100644
> --- a/arch/powerpc/mm/pkeys.c
> +++ b/arch/powerpc/mm/pkeys.c
> @@ -414,3 +414,10 @@ bool arch_vma_access_permitted(struct vm_area_struct
> *vma, bo
On 30.11.18 18:59, David Hildenbrand wrote:
> This is the second approach, introducing more meaningful memory block
> types and not changing online behavior in the kernel. It is based on
> latest linux-next.
>
> As we found out during discussion, user space should always handle onlining
> of memory
On Thu 20-12-18 13:58:16, David Hildenbrand wrote:
> On 30.11.18 18:59, David Hildenbrand wrote:
> > This is the second approach, introducing more meaningful memory block
> > types and not changing online behavior in the kernel. It is based on
> > latest linux-next.
> >
> > As we found out during
On 20.12.18 14:08, Michal Hocko wrote:
> On Thu 20-12-18 13:58:16, David Hildenbrand wrote:
>> On 30.11.18 18:59, David Hildenbrand wrote:
>>> This is the second approach, introducing more meaningful memory block
>>> types and not changing online behavior in the kernel. It is based on
>>> latest li
On Mon, 17 Dec 2018 11:38:51 +1100
"Alastair D'Silva" wrote:
> On Sun, 2018-12-16 at 22:28 +0100, Greg Kurz wrote:
> > All fields in the PE are big-endian. Use cpu_to_be32() like
> > everywhere
> > else something is written to the PE. Otherwise a wrong TID will be
> > used
> > by the NPU. If this
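The point about big-endian fields in the quoted patch description can be shown with a small userspace sketch. to_be32() is a portable stand-in for the kernel's cpu_to_be32(), and struct pe_entry with its tid field is a hypothetical layout used only for illustration, not the real OCXL process element.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable userspace stand-in for the kernel's cpu_to_be32(). */
static uint32_t to_be32(uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v >> 24), (uint8_t)(v >> 16),
		(uint8_t)(v >> 8),  (uint8_t)v
	};
	uint32_t out;

	memcpy(&out, b, sizeof(out));	/* most significant byte lands first */
	return out;
}

/* Hypothetical element whose fields the hardware reads as big-endian. */
struct pe_entry {
	uint32_t tid;
};

/* Store the thread ID in the byte order the hardware expects. */
static void pe_set_tid(struct pe_entry *pe, uint32_t tid)
{
	assert(pe != NULL);
	pe->tid = to_be32(tid);
}
```

On a little-endian host, assigning tid directly would place the least significant byte first in memory, so the hardware would read a different value; on a big-endian host the conversion is a no-op.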
On Wed, 12 Dec 2018 13:26:10 +1100
Andrew Donnellan wrote:
> On 12/12/18 4:58 am, Greg Kurz wrote:
> > The AFU Descriptor Template in the PCI config space has a Name Space
> > field which is a 24 Byte ASCII character string of descriptive name
> > space for the AFU. The OCXL driver read the strin
On Thu, Dec 20, 2018 at 07:23:50PM +1100, Alexey Kardashevskiy wrote:
> POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not
> pluggable PCIe devices but still have PCIe links which are used
> for config space and MMIO. In addition to that the GPUs have 6 NVLinks
> which are connect
On 12/20/2018 05:40 AM, Benjamin Herrenschmidt wrote:
Hi folks !
While trying to figure out why we had occasionally lockdep barf about
interrupt state on ppc32 (440 in my case but I could reproduce on e500
as well using qemu), I realized that we are still doing something
rather gothic and wrong
On Thu, 20 Dec 2018 19:23:50 +1100
Alexey Kardashevskiy wrote:
> POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not
> pluggable PCIe devices but still have PCIe links which are used
> for config space and MMIO. In addition to that the GPUs have 6 NVLinks
> which are connected to
On Tue, 11 Dec 2018 11:09:39 +1100
Andrew Donnellan wrote:
> Acked-by: Andrew Donnellan
>
Friendly ping before Xmas break :)
> On 11/12/18 2:13 am, Greg Kurz wrote:
> > The AFU irq code doesn't need to reach out to the platform.
> >
> > Signed-off-by: Greg Kurz
> > ---
> > drivers/misc/oc
On Tue, 11 Dec 2018 11:19:55 +1100
Andrew Donnellan wrote:
> On 11/12/18 2:18 am, Greg Kurz wrote:
> > Implementing rollback with goto and labels is a common practice that
> > leads to prettier and more maintainable code. FWIW, this design pattern
> > is already being used in alloc_link() a few l
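The goto-and-labels rollback pattern mentioned in the quoted patch description looks roughly like this. It is a userspace sketch, not the OCXL driver's code: struct link_ctx, link_ctx_alloc() and the fail_step parameter (which simulates a failing step so the rollback paths can be exercised) are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object built in several steps, each of which may fail. */
struct link_ctx {
	int *irqs;
	char *name;
};

/*
 * Build the object fully or not at all.
 * fail_step: 0 = no simulated failure, 1 = fail the irqs allocation,
 * 2 = fail the name allocation.
 */
static struct link_ctx *link_ctx_alloc(int fail_step)
{
	struct link_ctx *ctx;

	assert(fail_step >= 0 && fail_step <= 2);

	ctx = calloc(1, sizeof(*ctx));
	if (!ctx)
		goto err;

	ctx->irqs = (fail_step == 1) ? NULL : calloc(4, sizeof(*ctx->irqs));
	if (!ctx->irqs)
		goto err_free_ctx;

	ctx->name = (fail_step == 2) ? NULL : calloc(16, 1);
	if (!ctx->name)
		goto err_free_irqs;

	return ctx;

	/* Labels undo the completed steps in reverse order. */
err_free_irqs:
	free(ctx->irqs);
err_free_ctx:
	free(ctx);
err:
	return NULL;
}

/* Release a fully constructed object. */
static void link_ctx_free(struct link_ctx *ctx)
{
	if (!ctx)
		return;
	free(ctx->name);
	free(ctx->irqs);
	free(ctx);
}
```

Each failure jumps to the label that releases exactly what has been acquired so far, so adding a new step only requires one new label rather than duplicating cleanup code in every error branch.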
Firmware-Assisted Dump (FADump) is currently supported only on pseries
platform. This patch series adds support for powernv platform too.
The first and third patches refactor the FADump code to make use of common
code across multiple platforms. The fourth patch adds basic FADump support
to powernv
Refactoring fadump code means internal fadump code is referenced from
different places. For ease, move internal code to a new file.
Signed-off-by: Hari Bathini
---
arch/powerpc/include/asm/fadump.h | 112 ---
arch/powerpc/kernel/Makefile |2
arch/powerpc/kernel
The figures depicting FADump's (Firmware-Assisted Dump) memory layout
are missing some finer details like different memory regions and what
they represent. Improve the documentation by updating those details.
Signed-off-by: Hari Bathini
---
Documentation/powerpc/firmware-assisted-dump.txt | 56
Introduce callbacks for platform specific operations like register,
unregister, invalidate & such, and move pseries specific code into
platform code.
Signed-off-by: Hari Bathini
---
arch/powerpc/include/asm/fadump.h | 71 ---
arch/powerpc/kernel/fadump.c| 502
From: Hari Bathini
Firmware-assisted dump support is enabled for OPAL based POWER platforms
in P9 firmware. Make the corresponding updates in kernel to enable fadump
support for such platforms.
Signed-off-by: Hari Bathini
---
arch/powerpc/Kconfig|5
arch/powerp
From: Hari Bathini
Firmware provides architected register state data at the time of crash.
This data contains the PIR value. The logical CPUs' PIR values need to be stored
to match the data provided by f/w with the corresponding logical CPU.
Signed-off-by: Hari Bathini
Signed-off-by: Vasant Hegde
---
From: Hari Bathini
Export /proc/opalcore file to analyze opal crashes
Signed-off-by: Hari Bathini
---
arch/powerpc/platforms/powernv/Makefile |2
arch/powerpc/platforms/powernv/opal-core.c | 385 ++
arch/powerpc/platforms/powernv/opal-core.h | 35 ++
ar
Add a new kernel config option, CONFIG_PRESERVE_FA_DUMP that ensures
that crash data, from a previously crashed kernel, is preserved. This
helps in cases where FADUMP is not enabled but the subsequent memory
preserving kernel boot is likely to process this crash data. One
typical usecase for this co
Signed-off-by: Hari Bathini
---
Documentation/powerpc/firmware-assisted-dump.txt | 56 +++---
1 file changed, 28 insertions(+), 28 deletions(-)
diff --git a/Documentation/powerpc/firmware-assisted-dump.txt
b/Documentation/powerpc/firmware-assisted-dump.txt
index 4897665..326f8
With FADump support now available on both pseries and OPAL platforms,
update FADump documentation with these details. Also, document the
backup area and why it is used.
Signed-off-by: Hari Bathini
---
Documentation/powerpc/firmware-assisted-dump.txt | 102 ++
1 file changed,
On Fri, Dec 21, 2018 at 12:19:13AM +1100, Michael Ellerman wrote:
> Hi Ram,
>
> Thanks for fixing this.
>
> Ram Pai writes:
> > diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
> > index b271b28..5d65c47 100644
> > --- a/arch/powerpc/mm/pkeys.c
> > +++ b/arch/powerpc/mm/pkeys.c
> >
Pkey tracking information is not copied over to the mm_struct of the
child during fork(). This can cause the child to erroneously allocate
keys that were already allocated. Any allocated execute-only key is lost
as well.
Add code, called by dup_mmap(), to copy the pkey state from parent to
chil
Hi Sebastian,
On Tue, Dec 18, 2018 at 11:16:49AM +0100, Sebastian Ott wrote:
> Provide a flag to skip scanning for new VFs after SRIOV enablement.
> This can be set by implementations for which the VFs are already
> reported by other means.
>
> Signed-off-by: Sebastian Ott
> ---
> drivers/pci/i
> > /*
> >* MSR_KERNEL is > 0x1 on 4xx/Book-E since it include MSR_CE.
> > @@ -205,20 +208,46 @@ transfer_to_handler_cont:
> > mflr r9
> > lwz r11,0(r9) /* virtual address of handler */
> > lwz r9,4(r9) /* where to go when done */
> >
Murilo Opsfelder Araujo writes:
> On Thu, Dec 20, 2018 at 07:23:50PM +1100, Alexey Kardashevskiy wrote:
...
>> diff --git a/drivers/vfio/pci/trace.h b/drivers/vfio/pci/trace.h
>> new file mode 100644
>> index 000..b80d2d3
>> --- /dev/null
>> +++ b/drivers/vfio/pci/trace.h
...
>> +TRACE_EVENT(v
Ram Pai writes:
> Pkey tracking information is not copied over to the mm_struct of the
> child during fork(). This can cause the child to erroneously allocate
> keys that were already allocated. Any allocated execute-only key is lost
> as well.
>
> Add code, called by dup_mmap(), to copy the
Hi Steven !
I'm trying to untangle something, and I need your help :-)
In commit 3cb5f1a3e58c0bd70d47d9907cc5c65192281dee, you added a dummy
stack frame around the assembly calls to trace_hardirqs_on/off on the
ground that when using the latency tracer (irqsoff), you might poke at
CALLER_ADDR1 an
On 21/12/2018 03:46, Alex Williamson wrote:
> On Thu, 20 Dec 2018 19:23:50 +1100
> Alexey Kardashevskiy wrote:
>
>> POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not
>> pluggable PCIe devices but still have PCIe links which are used
>> for config space and MMIO. In addition
On Fri, 21 Dec 2018 12:23:16 +1100
Alexey Kardashevskiy wrote:
> On 21/12/2018 03:46, Alex Williamson wrote:
> > On Thu, 20 Dec 2018 19:23:50 +1100
> > Alexey Kardashevskiy wrote:
> >
> >> POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not
> >> pluggable PCIe devices but sti
On 21/12/2018 12:37, Alex Williamson wrote:
> On Fri, 21 Dec 2018 12:23:16 +1100
> Alexey Kardashevskiy wrote:
>
>> On 21/12/2018 03:46, Alex Williamson wrote:
>>> On Thu, 20 Dec 2018 19:23:50 +1100
>>> Alexey Kardashevskiy wrote:
>>>
>>>> POWER9 Witherspoon machines come with 4 or 6 V100
On Fri, 21 Dec 2018 12:11:35 +1100
Benjamin Herrenschmidt wrote:
> Hi Steven !
>
> I'm trying to untangle something, and I need your help :-)
>
> In commit 3cb5f1a3e58c0bd70d47d9907cc5c65192281dee, you added a dummy
> stack frame around the assembly calls to trace_hardirqs_on/off on the
> groun
On Fri, 21 Dec 2018 12:50:00 +1100
Alexey Kardashevskiy wrote:
> On 21/12/2018 12:37, Alex Williamson wrote:
> > On Fri, 21 Dec 2018 12:23:16 +1100
> > Alexey Kardashevskiy wrote:
> >
> >> On 21/12/2018 03:46, Alex Williamson wrote:
> >>> On Thu, 20 Dec 2018 19:23:50 +1100
> >>> Alexey Kard
This reverts commit 6d11023c345e369bcb9d5a68b271764e362c1f6e ("serial:
8250: Default SERIAL_OF_PLATFORM to SERIAL_8250") since that breaks at
least mpc8544ds (PowerPC) using arch/powerpc/kernel/legacy_serial.c.
See https://lkml.org/lkml/2018/12/5/1491 for discussion and analysis
Fixes: 6d11023c34
>>> defined where it was not previously. Example mpc85xx_defconfig. This in
>>> turn results in boot failures for those configurations, with an error
>>> message of
>>>
>>> of_serial: probe of e0004500.serial failed with error -22
>>>
>>> wh
Christophe Leroy writes:
> This patch implements a framework for Kernel Userspace Access
> Protection.
>
> Then subarches will have the possibility to provide their own
> implementation by providing setup_kuap() and lock/unlock_user_access()
>
> Some platform will need to know the area accessed an
Hi all,
Today's linux-next merge of the kvm tree got a conflict in:
arch/powerpc/mm/fault.c
between commit:
49a502ea23bf ("powerpc/mm: Make NULL pointer deferences explicit on bad page
faults.")
from the powerpc tree and commit:
d7b456152230 ("KVM: PPC: Book3S HV: Implement functions t
On Thu, Dec 20, 2018 at 8:44 PM Michal Hocko wrote:
>
> On Thu 20-12-18 20:26:28, Pingfan Liu wrote:
> > On Thu, Dec 20, 2018 at 7:35 PM Michal Hocko wrote:
> > >
> > > On Thu 20-12-18 17:50:38, Pingfan Liu wrote:
> > > [...]
> > > > @@ -453,7 +456,12 @@ static inline int gfp_zonelist(gfp_t flags
On 21/12/2018 at 06:07, Michael Ellerman wrote:
Christophe Leroy writes:
This patch implements a framework for Kernel Userspace Access
Protection.
Then subarches will have the possibility to provide their own
implementation by providing setup_kuap() and lock/unlock_user_access()
Some pla
Hi Pingfan,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on linus/master]
[also build test ERROR on v4.20-rc7 next-20181220]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system]
url:
https://github.com/0day-ci/linux
76 matches