Re: [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode
On 1/23/19 10:35 PM, Benjamin Herrenschmidt wrote:
> On Wed, 2019-01-23 at 20:07 +0100, Cédric Le Goater wrote:
>> Event Assignment Structure, a.k.a IVE (Interrupt Virtualization Entry)
>>
>> All the names changed somewhere between XIVE v1 and XIVE v2. OPAL and
>> Linux should be adjusted ...
>
> All the names changed between the HW design and the "architecture"
> document. The HW guys use the old names, the architecture the new
> names, and Linux & OPAL mostly use the old ones because frankly the new
> names suck big time.

Well, it does not make XIVE any clearer ... I did prefer the v1 names,
but there was some naming overlap in the concepts.

>> It would be good to talk a little about the nested support (offline
>> may be) to make sure that we are not missing some major interface that
>> would require a lot of change. If we need to prepare ground, I think
>> the timing is good.
>>
>> The size of the IRQ number space might be a problem. It seems we
>> would need to increase it considerably to support multiple nested
>> guests. That said, I haven't looked much at how nested is designed.
>
> The size of the VP space is a bigger concern. Even today. We really
> need qemu to tell the max #cpu to KVM so we can allocate less of them.

ah yes. We would also need to reduce the number of available priorities
per CPU to have more EQ descriptors available, if I recall correctly.

> As for nesting, I suggest for the foreseeable future we stick to XICS
> emulation in nested guests.

ok. so no kernel_irqchip at all. hmm.

I was wondering how possible it was to have L2 initialize the
underlying OPAL structures in the L0 hypervisor. May be with a sort of
proxy hcall which would perform the initialization in QEMU L1 on
behalf of L2.

Cheers,

C.
[PATCH v2] cxl: Wrap iterations over afu slices inside 'afu_list_lock'
Within the cxl module, iteration over the AFU array 'adapter->afu'
(with 'adapter->slices' entries) may be racy at a few points, as it
might be simultaneously read during an EEH and its contents set to
NULL while the driver is being unloaded or unbound from the adapter.
This might result in a NULL pointer to 'struct afu' being dereferenced
during an EEH, thereby causing a kernel oops.

This patch fixes this by making sure that all access to the array
'adapter->afu' is wrapped within the context of the spin-lock
'adapter->afu_list_lock'.

Signed-off-by: Vaibhav Jain
---
Changelog:

v2:
* Fixed a wrong comparison against a non-null pointer [Fred]
* Moved the call to cxl_vphb_error_detected() within a branch that
  checks for a non-null AFU pointer in 'adapter->afu' [Fred]
* Removed a misleading comment in the code.
---
 drivers/misc/cxl/guest.c |  2 ++
 drivers/misc/cxl/pci.c   | 43
 2 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/drivers/misc/cxl/guest.c b/drivers/misc/cxl/guest.c
index 5d28d9e454f5..08f4a512afad 100644
--- a/drivers/misc/cxl/guest.c
+++ b/drivers/misc/cxl/guest.c
@@ -267,6 +267,7 @@ static int guest_reset(struct cxl *adapter)
 	int i, rc;

 	pr_devel("Adapter reset request\n");
+	spin_lock(&adapter->afu_list_lock);
 	for (i = 0; i < adapter->slices; i++) {
 		if ((afu = adapter->afu[i])) {
 			pci_error_handlers(afu, CXL_ERROR_DETECTED_EVENT,
@@ -283,6 +284,7 @@ static int guest_reset(struct cxl *adapter)
 			pci_error_handlers(afu, CXL_RESUME_EVENT, 0);
 		}
 	}
+	spin_unlock(&adapter->afu_list_lock);
 	return rc;
 }

diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index c79ba1c699ad..ca968a889425 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -1805,7 +1805,7 @@ static pci_ers_result_t cxl_vphb_error_detected(struct cxl_afu *afu,
 	/* There should only be one entry, but go through the list
 	 * anyway */
-	if (afu->phb == NULL)
+	if (afu == NULL || afu->phb == NULL)
 		return result;

 	list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) {
@@ -1832,7 +1832,8 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
 {
 	struct cxl *adapter = pci_get_drvdata(pdev);
 	struct cxl_afu *afu;
-	pci_ers_result_t result = PCI_ERS_RESULT_NEED_RESET, afu_result;
+	pci_ers_result_t result = PCI_ERS_RESULT_NEED_RESET;
+	pci_ers_result_t afu_result = PCI_ERS_RESULT_NEED_RESET;
 	int i;

 	/* At this point, we could still have an interrupt pending.
@@ -1843,6 +1844,7 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
 	/* If we're permanently dead, give up. */
 	if (state == pci_channel_io_perm_failure) {
+		spin_lock(&adapter->afu_list_lock);
 		for (i = 0; i < adapter->slices; i++) {
 			afu = adapter->afu[i];
 			/*
@@ -1851,6 +1853,7 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
 			 */
 			cxl_vphb_error_detected(afu, state);
 		}
+		spin_unlock(&adapter->afu_list_lock);
 		return PCI_ERS_RESULT_DISCONNECT;
 	}
@@ -1932,14 +1935,19 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
 	 *  * In slot_reset, free the old resources and allocate new ones.
 	 *  * In resume, clear the flag to allow things to start.
 	 */
+
+	/* Make sure no one else changes the afu list */
+	spin_lock(&adapter->afu_list_lock);
+
 	for (i = 0; i < adapter->slices; i++) {
 		afu = adapter->afu[i];

-		afu_result = cxl_vphb_error_detected(afu, state);
-
-		cxl_context_detach_all(afu);
-		cxl_ops->afu_deactivate_mode(afu, afu->current_mode);
-		pci_deconfigure_afu(afu);
+		if (afu != NULL) {
+			afu_result = cxl_vphb_error_detected(afu, state);
+			cxl_context_detach_all(afu);
+			cxl_ops->afu_deactivate_mode(afu, afu->current_mode);
+			pci_deconfigure_afu(afu);
+		}

 		/* Disconnect trumps all, NONE trumps NEED_RESET */
 		if (afu_result == PCI_ERS_RESULT_DISCONNECT)
@@ -1948,6 +1956,7 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
 		    (result == PCI_ERS_RESULT_NEED_RESET))
 			result = PCI_ERS_RESULT_NONE;
 	}
+	spin_unlock(&adapter->afu_list_lock);

 	/* should take the context lock here */
 	if (cxl_adapter_context_lock(adapter) != 0)
@@ -1980,14 +1989,15 @@ static pci_ers_result_t cxl_pci_slot_reset(struct pci_dev *pdev)
 	 */
 	cxl_adapter_context_unlock(adapter);

+	spin
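The pattern the patch applies — take 'afu_list_lock' around the whole
iteration, skip slots that have gone NULL, and merge per-AFU results
("Disconnect trumps all, NONE trumps NEED_RESET") — can be sketched
outside the kernel roughly as below. This is an illustrative stand-in,
not the driver code: a pthread mutex replaces the kernel spinlock, and
the enum and structs are made-up miniatures of pci_ers_result_t and the
cxl types.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for pci_ers_result_t and the cxl structures. */
enum ers_result { ERS_NONE, ERS_NEED_RESET, ERS_DISCONNECT };

struct afu { int id; };

struct adapter {
	pthread_mutex_t afu_list_lock;	/* spinlock in the real driver */
	int slices;
	struct afu *afu[8];		/* slots may go NULL during unbind */
};

/*
 * Walk the AFU array under the lock, skipping empty slots, and merge
 * the per-AFU results: DISCONNECT trumps all, NONE trumps NEED_RESET.
 * 'afu_result' gets an explicit initializer so the merge is
 * well-defined even when every slot is NULL -- the v2 fix for gcc's
 * maybe-uninitialized warning.
 */
static enum ers_result adapter_error_detected(struct adapter *adapter,
		enum ers_result (*per_afu)(struct afu *))
{
	enum ers_result result = ERS_NEED_RESET;
	enum ers_result afu_result = ERS_NEED_RESET;
	int i;

	pthread_mutex_lock(&adapter->afu_list_lock);
	for (i = 0; i < adapter->slices; i++) {
		struct afu *afu = adapter->afu[i];

		if (afu != NULL)
			afu_result = per_afu(afu);

		if (afu_result == ERS_DISCONNECT)
			result = ERS_DISCONNECT;
		else if (afu_result == ERS_NONE && result == ERS_NEED_RESET)
			result = ERS_NONE;
	}
	pthread_mutex_unlock(&adapter->afu_list_lock);

	return result;
}
```

Holding the lock across the whole loop, rather than per slot, is what
prevents the unbind path from NULLing an entry between the check and
the use.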
Re: [PATCH] cxl: Wrap iterations over afu slices inside 'afu_list_lock'
Thanks for reviewing this patch Fred,

Frederic Barrat writes:

>> afu_result = cxl_vphb_error_detected(afu, state);
>>
>> -cxl_context_detach_all(afu);
>> -cxl_ops->afu_deactivate_mode(afu, afu->current_mode);
>> -pci_deconfigure_afu(afu);
>> +if (afu != NULL) {
>> +cxl_context_detach_all(afu);
>> +cxl_ops->afu_deactivate_mode(afu, afu->current_mode);
>> +pci_deconfigure_afu(afu);
>> +}
>
> I can see you're also checking if cxl_vphb_error_detected() is called
> with a NULL afu from within the function, but why not move the call to
> cxl_vphb_error_detected() within that "if (afu != NULL)... " statement?
> Otherwise, it looks suspicious when reading the code.

Yes, agree. However, moving it inside the branch was triggering the gcc
compile warning 'maybe-uninitialized' for 'afu_result', hence the call
to cxl_vphb_error_detected() was left outside the branch. Have fixed
this in v2 with an explicit initialization of 'afu_result'.

>> @@ -2051,10 +2067,11 @@ static void cxl_pci_resume(struct pci_dev *pdev)
>> * This is not the place to be checking if everything came back up
>> * properly, because there's no return value: do that in slot_reset.
>> */
>> +spin_lock(&adapter->afu_list_lock);
>> for (i = 0; i < adapter->slices; i++) {
>> afu = adapter->afu[i];
>>
>> -if (afu->phb == NULL)
>> +if (afu || afu->phb == NULL)
>> continue;
>
> if (afu == NULL ...

Thanks for catching this. Have fixed this in v2.

--
Vaibhav Jain
Linux Technology Center, IBM India Pvt. Ltd.
Re: [PATCH v4] kbuild: Add support for DT binding schema checks
On Wed, Jan 23, 2019 at 9:33 AM Geert Uytterhoeven wrote:
>
> Hi Rob,
>
> On Tue, Dec 11, 2018 at 9:24 PM Rob Herring wrote:
> > This adds the build infrastructure for checking DT binding schema
> > documents and validating dts files using the binding schema.
> >
> > Check DT binding schema documents:
> >   make dt_binding_check
> >
> > Build dts files and check using DT binding schema:
> >   make dtbs_check
> >
> > Optionally, DT_SCHEMA_FILES can be passed in with a schema file(s) to
> > use for validation. This makes it easier to find and fix errors
> > generated by a specific schema.
> >
> > Currently, the validation targets are separate from a normal build to
> > avoid a hard dependency on the external DT schema project and because
> > there are lots of warnings generated.
>
> Thanks, I'm giving this a try, and get errors like:
>
>     DTC     arch/arm/boot/dts/emev2-kzm9d.dt.yaml
> FATAL ERROR: No markers present in property 'cpu0' value
>
> and
>
>     DTC     arch/arm64/boot/dts/renesas/r8a7795-salvator-x.dt.yaml
> FATAL ERROR: No markers present in property 'audio_clk_a' value
>
> Do you have a clue?

That's really strange, because those aren't even properties. Are other
dts files okay? Is this the in-tree dtc? The only time you should be
missing markers is if you did a dts -> dts -> dt.yaml.

Rob
Re: [PATCH v2 01/21] openrisc: prefer memblock APIs returning virtual address
On Mon, Jan 21, 2019 at 10:03:48AM +0200, Mike Rapoport wrote:
> The allocation of the page tables memory in openrisc uses
> memblock_phys_alloc() and then converts the returned physical address
> to a virtual one. Use memblock_alloc_raw() and add a panic() if the
> allocation fails.
>
> Signed-off-by: Mike Rapoport
> ---
>  arch/openrisc/mm/init.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/openrisc/mm/init.c b/arch/openrisc/mm/init.c
> index d157310..caeb418 100644
> --- a/arch/openrisc/mm/init.c
> +++ b/arch/openrisc/mm/init.c
> @@ -105,7 +105,10 @@ static void __init map_ram(void)
> 			}
>
> 			/* Alloc one page for holding PTE's... */
> -			pte = (pte_t *) __va(memblock_phys_alloc(PAGE_SIZE,
> 								 PAGE_SIZE));
> +			pte = memblock_alloc_raw(PAGE_SIZE, PAGE_SIZE);
> +			if (!pte)
> +				panic("%s: Failed to allocate page for PTEs\n",
> +				      __func__);
> 			set_pmd(pme, __pmd(_KERNPG_TABLE + __pa(pte)));
>
> 			/* Fill the newly allocated page with PTE'S */

This seems reasonable to me.

Acked-by: Stafford Horne
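The shape of this conversion — allocate, then fail loudly at the call
site instead of later dereferencing a NULL-derived pointer — is the
same allocate-or-panic idiom used across the memblock series. A
userspace sketch of the idiom, with malloc standing in for
memblock_alloc_raw() and abort() for panic() (the helper name and
messages here are illustrative, not kernel API):

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Allocate-or-die helper mirroring the kernel pattern:
 *
 *	pte = memblock_alloc_raw(PAGE_SIZE, PAGE_SIZE);
 *	if (!pte)
 *		panic("%s: Failed to allocate page for PTEs\n", __func__);
 *
 * malloc stands in for the allocator, abort() for panic().
 */
static void *alloc_page_or_die(size_t size, const char *caller)
{
	void *p = malloc(size);

	if (!p) {
		fprintf(stderr, "%s: failed to allocate %zu bytes\n",
			caller, size);
		abort();
	}
	/*
	 * Note: memblock_alloc_raw() does not zero the memory; callers
	 * that need cleared pages must initialize the contents
	 * themselves, as map_ram() does when it fills the page with
	 * PTEs.
	 */
	return p;
}
```

Panicking at the allocation site gives a message naming the failing
function, rather than an opaque oops when __va(0) is later
dereferenced.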
Re: [PATCH v2 1/5] drivers/accel: Introduce subsystem
[+ linuxppc-dev, because cxl/ocxl are handled through powerpc - please
cc on future versions of this series]

On 26/1/19 8:13 am, Olof Johansson wrote:

We're starting to see more of these kinds of devices; the current
upcoming wave will likely be around machine learning and inference
engines. A few drivers have been added to drivers/misc for this, but
it's timely to make them into a separate group of drivers/subsystem,
to make it easier to find them, and to encourage collaboration between
contributors.

Over time, we expect to build shared frameworks that the drivers will
make use of, but what that framework needs to look like to fill the
needs is still unclear, and the best way to gain that knowledge is to
give the disparate implementations a shared location.

There has been some controversy around expectations for userspace
stacks being open. The clear preference is to see that happen, and any
driver and platform stack that is delivered like that will be given
preferential treatment, and at some point in the future it might
become the requirement. Until then, the bare minimum we need is an
open low-level userspace such that the driver and HW interfaces can be
exercised if someone is modifying the driver, even if the full details
of the workload are not always available.

Bootstrapping this with myself and Greg as maintainers (since the
current drivers will be moving out of drivers/misc). Looking forward
to expanding that group over time.

[snip]

+
+Hardware offload accelerator subsystem
+======================================
+
+This is a brief overview of the subsystem (grouping) of hardware
+accelerators kept under drivers/accel
+
+Types of hardware supported
+---------------------------
+
+  The general types of hardware supported are hardware devices that
+  have the general interaction of sending commands and buffers to the
+  hardware, returning completions and possibly filled buffers back,
+  together with the usual driver pieces around hardware control,
+  setup, error handling, etc.
+
+  Drivers that fit into other subsystems are expected to be merged
+  there, and use the appropriate userspace interfaces of said
+  functional areas. We don't expect to see drivers for network,
+  storage, graphics and similar hardware implemented by drivers here.
+
+Expectations for contributions
+------------------------------
+
+ - Platforms and hardware that have fully open stacks, from firmware
+   to userspace, are always going to be given preferential treatment.
+   These platforms give the best insight into the behavior and
+   interaction of all layers, including the ability to improve the
+   implementation across the stack over time.
+
+ - If a platform is partially proprietary, it is still expected that
+   the portions that interact with the driver can be shared in a form
+   that allows for exercising the hardware/driver and evolution of the
+   interface over time. This could be separated into a shared library
+   and test/sample programs, for example.
+
+ - Over time, there is an expectation to converge drivers over to
+   shared frameworks and interfaces. Until then, the general rule is
+   that no more than one driver per vendor will be acceptable. For
+   vendors that aren't participating in the work towards shared
+   frameworks over time, we reserve the right to phase out support for
+   the hardware.

How exactly do generic drivers for interconnect protocols, such as
cxl/ocxl, fit in here?

cxl and ocxl are not drivers for a specific device; they are generic
drivers which can be used with any device implementing the CAPI or
OpenCAPI protocol respectively - many of which will be FPGA boards
flashed with customer-designed accelerator cores for specific
workloads, some will be accelerators using ASICs or FPGA images
supplied by vendors, some will be driven from userspace, others using
the cxl/ocxl kernel API, etc.

--
Andrew Donnellan
OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com
IBM Australia Limited