Re: [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode

2019-01-26 Thread Cédric Le Goater
Was there a crashing.org shutdown?

  Received: from gate.crashing.org (gate.crashing.org [63.228.1.57])
by in5.mail.ovh.net (Postfix) with ESMTPS id 43mYnj0nrlz1N7KC
for ; Fri, 25 Jan 2019 22:38:00 +0000 (UTC)
  Received: from localhost (localhost.localdomain [127.0.0.1])
by gate.crashing.org (8.14.1/8.14.1) with ESMTP id x0NLZf4K021092;
Wed, 23 Jan 2019 15:35:43 -0600


On 1/23/19 10:35 PM, Benjamin Herrenschmidt wrote:
> On Wed, 2019-01-23 at 20:07 +0100, Cédric Le Goater wrote:
>>  Event Assignment Structure, a.k.a IVE (Interrupt Virtualization Entry)
>>
>> All the names changed somewhere between XIVE v1 and XIVE v2. OPAL and
>> Linux should be adjusted ...
> 
> All the names changed between the HW design and the "architecture"
> document. The HW guys use the old names, the architecture the new
> names, and Linux & OPAL mostly use the old ones because frankly the new
> names suck big time.

Well, it does not make XIVE any clearer ... I did prefer the v1 names,
but there was some naming overlap in the concepts. 

>> It would be good to talk a little about the nested support (offline
>> maybe) to make sure that we are not missing some major interface that
>> would require a lot of change. If we need to prepare the ground, I think
>> the timing is good.
>>
>> The size of the IRQ number space might be a problem. It seems we
>> would need to increase it considerably to support multiple nested
>> guests. That said, I haven't looked much at how nested support is designed.
> 
> The size of the VP space is a bigger concern. Even today. We really
> need qemu to tell the max #cpu to KVM so we can allocate less of them.

Ah yes. We would also need to reduce the number of priorities available 
per CPU to free up more EQ descriptors, if I recall correctly. 
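(Back-of-the-envelope, assuming the usual 8 priorities per VP and one EQ
per priority: 1024 VPs would consume 8K EQ descriptors, whereas capping
guests at, say, 2 priorities would bring that down to 2K. Numbers are
purely illustrative.)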

> As for nesting, I suggest for the foreseeable future we stick to XICS
> emulation in nested guests.

ok. so no kernel_irqchip at all. hmm.  

I was wondering how feasible it would be to have L2 initialize the underlying 
OPAL structures in the L0 hypervisor. Maybe with a sort of proxy hcall 
which would perform the initialization in QEMU L1 on behalf of L2.

Cheers,
C.



[PATCH v2] cxl: Wrap iterations over afu slices inside 'afu_list_lock'

2019-01-26 Thread Vaibhav Jain
Within the cxl module, iteration over the array 'adapter->slices' may be
racy at a few points, as it might be read during an EEH at the same time
as its contents are being set to NULL while the driver is being unloaded
or unbound from the adapter. This might result in a NULL 'struct afu'
pointer being dereferenced during an EEH, thereby causing a kernel oops.

This patch fixes this by making sure that all access to the array
'adapter->slices' is wrapped within the context of spin-lock
'adapter->afu_list_lock'.

Signed-off-by: Vaibhav Jain 
---
Changelog:

v2:
* Fixed a wrong non-NULL pointer comparison [Fred]
* Moved the call to cxl_vphb_error_detected() within a branch that
  checks for a non-NULL AFU pointer in 'adapter->slices' [Fred]
* Removed a misleading comment in the code.
---
 drivers/misc/cxl/guest.c |  2 ++
 drivers/misc/cxl/pci.c   | 43 
 2 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/drivers/misc/cxl/guest.c b/drivers/misc/cxl/guest.c
index 5d28d9e454f5..08f4a512afad 100644
--- a/drivers/misc/cxl/guest.c
+++ b/drivers/misc/cxl/guest.c
@@ -267,6 +267,7 @@ static int guest_reset(struct cxl *adapter)
int i, rc;
 
pr_devel("Adapter reset request\n");
+   spin_lock(&adapter->afu_list_lock);
for (i = 0; i < adapter->slices; i++) {
if ((afu = adapter->afu[i])) {
pci_error_handlers(afu, CXL_ERROR_DETECTED_EVENT,
@@ -283,6 +284,7 @@ static int guest_reset(struct cxl *adapter)
pci_error_handlers(afu, CXL_RESUME_EVENT, 0);
}
}
+   spin_unlock(&adapter->afu_list_lock);
return rc;
 }
 
diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index c79ba1c699ad..ca968a889425 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -1805,7 +1805,7 @@ static pci_ers_result_t cxl_vphb_error_detected(struct cxl_afu *afu,
/* There should only be one entry, but go through the list
 * anyway
 */
-   if (afu->phb == NULL)
+   if (afu == NULL || afu->phb == NULL)
return result;
 
list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) {
@@ -1832,7 +1832,8 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
 {
struct cxl *adapter = pci_get_drvdata(pdev);
struct cxl_afu *afu;
-   pci_ers_result_t result = PCI_ERS_RESULT_NEED_RESET, afu_result;
+   pci_ers_result_t result = PCI_ERS_RESULT_NEED_RESET;
+   pci_ers_result_t afu_result = PCI_ERS_RESULT_NEED_RESET;
int i;
 
/* At this point, we could still have an interrupt pending.
@@ -1843,6 +1844,7 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
 
/* If we're permanently dead, give up. */
if (state == pci_channel_io_perm_failure) {
+   spin_lock(&adapter->afu_list_lock);
for (i = 0; i < adapter->slices; i++) {
afu = adapter->afu[i];
/*
@@ -1851,6 +1853,7 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
 */
cxl_vphb_error_detected(afu, state);
}
+   spin_unlock(&adapter->afu_list_lock);
return PCI_ERS_RESULT_DISCONNECT;
}
 
@@ -1932,14 +1935,19 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
 * * In slot_reset, free the old resources and allocate new ones.
 * * In resume, clear the flag to allow things to start.
 */
+
+   /* Make sure no one else changes the afu list */
+   spin_lock(&adapter->afu_list_lock);
+
for (i = 0; i < adapter->slices; i++) {
afu = adapter->afu[i];
 
-   afu_result = cxl_vphb_error_detected(afu, state);
-
-   cxl_context_detach_all(afu);
-   cxl_ops->afu_deactivate_mode(afu, afu->current_mode);
-   pci_deconfigure_afu(afu);
+   if (afu != NULL) {
+   afu_result = cxl_vphb_error_detected(afu, state);
+   cxl_context_detach_all(afu);
+   cxl_ops->afu_deactivate_mode(afu, afu->current_mode);
+   pci_deconfigure_afu(afu);
+   }
 
/* Disconnect trumps all, NONE trumps NEED_RESET */
if (afu_result == PCI_ERS_RESULT_DISCONNECT)
@@ -1948,6 +1956,7 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
 (result == PCI_ERS_RESULT_NEED_RESET))
result = PCI_ERS_RESULT_NONE;
}
+   spin_unlock(&adapter->afu_list_lock);
 
/* should take the context lock here */
if (cxl_adapter_context_lock(adapter) != 0)
@@ -1980,14 +1989,15 @@ static pci_ers_result_t cxl_pci_slot_reset(struct pci_dev *pdev)
 */
cxl_adapter_context_unlock(adapter);
 
+   spin

Re: [PATCH] cxl: Wrap iterations over afu slices inside 'afu_list_lock'

2019-01-26 Thread Vaibhav Jain
Thanks for reviewing this patch Fred,

Frederic Barrat  writes:

>>   
>>  afu_result = cxl_vphb_error_detected(afu, state);
>>   
>> -cxl_context_detach_all(afu);
>> -cxl_ops->afu_deactivate_mode(afu, afu->current_mode);
>> -pci_deconfigure_afu(afu);
>> +if (afu != NULL) {
>> +cxl_context_detach_all(afu);
>> +cxl_ops->afu_deactivate_mode(afu, afu->current_mode);
>> +pci_deconfigure_afu(afu);
>> +}
>
> I can see you're also checking if cxl_vphb_error_detected() is called 
> with a NULL afu from within the function, but why not move the call to 
> cxl_vphb_error_detected() within that "if (afu != NULL)...  " statement? 
> Otherwise, it looks suspicious when reading the code.
Yes, agreed. However, moving it inside the branch was triggering the gcc
'maybe-uninitialized' warning for 'afu_result', which is why the call to
cxl_vphb_error_detected() had been left outside the branch. Have fixed this
in v2 with an explicit initialization of 'afu_result'.
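
For reference, a minimal standalone illustration of the warning (not cxl
code, just the general pattern gcc complains about):

	/* Illustrative only: 'res' is assigned only inside the branch,
	 * so gcc's -Wmaybe-uninitialized (heuristic, needs optimization
	 * enabled) may warn about the read at the end of the function. */
	static int pick_last_nonzero(const int *items, int n)
	{
		int res;			/* no initializer */
		int i;

		for (i = 0; i < n; i++) {
			if (items[i] != 0)
				res = items[i];	/* conditional assignment */
		}
		return res;			/* possibly uninitialized use */
	}

Initializing the variable at declaration, as done for 'afu_result' in v2,
both silences the warning and makes the default result explicit.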

>
>
>> @@ -2051,10 +2067,11 @@ static void cxl_pci_resume(struct pci_dev *pdev)
>>   * This is not the place to be checking if everything came back up
>>   * properly, because there's no return value: do that in slot_reset.
>>   */
>> +spin_lock(&adapter->afu_list_lock);
>>  for (i = 0; i < adapter->slices; i++) {
>>  afu = adapter->afu[i];
>>   
>> -if (afu->phb == NULL)
>> +if (afu || afu->phb == NULL)
>>  continue;
>
>
> if (afu == NULL ...
Thanks for catching this. Have fixed this in v2.
-- 
Vaibhav Jain 
Linux Technology Center, IBM India Pvt. Ltd.



Re: [PATCH v4] kbuild: Add support for DT binding schema checks

2019-01-26 Thread Rob Herring
On Wed, Jan 23, 2019 at 9:33 AM Geert Uytterhoeven  wrote:
>
> Hi Rob,
>
> On Tue, Dec 11, 2018 at 9:24 PM Rob Herring  wrote:
> > This adds the build infrastructure for checking DT binding schema
> > documents and validating dts files using the binding schema.
> >
> > Check DT binding schema documents:
> > make dt_binding_check
> >
> > Build dts files and check using DT binding schema:
> > make dtbs_check
> >
> > Optionally, DT_SCHEMA_FILES can be passed in with a schema file(s) to
> > use for validation. This makes it easier to find and fix errors
> > generated by a specific schema.
> >
> > Currently, the validation targets are separate from a normal build to
> > avoid a hard dependency on the external DT schema project and because
> > there are lots of warnings generated.
>
> Thanks, I'm giving this a try, and get errors like:
>
>   DTC arch/arm/boot/dts/emev2-kzm9d.dt.yaml
> FATAL ERROR: No markers present in property 'cpu0' value
>
> and
>
>   DTC arch/arm64/boot/dts/renesas/r8a7795-salvator-x.dt.yaml
> FATAL ERROR: No markers present in property 'audio_clk_a' value
>
> Do you have a clue?

That's really strange because those aren't even properties. Are other
dts files okay? Is this the in-tree dtc?

The only time you should be missing markers is if you did a dts -> dts
-> dt.yaml.

Rob


Re: [PATCH v2 01/21] openrisc: prefer memblock APIs returning virtual address

2019-01-26 Thread Stafford Horne
On Mon, Jan 21, 2019 at 10:03:48AM +0200, Mike Rapoport wrote:
> The allocation of the page tables memory in openrisc uses
> memblock_phys_alloc() and then converts the returned physical address to
> a virtual one. Use memblock_alloc_raw() and add a panic() if the allocation
> fails.
> 
> Signed-off-by: Mike Rapoport 
> ---
>  arch/openrisc/mm/init.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/openrisc/mm/init.c b/arch/openrisc/mm/init.c
> index d157310..caeb418 100644
> --- a/arch/openrisc/mm/init.c
> +++ b/arch/openrisc/mm/init.c
> @@ -105,7 +105,10 @@ static void __init map_ram(void)
>   }
>  
>   /* Alloc one page for holding PTE's... */
> - pte = (pte_t *) __va(memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE));
> + pte = memblock_alloc_raw(PAGE_SIZE, PAGE_SIZE);
> + if (!pte)
> + panic("%s: Failed to allocate page for PTEs\n",
> +   __func__);
>   set_pmd(pme, __pmd(_KERNPG_TABLE + __pa(pte)));
>  
>   /* Fill the newly allocated page with PTE'S */

This seems reasonable to me.

Acked-by: Stafford Horne 



Re: [PATCH v2 1/5] drivers/accel: Introduce subsystem

2019-01-26 Thread Andrew Donnellan
[+ linuxppc-dev, because cxl/ocxl are handled through powerpc - please 
cc on future versions of this series]


On 26/1/19 8:13 am, Olof Johansson wrote:

We're starting to see more of these kinds of devices; the current
upcoming wave will likely be around machine learning and inference
engines. A few drivers have been added to drivers/misc for this, but
it's timely to make it into a separate group of drivers/subsystem, to
make it easier to find them, and to encourage collaboration between
contributors.

Over time, we expect to build shared frameworks that the drivers will
make use of, but what that framework needs to look like to fill those needs
is still unclear, and the best way to gain that knowledge is to give the
disparate implementations a shared location.

There has been some controversy around expectations for userspace
stacks being open. The clear preference is to see that happen, and any
driver and platform stack that is delivered like that will be given
preferential treatment, and at some point in the future it might
become the requirement. Until then, the bare minimum we need is an
open low-level userspace such that the driver and HW interfaces can be
exercised if someone is modifying the driver, even if the full details
of the workload are not always available.

Bootstrapping this with myself and Greg as maintainers (since the current
drivers will be moving out of drivers/misc). Looking forward to expanding
that group over time.



[snip]


+
+Hardware offload accelerator subsystem
+==
+
+This is a brief overview of the subsystem (grouping) of hardware
+accelerators kept under drivers/accel
+
+Types of hardware supported
+---
+
+  The general types of hardware supported are hardware devices that have
+  general interactions of sending commands and buffers to the hardware,
+  returning completions and possibly filled buffers back, together
+  with the usual driver pieces around hardware control, setup, error
+  handling, etc.
+
+  Drivers that fit into other subsystems are expected to be merged
+  there, and use the appropriate userspace interfaces of said functional
+  areas. We don't expect to see drivers for network, storage, graphics
+  and similar hardware implemented by drivers here.
+
+Expectations for contributions
+--
+
+ - Platforms and hardware that has fully open stacks, from Firmware to
+   Userspace, are always going to be given preferential treatment. These
+   platforms give the best insight for behavior and interaction of all
+   layers, including ability to improve implementation across the stack
+   over time.
+
+ - If a platform is partially proprietary, it is still expected that the
+   portions that interact with the driver can be shared in a form that allows
+   for exercising the hardware/driver and evolution of the interface over
+   time. This could be separated into a shared library and test/sample
+   programs, for example.
+
+ - Over time, there is an expectation to converge drivers over to shared
+   frameworks and interfaces. Until then, the general rule is that no
+   more than one driver per vendor will be acceptable. For vendors that
+   aren't participating in the work towards shared frameworks over time,
+   we reserve the right to phase out support for the hardware.

How exactly do generic drivers for interconnect protocols, such as
cxl/ocxl, fit in here?


cxl and ocxl are not drivers for a specific device; they are generic 
drivers which can be used with any device implementing the CAPI or 
OpenCAPI protocol respectively - many of which will be FPGA boards 
flashed with customer-designed accelerator cores for specific workloads, 
some will be accelerators using ASICs or using FPGA images supplied by 
vendors, some will be driven from userspace, others using the cxl/ocxl 
kernel API, etc.


--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited