On 18/02/2021 06:32, Leonardo Bras wrote:
On Tue, 2021-02-16 at 14:33 +1100, Alexey Kardashevskiy wrote:
Most platforms allocate IOMMU table structures (specifically it_map)
at the boot time and when this fails - it is a valid reason for panic().
However the powernv platform allocates it_map after a device is returned
to the host OS after being passed through and this happens long after
the host OS booted. It is quite possible to trigger the it_map allocation
panic() and kill the host even though it is not necessary - the host OS
can still use the DMA bypass mode (requires a tiny fraction of it_map's
memory) and even if that fails, the host OS is runnnable as it was without
the device for which allocating it_map causes the panic.
Instead of immediately crashing in a powernv/ioda2 system, this prints
an error and continues. All other platforms still call panic().
Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
Hello Alexey,
This looks like a good change, that passes panic() decision to platform
code. Everything looks pretty straightforward, but I have a question
regarding this:
@@ -1930,16 +1931,16 @@ static long pnv_pci_ioda2_setup_default_config(struct
pnv_ioda_pe *pe)
res_start = pe->phb->ioda.m32_pci_base >> tbl->it_page_shift;
res_end = min(window_size, SZ_4G) >> tbl->it_page_shift;
}
- iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end);
- rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
+ if (iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end))
+ rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
+ else
+ rc = -ENOMEM;
if (rc) {
- pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n",
- rc);
+ pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n",
rc);
iommu_tce_table_put(tbl);
- return rc;
+ tbl = NULL; /* This clears iommu_table_base below */
}
-
if (!pnv_iommu_bypass_disabled)
pnv_pci_ioda2_set_bypass(pe, true);
If I could understand correctly, previously if iommu_init_table() did
not panic(), and pnv_pci_ioda2_set_window() returned something other
than 0, it would return rc in the if (rc) clause, but now it does not
happen anymore, going through if (!pnv_iommu_bypass_disabled) onwards.
Is that desired?
Yes. A PE (==device, pretty much) has 2 DMA windows:
- the default one which requires some RAM to operate
- a bypass mode which tells the hardware that PCI addresses are
statically mapped to RAM 1:1.
This bypass mode does not require extra memory to work and is used in
the most cases on the bare metal as long as the device supports 64bit
DMA which is everything except GPUs. Since it is cheap to enable and
this what we prefer anyway, no urge to fail.
As far as I could see, returning rc there seems a good procedure after
iommu_init_table returning -ENOMEM.
This change is intentional and yes it could be done by a separate patch
but I figured there is no that much value in splitting.
--
Alexey