On Mon, 2021-02-22 at 16:24 +1100, Alexey Kardashevskiy wrote: > > On 18/02/2021 06:32, Leonardo Bras wrote: > > On Tue, 2021-02-16 at 14:33 +1100, Alexey Kardashevskiy wrote: > > > Most platforms allocate IOMMU table structures (specifically it_map) > > > at the boot time and when this fails - it is a valid reason for panic(). > > > > > > However the powernv platform allocates it_map after a device is returned > > > to the host OS after being passed through and this happens long after > > > the host OS booted. It is quite possible to trigger the it_map allocation > > > panic() and kill the host even though it is not necessary - the host OS > > > can still use the DMA bypass mode (requires a tiny fraction of it_map's > > > memory) and even if that fails, the host OS is runnnable as it was without > > > the device for which allocating it_map causes the panic. > > > > > > Instead of immediately crashing in a powernv/ioda2 system, this prints > > > an error and continues. All other platforms still call panic(). > > > > > > Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru> > > > > Hello Alexey, > > > > This looks like a good change, that passes panic() decision to platform > > code. Everything looks pretty straightforward, but I have a question > > regarding this: > > > > > @@ -1930,16 +1931,16 @@ static long > > > pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe) > > > res_start = pe->phb->ioda.m32_pci_base >> > > > tbl->it_page_shift; > > > res_end = min(window_size, SZ_4G) >> tbl->it_page_shift; > > > } > > > - iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end); > > > - rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl); > > > > > > + if (iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end)) > > > + rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl); > > > + else > > > + rc = -ENOMEM; > > > if (rc) { > > > - pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n", > > > - rc); > > > + pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n", > > > rc); > > > iommu_tce_table_put(tbl); > > > - return rc; > > > + tbl = NULL; /* This clears iommu_table_base below */ > > > } > > > - > > > if (!pnv_iommu_bypass_disabled) > > > pnv_pci_ioda2_set_bypass(pe, true); > > > > > > > > > > > > > > > > > > > If I could understand correctly, previously if iommu_init_table() did > > not panic(), and pnv_pci_ioda2_set_window() returned something other > > than 0, it would return rc in the if (rc) clause, but now it does not > > happen anymore, going through if (!pnv_iommu_bypass_disabled) onwards. > > > > Is that desired? > > > Yes. A PE (==device, pretty much) has 2 DMA windows: > - the default one which requires some RAM to operate > - a bypass mode which tells the hardware that PCI addresses are > statically mapped to RAM 1:1. > > This bypass mode does not require extra memory to work and is used in > the most cases on the bare metal as long as the device supports 64bit > DMA which is everything except GPUs. Since it is cheap to enable and > this what we prefer anyway, no urge to fail. > > > > As far as I could see, returning rc there seems a good procedure after > > iommu_init_table returning -ENOMEM. > > This change is intentional and yes it could be done by a separate patch > but I figured there is no that much value in splitting.
Ok then, thanks for clarifying. FWIW: Reviewed-by: Leonardo Bras <leobra...@gmail.com>