Re: [PATCH 1/7] powerpc: Add interface to get msi region information
On Tue, Oct 08, 2013 at 10:47:49AM -0600, Bjorn Helgaas wrote: > I still have no idea what an "aperture type IOMMU" is, > other than that it is "different." An aperture based IOMMU is basically any GART-like IOMMU which can only remap a small window (the aperture) of the DMA address space. DMA outside of that window is either blocked completly or passed through untranslated. Joerg -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/5 v11] powerpc: Add iommu domain pointer to device archdata
On Thu, Jun 20, 2013 at 02:29:30PM +, Sethi Varun-B16395 wrote: > Hi Joerg, > My PAMU driver patches depend on this patch which was Ack by Kumar. Should I > resubmit this patch as well? Yes, please. Add the collected Acked-bys and submit everything that is missing in v3.10-rc6. Joerg -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2 v15] iommu/fsl: Add additional iommu attributes required by the PAMU driver.
On Tue, Apr 30, 2013 at 05:09:32PM +, Sethi Varun-B16395 wrote: > Would you take this patchset for 3.10 merge? Not this time. The final patch came in very late and is pretty big too. For code of that size I would like to have a few weeks more testing in next and probably also a non-Freescale Reviewed-by. Joerg -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] svm: Add mutex_lock to protect apic_access_page_done on AMD systems
On Mon, Nov 12, 2018 at 12:23:14PM +, Suthikulpanit, Suravee wrote: > From: Wei Wang > > There is a race condition when accessing kvm->arch.apic_access_page_done. > Due to it, x86_set_memory_region will fail when creating the second vcpu > for a svm guest. > > Add a mutex_lock to serialize the accesses to apic_access_page_done. > This lock is also used by vmx for the same purpose. > > Signed-off-by: Wei Wang > Signed-off-by: Amadeusz Juskowiak > Signed-off-by: Julian Stecklina > Signed-off-by: Suravee Suthikulpanit Reviewed-by: Joerg Roedel
Re: [PATCH v3 3/3] x86/mm: add TLB purge to free pmd/pte page interfaces
On Tue, May 29, 2018 at 04:10:24PM +, Kani, Toshi wrote: > Can you explain why you think allocating a page here is a major problem? Because a larger allocation is more likely to fail. And if you fail the allocation, you also fail to free more pages, which _is_ a problem. So better avoid any allocations in code paths that are about freeing memory. > If we just revert, please apply patch 1/3 first. This patch address the > BUG_ON issue on PAE. This is a real issue that needs a fix ASAP. It does not address the problem of dirty page-walk caches on x86-64. > The page-directory cache issue on x64, which is addressed by patch 3/3, > is a theoretical issue that I could not hit by putting ioremap() calls > into a loop for a whole day. Nobody hit this issue, either. How do you know you didn't hit that issue? It might cause silent data corruption, which might not be easily detected. > The simple revert patch Joerg posted a while ago causes > pmd_free_pte_page() to fail on x64. This causes multiple pmd mappings > to fall into pte mappings on my test systems. This can be seen as a > degradation, and I am afraid that it is more harmful than good. The plain revert just removes all the issues with the dirty TLB that the original patch introduced and prevents huge mappings from being established when there have been smaller mappings before. This is not ideal, but at least its is consistent and does not leak pages and leaves no dirty TLBs. So this is the easiest and most reliable fix for this stage in the release process. Regards, Joerg
Re: [v3 00/26] Add VT-d Posted-Interrupts support
Hi Feng, On Tue, Jan 06, 2015 at 01:10:19AM +, Wu, Feng wrote: > Ping... > > Hi Joerg & David, > > Could you please have a look at the IOMMU part of this series (patch 02 - 04, > patch 06 - 09 , patch 26)? > > Hi Thomas, Ingo, & Peter, > > Could you please have a look at this series, especially for patch 01, 05, 21? I fear this conflicts somewhat with the irq-domain patches from Jiang Liu. Once we worked this out (should happen soon) I'll have a look at the VT-d posted interrupt patches. Regards, Joerg -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 2/2] x86/mm: implement free pmd/pte page interfaces
On Thu, Apr 26, 2018 at 04:21:19PM +, Kani, Toshi wrote: > All pages under the pmd had been unmapped and then lazy TLB purged with > INVLPG before coming to this code path. Speculation is not allowed to > pages without mapping. CPUs have not only TLBs, but also page-walk caches which cache intermediary results of page-table walks and which is flushed together with the TLB. So the PMD entry you clear can still be in a page-walk cache and this needs to be flushed too before you can free the PTE page. Otherwise page-walks might still go to the page you just freed. That is especially bad when the page is already reallocated and filled with other data. > > Further this needs synchronization with other page-tables in the system > > when the kernel PMDs are not shared between processes. In x86-32 with > > PAE this causes a BUG_ON() being triggered at arch/x86/mm/fault.c:268 > > because the page-tables are not correctly synchronized. > > I think this is an issue with pmd mapping support on x86-32-PAE, not > with this patch. I think the code needed to be updated to sync at the > pud level. It is an issue with this patch, because this patch is for x86 and on x86 every change to the kernel page-tables potentially needs to by synchronized to the other page-tables. And this patch doesn't implement it, which triggers a BUG_ON() under certain conditions. Regards, Joerg
Re: [PATCH v2 2/2] x86/mm: implement free pmd/pte page interfaces
On Thu, Apr 26, 2018 at 05:49:58PM +, Kani, Toshi wrote: > On Thu, 2018-04-26 at 19:23 +0200, j...@8bytes.org wrote: > > So the PMD entry you clear can still be in a page-walk cache and this > > needs to be flushed too before you can free the PTE page. Otherwise > > page-walks might still go to the page you just freed. That is especially > > bad when the page is already reallocated and filled with other data. > > I do not understand why we need to flush processor caches here. x86 > processor caches are coherent with MESI. So, clearing an PMD entry > modifies a cache entry on the processor associated with the address, > which in turn invalidates all stale cache entries on other processors. A page walk cache is not about the processors data cache, its a cache similar to the TLB to speed up page-walks by caching intermediate results of previous page walks. Thanks, Joerg
Re: [PATCH v2 2/2] x86/mm: implement free pmd/pte page interfaces
On Thu, Apr 26, 2018 at 10:30:14PM +, Kani, Toshi wrote: > Thanks for the clarification. After reading through SDM one more time, I > agree that we need a TLB purge here. Here is my current understanding. > > - INVLPG purges both TLB and paging-structure caches. So, PMD cache was > purged once. > - However, processor may cache this PMD entry later in speculation > since it has p-bit set. (This is where my misunderstanding was. > Speculation is not allowed to access a target address, but it may still > cache this PMD entry.) > - A single INVLPG on each processor purges this PMD cache. It does not > need a range purge (which was already done). > > Does it sound right to you? The right fix is to first synchronize the changes when the PMD/PUD is cleared and then flush the TLB system-wide. After that is done you can free the page. But doing all that in the pud/pmd_free_pmd/pte_page() functions is too expensive, as the TLB flush requires to send IPIs to all cores in the system, and that every time the function is called. So what needs to be done is to fix this from high-level ioremap code to first unmap all required PTE/PMD pages and collect them in a list. When that is done you can synchronize the changes with the other page-tables in the system and do one system-wide TLB flush. When that is complete you can free the pages on the list that were collected while unmapping. Then the new mappings can be established and again synchronized with the other page-tables in the system. > As for the BUG_ON issue, are you able to reproduce this issue? If so, > would you be able to test the fix? Yes, I can reproduce the BUG_ON with my PTI patches and a fedora-i386 VM. I already ran into the issue before your patches were merged upstream, but my "fix" is different because it just prevents huge-mappings when there were smaller mappings before. See e3e288121408 x86/pgtable: Don't set huge PUD/PMD on non-leaf entries for details. This patch does not fix the base-problem, but hides it again, as the real fix needs some more work across architectures. Your patch actually makes the problem worse, without it the PTE/PMD pages were just leaked, so that they could not be reused. But with your patch the pages can be used again and the page-walker might establish TLB entries based on random content the new owner writes to it. This can lead to all kinds of random and very hard to debug data corruption issues. So until we make the generic ioremap code in lib/ioremap.c smarter about unmapping/remapping ranges the best solution is making my fix work again by reverting your patch. Thanks, Joerg
Re: [PATCH v2 2/2] x86/mm: implement free pmd/pte page interfaces
On Fri, Apr 27, 2018 at 01:39:23PM +0200, Michal Hocko wrote: > On Fri 27-04-18 09:37:20, j...@8bytes.org wrote: > [...] > > So until we make the generic ioremap code in lib/ioremap.c smarter about > > unmapping/remapping ranges the best solution is making my fix work again > > by reverting your patch. > > Could you reuse the same mmu_gather mechanism as we use in the > zap_*_range? Yeah, maybe, I havn't looked into the details yet. At least something similar is needed. Joerg
Re: [PATCH v2 2/2] x86/mm: implement free pmd/pte page interfaces
On Fri, Apr 27, 2018 at 05:22:28PM +0530, Chintan Pandya wrote: > I'm bit confused here. Are you pointing to race within ioremap/vmalloc > framework while updating the page table or race during tlb ops. Since > later is arch dependent, I would not comment. But if the race being > discussed here while altering page tables, I'm not on the same page. The race condition is between hardware and software. It is not sufficient to just remove the software references to the page that is about to be freed (by clearing the PMD/PUD), also the hardware references in the page-walk cache need to be removed with a TLB flush. Otherwise the hardware can use the freed (and possibly reused) page to establish new TLB entries. Joerg
Re: [PATCH v2 2/2] x86/mm: implement free pmd/pte page interfaces
On Fri, Apr 27, 2018 at 02:31:51PM +, Kani, Toshi wrote: > So, we can add the step 2 on top of this patch. > 1. Clear pud/pmd entry. > 2. System wide TLB flush <-- TO BE ADDED BY NEW PATCH > 3. Free its underlining pmd/pte page. This still lacks the page-table synchronization and will thus not fix the BUG_ON being triggered. > We do not need to revert this patch. We can make the above change I > mentioned. Please note that we are not in the merge window anymore and that any fix needs to be simple and obviously correct. Thanks, Joerg
Re: [PATCH V5 0/3] x86/Hyper-V/IOMMU: Add Hyper-V IOMMU driver to support x2apic mode
On Mon, Feb 25, 2019 at 08:51:22PM +, Michael Kelley wrote: > Joerg -- What's your take on this patch set now that it has settled down? If > you are good with it, from the Microsoft standpoint we're hoping that it > can get into linux-next this week (given the extra week due to 5.0-rc8). I can't find v5 of the patch-set in my inbox or my iommu-list folder. Please re-send to me with all Acks and Reviewed-by's added. Thanks, Joerg
Re: [PATCH v1 1/1] iommu/amd: fix enabling exclusion range for an exact device
On Wed, May 07, 2014 at 01:54:52PM +0800, Su, Friendy wrote: > > From: Su Friendy > > set_device_exclusion_range(u16 devid, struct ivmd_header *m) enables > exclusion range for ONE device. IOMMU does not translate the access > to the exclusion range from the device. > > The device is specified by input argument 'devid'. But 'devid' is not > passed to the actual set function set_dev_entry_bit(), instead > 'm->devid' is passed. 'm->devid' does not specify the exact device > which needs enable the exclusion range. 'm->devid' represents DeviceID > field of IVMD, which has different meaning depends on IVMD type. > > The caller init_exclusion_range() sets 'devid' for ONE device. When > m->type is equal to ACPI_IVMD_TYPE_ALL or ACPI_IVMD_TYPE_RANGE, > 'm->devid' is not equal to 'devid'. > > This patch fixes 'm->devid' to 'devid'. > > Signed-off-by: Su Friendy > Signed-off-by: Tamori Masahiro Applied, thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: hpsa driver bug crack kernel down!
Hey David, On Mon, Apr 14, 2014 at 05:03:51PM +, Woodhouse, David wrote: > Jiang, if you can then let me have a copy with a signed-off-by I'll > shepherd it upstream along with your other patch which is already in my > iommu-2.6.git tree. What is the state of these fixes? I plan to send out a pull-request before easter and hoped to include these fixes as well. Thanks, Joerg -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: hpsa driver bug crack kernel down!
On Wed, Apr 16, 2014 at 01:58:44PM +, Woodhouse, David wrote: > On Wed, 2014-04-16 at 15:37 +0200, j...@8bytes.org wrote: > > What is the state of these fixes? I plan to send out a pull-request > > before easter and hoped to include these fixes as well. > > I'm travelling and was going to do some final testing and send out a > pull request after I got home tomorrow. But since you ask... > > Please pull from > git://git.infradead.org/iommu-2.6.git > > David Woodhouse (1): > iommu/vt-d: Fix get_domain_for_dev() handling of upstream PCIe bridges > > Jiang Liu (2): > iommu/vt-d: fix memory leakage caused by commit ea8ea46 > iommu/vt-d: fix bug in matching PCI devices with DRHD/RMRR descriptors > > drivers/iommu/dmar.c| 3 ++- > drivers/iommu/intel-iommu.c | 10 +++--- > 2 files changed, 9 insertions(+), 4 deletions(-) Pulled, thanks David. I will also do some additional testing before sending it upstream. Joerg -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v1 1/1] iommu/amd: set iommu for early mapped ioapic/hpet
On Mon, Sep 01, 2014 at 02:17:44PM +0800, Su, Friendy wrote: > diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c > index 3783e0b..148ab61 100644 > --- a/drivers/iommu/amd_iommu_init.c > +++ b/drivers/iommu/amd_iommu_init.c > @@ -747,7 +747,7 @@ static int __init add_special_device(u8 type, u8 id, u16 > devid, bool cmd_line) > return 0; > } > > -static int __init add_early_maps(void) > +static int __init add_early_maps(struct amd_iommu *iommu) > { > int i, ret; > > @@ -758,6 +758,11 @@ static int __init add_early_maps(void) >early_ioapic_map[i].cmd_line); > if (ret) > return ret; > + /* > + * early mapped ioapci overrides ACPI IVRS, > + * they should be always controlled by iommu. > + */ > + set_iommu_for_device(iommu, early_ioapic_map[i].devid); > } > > for (i = 0; i < early_hpet_map_size; ++i) { > @@ -767,6 +772,11 @@ static int __init add_early_maps(void) >early_hpet_map[i].cmd_line); > if (ret) > return ret; > + /* > + * early mapped hpet overrides ACPI IVRS, > + * they should be always controlled by iommu. > + */ > + set_iommu_for_device(iommu, early_hpet_map[i].devid); This doesn't work on machines with multiple IOMMUs in it. Also the problem only exists if there is no IVHD entry in the IVRS table for the device containing IOAPIC and HPET. But if this IVHD entry is not present we can't reliably know which IOMMU is responsible for a given IOAPIC/HPET. Joerg -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix bug in iommu_context_addr: Always get pointer to lower extended-context-table
Hi Nan, On Mon, Aug 24, 2015 at 06:26:33AM +, Xiao, Nan (Nan@HPS Performance, Beijing) wrote: > drivers/iommu/intel-iommu.c | 7 --- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c > index 0649b94..4213598 100644 > --- a/drivers/iommu/intel-iommu.c > +++ b/drivers/iommu/intel-iommu.c > @@ -759,10 +759,11 @@ static inline struct context_entry > *iommu_context_addr(struct intel_iommu *iommu > if (devfn >= 0x80) { > devfn -= 0x80; > entry = &root->hi; > - } > + } else > + entry = &root->lo; > devfn *= 2; > - } > - entry = &root->lo; > + } else > + entry = &root->lo; > if (*entry & 1) > context = phys_to_virt(*entry & VTD_PAGE_MASK); > else { Shouldn't this diff have the same effect? diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 0649b94..7553cb9 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -755,6 +755,7 @@ static inline struct context_entry *iommu_context_addr(struct intel_iommu *iommu struct context_entry *context; u64 *entry; + entry = &root->lo; if (ecs_enabled(iommu)) { if (devfn >= 0x80) { devfn -= 0x80; @@ -762,7 +763,6 @@ static inline struct context_entry *iommu_context_addr(struct intel_iommu *iommu } devfn *= 2; } - entry = &root->lo; if (*entry & 1) context = phys_to_virt(*entry & VTD_PAGE_MASK); else { -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix bug in iommu_context_addr: Always get pointer to lower extended-context-table
On Tue, Aug 25, 2015 at 08:46:27AM +, Xiao, Nan (Nan@HPservers-Core-OE-PSC) wrote: > Yes, your patch is simple and better! Okay, I queued this patch to my x86/vt-d branch: >From 03932776424a799c3f065a69e98076a78530288b Mon Sep 17 00:00:00 2001 From: Joerg Roedel Date: Tue, 25 Aug 2015 10:54:28 +0200 Subject: [PATCH] iommu/vt-d: Really use upper context table when necessary There is a bug in iommu_context_addr() which will always use the lower context table, event when the upper context table needs to be used. Fix this issue. Fixes: 03ecc32c5274 ("iommu/vt-d: support extended root and context entries") Reported-by: Xiao, Nan Signed-off-by: Joerg Roedel --- drivers/iommu/intel-iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index a85077d..63daf1b 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -803,6 +803,7 @@ static inline struct context_entry *iommu_context_addr(struct intel_iommu *iommu struct context_entry *context; u64 *entry; + entry = &root->lo; if (ecs_enabled(iommu)) { if (devfn >= 0x80) { devfn -= 0x80; @@ -810,7 +811,6 @@ static inline struct context_entry *iommu_context_addr(struct intel_iommu *iommu } devfn *= 2; } - entry = &root->lo; if (*entry & 1) context = phys_to_virt(*entry & VTD_PAGE_MASK); else { -- 1.8.4.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix bug in iommu_context_addr: Always get pointer to lower extended-context-table
On Tue, Aug 25, 2015 at 09:15:39AM +, Xiao, Nan (Nan@HPservers-Core-OE-PSC) wrote: > In commit message: > > > There is a bug in iommu_context_addr() which will always use the lower > > context table, event when the upper context table needs to be used. Fix > > this issue. > > I think it should be " even when the upper context table ", not "event". Right, fixed that. Thanks for noticing. Joerg -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] iommu: arm-smmu: avoid build warning
On Mon, Feb 02, 2015 at 10:20:49AM +, Will Deacon wrote: > On Fri, Jan 30, 2015 at 09:55:55PM +, Arnd Bergmann wrote: > > - reg = (iova & ~0xfff) >> 32; > > + reg = ((u64)iova & ~0xfff) >> 32; > > writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_HI); > > } > > Thanks, Arnd. > > Acked-by: Will Deacon > > Joerg, could you pick this one up directly please? I don't have any other > ARM SMMU fixes queued at the moment. Done, thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] iommu/ipmmu-vmsa: fix device reference leaks
Adding a few more people to Cc. On Sun, Feb 03, 2019 at 10:27:09AM +, wen yang wrote: > Make sure to drop the reference to the device taken by > of_find_device_by_node() on driver unbind. > > Signed-off-by: Wen Yang > Cc: Joerg Roedel > Cc: io...@lists.linux-foundation.org > Cc: linux-kernel@vger.kernel.org > --- > drivers/iommu/ipmmu-vmsa.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c > index 7a4529c..cebf56d 100644 > --- a/drivers/iommu/ipmmu-vmsa.c > +++ b/drivers/iommu/ipmmu-vmsa.c > @@ -756,6 +756,9 @@ static int ipmmu_init_platform_device(struct device *dev, > > fwspec->iommu_priv = platform_get_drvdata(ipmmu_pdev); > > + if (!fwspec->iommu_priv) > + put_device(&ipmmu_pdev->dev); > + > return 0; > } > > -- > 2.7.4
Re: [PATCH] iommu/amd: Fix IOMMU page flush when detach all devices from a domain
Hi Suravee, On Thu, Jan 17, 2019 at 08:44:36AM +, Suthikulpanit, Suravee wrote: > Then, in __domain_flush_pages, we issue command when the dev_iommu[] >= 0. > This should preserve previous behavior, and only add flushing condition to > the specific IOMMU in detached state. Please let me know what you think. I think the whole point why VFIO is detaching all devices first and then goes into unmapping pages is that there are no flushes needed anymore when devices are detached. But when we follow your proposal we would still do IOTLB flushes even when no devices are attached anymore. So I'd like to avoid this, given the implications on unmapping performance. We should just flush the IOMMU TLB at detach time and be done with it. Makes sense? Regards, Joerg
Re: [PATCH] iommu/amd: Fix IOMMU page flush when detach all devices from a domain
Hi Suravee, On Tue, Jan 22, 2019 at 03:53:18PM +, Suthikulpanit, Suravee wrote: > Thanks for the detail. Alright then, let's just go with the version you > sent on 1/16/19. Do you want me to resend V3 with that changes, or > would you be taking care of that? Please send me a v3 based on the diff I sent last week. Also add a separate patch which adds the missing dte flush for the alias entry. Thanks, Joerg
Re: [PATCH] iommu/amd: Mark translation invalid during device detach
Hi Suravee, On Wed, Jan 16, 2019 at 04:15:10AM +, Suthikulpanit, Suravee wrote: > From: Suravee Suthikulpanit > > When a device switches domain, IOMMU driver detach device from the old > domain, and attach device to the new domain. During this period > the host table root pointer is not set, which means DMA translation > should be marked as invalid (clear TV bit). > > So, clear the TV bit when detach the device. The TV bit will be set > again when attaching device to the new domain. Is there a specific problem with setting the TV bit? Note that the update will clear all other fields in the first 128 bits of the DTE, which means that IR, IW and Mode are all set to 0. This effectivly blocks all DMA requests from the device, which is by design. Regards, Joerg
Re: [PATCH] iommu/amd: Fix IOMMU page flush when detach all devices from a domain
On Wed, Jan 16, 2019 at 04:16:25AM +, Suthikulpanit, Suravee wrote: > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c > index 525659b88ade..ab31ba75da1b 100644 > --- a/drivers/iommu/amd_iommu.c > +++ b/drivers/iommu/amd_iommu.c > @@ -1248,7 +1248,13 @@ static void __domain_flush_pages(struct > protection_domain *domain, > build_inv_iommu_pages(&cmd, address, size, domain->id, pde); > > for (i = 0; i < amd_iommu_get_num_iommus(); ++i) { > - if (!domain->dev_iommu[i]) > + /* > + * The dev_cnt is zero when all devices are detached > + * from the domain. This is the case when VFIO detaches > + * all devices from the group before flushing IOMMU pages. > + * So, always issue the flush command. > + */ > + if (domain->dev_cnt && !domain->dev_iommu[i]) > continue; This doesn't look like the correct fix. We still miss the flush if we detach the last device from the domain. How about the attached diff? If I understand the problem correctly, it should fix the problem more reliably. Thanks, Joerg diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 87ba23a75b38..dc1e2a8a19d7 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -1991,25 +1991,36 @@ static void do_attach(struct iommu_dev_data *dev_data, static void do_detach(struct iommu_dev_data *dev_data) { + struct protection_domain *domain = dev_data->domain; struct amd_iommu *iommu; u16 alias; iommu = amd_iommu_rlookup_table[dev_data->devid]; alias = dev_data->alias; - /* decrease reference counters */ - dev_data->domain->dev_iommu[iommu->index] -= 1; - dev_data->domain->dev_cnt -= 1; - /* Update data structures */ dev_data->domain = NULL; list_del(&dev_data->list); - clear_dte_entry(dev_data->devid); - if (alias != dev_data->devid) - clear_dte_entry(alias); + clear_dte_entry(dev_data->devid); /* Flush the DTE entry */ device_flush_dte(dev_data); + + if (alias != dev_data->devid) { + clear_dte_entry(alias); + /* Flush the Alias DTE entry */ + device_flush_dte(alias); + } + + /* Flush IOTLB */ + domain_flush_tlb_pde(domain); + + /* Wait for the flushes to finish */ + domain_flush_complete(domain); + + /* decrease reference counters - needs to happen after the flushes */ + domain->dev_iommu[iommu->index] -= 1; + domain->dev_cnt -= 1; } /*
Re: [PATCH] iommu/amd: Fix IOMMU page flush when detach all devices from a domain
On Wed, Jan 16, 2019 at 02:08:55PM +, Suthikulpanit, Suravee wrote: > Actually, I am not sure how we would be missing the flush on the last device. > In my test, I am seeing the flush command being issued correctly during > vfio_unmap_unpin(), which is after all devices are detached. > Although, I might be missing your point here. Could you please elaborate? Okay, you are right, the patch effectivly adds an unconditional flush of the domain on all iommus when the last device is detached. So it is correct in that regard. But that code path is also used in the iommu_unmap() path. The problem now is, that with your change we send flush commands to all IOMMUs in the unmap path when no device is attached to the domain. This will hurt performance there, no? Regards, Joerg
Re: [PATCH] iommu/amd: Fix IOMMU page flush when detach all devices from a domain
On Wed, Jan 16, 2019 at 02:40:59PM +, Suthikulpanit, Suravee wrote: > Actually, device_flush_dte(alias) should be needed regardless of this patch. > Are you planning to add this? Yes, I stumbled over this while writing the diff. I'll submit that as a separate patch. Thanks, Joerg
Re: [PATCH] iommu/amd: Fix IOMMU page flush when detach all devices from a domain
Hi Suravee, On Thu, Jan 24, 2019 at 03:25:19AM +, Suthikulpanit, Suravee wrote: > Actually, I just noticed that device_flush_dte() has already handled flushing > the DTE > for alias device as well. Please see the link below. > > https://elixir.bootlin.com/linux/latest/source/drivers/iommu/amd_iommu.c#L1219 > > So, we don't need to call device_flush_dte() for alias device in do_detach(). You are right, I missed that. Thanks, Joerg
Re: [PATCH v3] iommu: amd: Fix IOMMU page flush when detach device from a domain
On Thu, Jan 24, 2019 at 04:16:45AM +, Suthikulpanit, Suravee wrote: > drivers/iommu/amd_iommu.c | 15 +++ > 1 file changed, 11 insertions(+), 4 deletions(-) Applied, thanks Suravee.
Re: [PATCH v3] iommu: amd: Fix IOMMU page flush when detach device from a domain
On Thu, Jan 24, 2019 at 02:17:34PM +, Suthikulpanit, Suravee wrote: > On 1/24/19 9:11 PM, j...@8bytes.org wrote: > > On Thu, Jan 24, 2019 at 04:16:45AM +, Suthikulpanit, Suravee wrote: > >> drivers/iommu/amd_iommu.c | 15 +++ > >> 1 file changed, 11 insertions(+), 4 deletions(-) > > > > Applied, thanks Suravee. > > > > Thanks. Also, should this also back-ported to stable tree as well? I added a Fixes tag, so stable should pick it up. Joerg
Re: [PATCH 2/2] iommu/amd: Destroy api_lock mutex when freeing domain
On Thu, Jun 09, 2016 at 03:48:44PM +, Vesely, Jan wrote: > On Sat, 2016-05-21 at 14:11 -0400, Jan Vesely wrote: > > From: Jan Vesely > > > > Signed-off-by: Jan Vesely > > --- > > drivers/iommu/amd_iommu.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c > > index 17c76f2..4ff5e40 100644 > > --- a/drivers/iommu/amd_iommu.c > > +++ b/drivers/iommu/amd_iommu.c > > @@ -3016,6 +3016,7 @@ static void protection_domain_free(struct > > protection_domain *domain) > > > > del_domain_from_list(domain); > > > > + mutex_destroy(&domain->api_lock); > > if (domain->id) > > domain_id_free(domain->id); > > > > ping Your patches lack commit messages, please add a more detailed description of what you are fixing and why. Please also add 'Fixes:' tags when resubmitting. The changes itself look good to me. Joerg
Re: IOMMU/DMA API inquiry
Hi Mark, On Tue, Feb 17, 2015 at 02:48:03PM -0500, Mark Hounschell wrote: > I understand that AMD IOMMU support is not available for 32-bit > kernels. I believe the INTEL IOMMU is supported there. Not knowing > why, I was curious if that is going to remain that way? Yes, I have no plan on making the AMD IOMMU driver available on 32bit. But I would not be resistant to patches enabling the driver there. > I've learned that the AMD IOMMU does not play well with the kernels > "Contiguous Memory Allocator" (CMA). I also believe, but could be > mistaken, that the INTEL IOMMU does. Again, not knowing why, I was > curious if that is going to remain that way also? No, I will queue a patch for the next merge window to enable CMA use in the AMD IOMMU driver. So there will be support for this. > Is the fact that the AMD IOMMU is not supported on 32 bit kernels > the reason that dma_map_page always returns 0 on 32 bit kernels. There is no particular reason for not supporting it on 32 bit kernels, it just didn't seem to be important yet. At least not important enough to justify the work. > I've read the DMA-API-HOWTO.txt concerning dma_map_sg and at first I > thought that maybe dma_map_sg could be used to get around the fact > that AMD IOMMU doesn't work with CMA. But it looks as though I was > mistaken and I would actually have to do a DMA for_each_sg(sglist, > sg, count, i). Is that correct or can dma_map_sg somehow enable you > to do a single DMA using a single address for the entire sglist? The map_sg functions can't be used with the AMD IOMMU driver to work around missing CMA support. Depending on what you want it might work with the Intel IOMMU driver, as this one allocates a single IOVA region for the entire sg_list. Joerg -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/