IO_PAGE_FAULTS when enabling IOMMU in coreboot for ASUS F2A85-M
Dear Linux folks,

Rudolf Marek pushed a patch for review to enable the IOMMU in coreboot for the ASUS F2A85-M [1]. With this patch applied, Linux (I think 3.10-rc1) shows IO_PAGE_FAULT messages.

$ dmesg
[…]
[0.00] ACPI: IVRS bf144e10 00070 (v02 AMD AMDIOMMU 0001 AMD )
[0.00] ACPI: SSDT bf144e80 0051F (v02 AMD ALIB 0001 MSFT 0400)
[0.00] ACPI: SSDT bf1453a0 006B2 (v01 AMD POWERNOW 0001 AMD 0001)
[0.00] ACPI: SSDT bf145a52 00045 (v02 CORE COREBOOT 002A CORE 002A)
[…]
[0.465114] [Firmware Bug]: ACPI: no secondary bus range in _CRS
[…]
[0.567330] pci :00:00.0: >[1022:1410] type 00 class 0x06
[0.567364] pci :00:00.2: >[1022:1419] type 00 class 0x080600
[0.567427] pci :00:01.0: >[1002:9993] type 00 class 0x03000
[…]
[0.597731] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[0.597899] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PIBR._PRT]
[0.597933] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.SBR0._PRT]
[0.597972] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.SBR1._PRT]
[0.598073] pci:00: >Requesting ACPI _OSC control (0x1d)
[0.603808] pci:00: >ACPI _OSC request failed (AE_NOT_FOUND), returned control mask: 0x1d
[0.612397] ACPI _OSC control for PCIe not granted, disabling ASPM
[0.620508] Freeing initrd memory: 14876k freed
[…]
[0.882674] pci :00:01.0: >Boot video device
[0.882876] PCI: CLS 64 bytes, default 64
[0.897088] AMD-Vi: Enabling IOMMU at :00:00.2 cap 0x40 extended features: PreF PPR GT IA
[0.905816] pci :00:00.2: >irq 40 for MSI/MSI-X
[0.917457] AMD-Vi: Lazy IO/TLB flushing enabled
[0.922076] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[0.928500] software IO TLB [mem 0xbb13d000-0xbf13cfff] (64MB) mapped at [8800bb13d000-8800bf13cfff]
[0.938535] LVT offset 0 assigned for vector 0x400
[0.943338] perf: AMD IBS detected (0x00ff)
[0.948037] audit: initializing netlink socket (disabled)
[0.953432] type=2000 audit(1369659616.800:1): initialized
[0.977011] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[…]
[7.881938] radeon :00:01.0: >VRAM: 512M 0x - 0x1FFF (512M used)
[7.881941] radeon :00:01.0: >GTT: 512M 0x2000 - 0x3FFF
[…]
[7.885516] radeon :00:01.0: >irq 48 for MSI/MSI-X
[7.885525] radeon :00:01.0: >radeon: using MSI.
[…]
[8.276775] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:01.0 domain=0x0003 address=0x000f001ae000 flags=0x0010]
[8.287363] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:01.0 domain=0x0003 address=0x000f001acc00 flags=0x0010]
[8.297945] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:01.0 domain=0x0003 address=0x000f001ae200 flags=0x0010]
[8.308527] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:01.0 domain=0x0003 address=0x000f001ae080 flags=0x0010]
[8.319109] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:01.0 domain=0x0003 address=0x000f001ae240 flags=0x0010]
[8.329694] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:01.0 domain=0x0003 address=0x000f001accc0 flags=0x0010]
[8.340276] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:01.0 domain=0x0003 address=0x000f001ace80 flags=0x0010]
[8.350858] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:01.0 domain=0x0003 address=0x000f001acd80 flags=0x0010]
[8.361441] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:01.0 domain=0x0003 address=0x000f001ae280 flags=0x0010]
[8.372022] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:01.0 domain=0x0003 address=0x000f001ae180 flags=0x0010]
[8.382605] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:01.0 domain=0x0003 address=0x000f001ace00 flags=0x0010]
[8.393188] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:01.0 domain=0x0003 address=0x000f001acdc0 flags=0x0010]
[8.403770] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:01.0 domain=0x0003 address=0x000f001ace40 flags=0x0010]
[8.414353] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:01.0 domain=0x0003 address=0x000f001ae1c0 flags=0x0010]
[8.424936] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:01.0 domain=0x0003 address=0x000f001acc40 flags=0x0010]
[8.435518] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:01.0 domain=0x0003 address=0x000f001acc80 flags=0x0010]
[8.446100] AMD-Vi: Event lo
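(Editorial aside, not from the original thread: for suspected IVRS problems like this, the kernel has the `amd_iommu_dump` command-line option, which makes the AMD IOMMU driver log the IVRS entries it parsed at boot, so the table coreboot generates can be compared against the real device topology.)

```
# append 'amd_iommu_dump' to the kernel command line, then after boot:
$ dmesg | grep AMD-Vi
```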
Re: [PATCH] iommu/amd: Fix event counter availability check
Dear Alexander,

Thank you very much for the patch.

On 31.05.20 at 09:22, Alexander Monakov wrote:

Adding Shuah Khan to Cc: I've noticed you've seen this issue on Ryzen 2400GE; can you have a look at the patch? Would be nice to know if it fixes the problem for you too.

On Fri, 29 May 2020, Alexander Monakov wrote:

The driver performs an extra check if the IOMMU's capabilities advertise presence of performance counters: it verifies that counters are writable by writing a hard-coded value to a counter and testing that reading that counter gives back the same value.

Unfortunately it does so quite early, even before pci_enable_device is called for the IOMMU, i.e. when accessing its MMIO space is not guaranteed to work. On a Ryzen 4500U CPU, this actually breaks the test: the driver assumes the counters are not writable, and disables the functionality.

Moving init_iommu_perf_ctr just after iommu_flush_all_caches resolves the issue. This is the earliest point in amd_iommu_init_pci where the call succeeds on my laptop.

Signed-off-by: Alexander Monakov
Cc: Joerg Roedel
Cc: Suravee Suthikulpanit
Cc: iommu@lists.linux-foundation.org
---
PS. I'm seeing another hiccup with IOMMU probing on my system:

pci :00:00.2: can't derive routing for PCI INT A
pci :00:00.2: PCI INT A: not connected

Hopefully I can figure it out, but I'd appreciate hints.

I guess it’s a firmware bug, but I contacted the linux-pci folks [1].

 drivers/iommu/amd_iommu_init.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 5b81fd16f5fa..1b7ec6b6a282 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1788,8 +1788,6 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
 	if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE))
 		amd_iommu_np_cache = true;
 
-	init_iommu_perf_ctr(iommu);
-
 	if (is_rd890_iommu(iommu->dev)) {
 		int i, j;
@@ -1891,8 +1889,10 @@ static int __init amd_iommu_init_pci(void)
 	init_device_table_dma();
 
-	for_each_iommu(iommu)
+	for_each_iommu(iommu) {
 		iommu_flush_all_caches(iommu);
+		init_iommu_perf_ctr(iommu);
+	}
 
 	if (!ret)
 		print_iommu_info();

base-commit: 75caf310d16cc5e2f851c048cd597f5437013368

Thank you very much for fixing this issue, which is almost two years old for me.

Tested-by: Paul Menzel (MSI MS-7A37/B350M MORTAR with AMD Ryzen 3 2200G)

Link: https://lore.kernel.org/linux-iommu/20180727102710.ga6...@8bytes.org/

Kind regards,
Paul

[1]: https://lore.kernel.org/linux-pci/8579bd14-e369-1141-917b-204d20cff...@molgen.mpg.de/
[PATCH] iommu/amd: Print extended features in one line to fix divergent log levels
Currently, Linux logs the two messages below.

[0.979142] pci :00:00.2: AMD-Vi: Extended features (0xf77ef22294ada):
[0.979546] PPR NX GT IA GA PC GA_vAPIC

The log level of these lines differs though. The first one has level *info*, while the second has level *warn*, which is confusing.

$ dmesg -T --level=info | grep "Extended features"
[Tue Jun 16 21:46:58 2020] pci :00:00.2: AMD-Vi: Extended features (0xf77ef22294ada):
$ dmesg -T --level=warn | grep "PPR"
[Tue Jun 16 21:46:58 2020] PPR NX GT IA GA PC GA_vAPIC

The problem is that commit 3928aa3f57 ("iommu/amd: Detect and enable guest vAPIC support") introduced a newline, causing `pr_cont()`, which is used to print the features, to fall back to the default log level.

/**
 * pr_cont - Continues a previous log message in the same line.
 * @fmt: format string
 * @...: arguments for the format string
 *
 * This macro expands to a printk with KERN_CONT loglevel. It should only be
 * used when continuing a log message with no newline ('\n') enclosed. Otherwise
 * it defaults back to KERN_DEFAULT loglevel.
 */
#define pr_cont(fmt, ...) \
	printk(KERN_CONT fmt, ##__VA_ARGS__)

So, remove the line break, so only one line is logged.

Fixes: 3928aa3f57 ("iommu/amd: Detect and enable guest vAPIC support")
Cc: Suravee Suthikulpanit
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Paul Menzel
---
 drivers/iommu/amd_iommu_init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 5b81fd16f5faf8..8d9b2c94178c43 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1844,7 +1844,7 @@ static void print_iommu_info(void)
 		pci_info(pdev, "Found IOMMU cap 0x%hx\n", iommu->cap_ptr);
 
 		if (iommu->cap & (1 << IOMMU_CAP_EFR)) {
-			pci_info(pdev, "Extended features (%#llx):\n",
+			pci_info(pdev, "Extended features (%#llx):",
 				 iommu->features);
 			for (i = 0; i < ARRAY_SIZE(feat_str); ++i) {
 				if (iommu_feature(iommu, (1ULL << i)))
-- 
2.26.2
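(Editorial note: a minimal sketch of an alternative that sidesteps `pr_cont()` entirely would be to assemble the decoded feature names into a local buffer with `scnprintf()` and emit them in a single `pci_info()` call, so the whole line necessarily carries one log level. `feat_str[]`, `iommu_feature()`, `iommu->features` and `pdev` are the existing identifiers from `drivers/iommu/amd_iommu_init.c`; the buffer size is an assumption.)

```c
	/* Sketch only: print the EFR value and the decoded features atomically. */
	char buf[128] = "";
	int pos = 0, i;

	for (i = 0; i < ARRAY_SIZE(feat_str); ++i)
		if (iommu_feature(iommu, (1ULL << i)))
			pos += scnprintf(buf + pos, sizeof(buf) - pos,
					 " %s", feat_str[i]);

	pci_info(pdev, "Extended features (%#llx):%s\n", iommu->features, buf);
```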
Messages to kexec@ get moderated (was: Crash kernel with 256 MB reserved memory runs into OOM condition)
Dear Dave,

On 13.08.19 04:46, Dave Young wrote:
> On 08/13/19 at 10:43am, Dave Young wrote:
[…]
> The question is to Paul, also it would be always good to cc kexec mail
> list for kexec and kdump issues.

kexec@ was CCed in my original mail, but my messages got moderated. It’d be great if you checked that with the list administrators.

> Your mail to 'kexec' with the subject
>
>    Crash kernel with 256 MB reserved memory runs into OOM condition
>
> Is being held until the list moderator can review it for approval.
>
> The reason it is being held:
>
>    Message has a suspicious header
>
> Either the message will get posted to the list, or you will receive
> notification of the moderator's decision. If you would like to cancel
> this posting, please visit the following URL:
>
> http://lists.infradead.org/mailman/confirm/kexec/a23ab6162ef34d099af5dd86c46113def5152bb1

Kind regards,
Paul
Re: Crash kernel with 256 MB reserved memory runs into OOM condition
Dear Dave,

Thank you for your replies.

On 2019-08-13 04:54, Dave Young wrote:
> On 08/13/19 at 10:46am, Dave Young wrote:
>> On 08/13/19 at 10:43am, Dave Young wrote:
>>> On 08/12/19 at 11:50am, Michal Hocko wrote:
>>>> On Mon 12-08-19 11:42:33, Paul Menzel wrote:
>>>>> On a Dell PowerEdge R7425 with two AMD EPYC 7601 (total 128 threads) and
>>>>> 1 TB RAM, the crash kernel with 256 MB of space reserved crashes.
>>>>>
>>>>> Please find the messages of the normal and the crash kernel attached.
>>>>
>>>> You will need more memory to reserve for the crash kernel because ...
>>>>
>>>>> [4.548703] Node 0 DMA free:484kB min:4kB low:4kB high:4kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:568kB managed:484kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>>> [4.573612] lowmem_reserve[]: 0 125 125 125
>>>>> [4.577799] Node 0 DMA32 free:1404kB min:1428kB low:1784kB high:2140kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:15720kB writepending:0kB present:261560kB managed:133752kB mlocked:0kB kernel_stack:2496kB pagetables:0kB bounce:0kB free_pcp:212kB local_pcp:212kB free_cma:0kB
>>>>
>>>> ... the memory is really depleted and nothing to be reclaimed (no anon/file
>>>> pages). Look how that free memory is below min watermark (node zone DMA has
>>>> lowmem protection for GFP_KERNEL allocation).
>>>
>>> We found similar issue on our side while working on kdump on SME enabled
>>> systems. Kairui is working on some patches.
>>>
>>> Actually on those SME/SEV enabled machines, swiotlb is enabled
>>> automatically so at least we need extra 64M+ memory for kdump other
>>> than the normal expectation.
>>>
>>> Can you check if this is also your case?
>>
>> The question is to Paul, also it would be always good to cc kexec mail
>> list for kexec and kdump issues.

As already replied, kexec@ was CCed in my original message, but the list put it under moderation.

> Looks like hardware iommu is used, maybe you do not enable SME?

Do you mean AMD Secure Memory Encryption? I do not think we use that.

> Also replace maxcpus=1 with nr_cpus=1 can save some memory, can have a
> try.

Thank you for this suggestion. That fixed it indeed, and the reserved memory can stay at 256 MB. (The parameter names are a little unintuitive – I guess due to historical reasons [1]. Apparently, nr_cpus= limits the number of possible CPUs, and with it the per-CPU allocations, while maxcpus= only limits how many CPUs are brought up during boot.)

Kind regards,
Paul

[1]: https://www.kernel.org/doc/Documentation/admin-guide/kernel-parameters.txt
Re: Crash kernel with 256 MB reserved memory runs into OOM condition
Dear Michal,

On 12.08.19 11:50, Michal Hocko wrote:
> On Mon 12-08-19 11:42:33, Paul Menzel wrote:
>> On a Dell PowerEdge R7425 with two AMD EPYC 7601 (total 128 threads) and
>> 1 TB RAM, the crash kernel with 256 MB of space reserved crashes.
>>
>> Please find the messages of the normal and the crash kernel attached.
>
> You will need more memory to reserve for the crash kernel because ...
>
>> [4.548703] Node 0 DMA free:484kB min:4kB low:4kB high:4kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:568kB managed:484kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>> [4.573612] lowmem_reserve[]: 0 125 125 125
>> [4.577799] Node 0 DMA32 free:1404kB min:1428kB low:1784kB high:2140kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:15720kB writepending:0kB present:261560kB managed:133752kB mlocked:0kB kernel_stack:2496kB pagetables:0kB bounce:0kB free_pcp:212kB local_pcp:212kB free_cma:0kB
>
> ... the memory is really depleted and nothing to be reclaimed (no anon/file
> pages). Look how that free memory is below min watermark (node zone DMA has
> lowmem protection for GFP_KERNEL allocation).
>
> [...]
>> [4.923156] Out of memory and no killable processes...
>
> and there is no task existing to be killed so we go and panic.

Yeah, we figured that. What we wonder is how 256 MB are not enough for booting, and what hardware properties cause it to be too small. In the overview I just see a 60 MB allocation.

[4.857565] kmalloc-2048 59164KB 59164KB

Kind regards,
Paul
Re: [PATCH] iommu/amd: Fix event counter availability check
Dear Alexander,

On 01.06.20 at 04:48, Paul Menzel wrote:

[…]

On 31.05.20 at 09:22, Alexander Monakov wrote:

Adding Shuah Khan to Cc: I've noticed you've seen this issue on Ryzen 2400GE; can you have a look at the patch? Would be nice to know if it fixes the problem for you too.

On Fri, 29 May 2020, Alexander Monakov wrote:

The driver performs an extra check if the IOMMU's capabilities advertise presence of performance counters: it verifies that counters are writable by writing a hard-coded value to a counter and testing that reading that counter gives back the same value.

Unfortunately it does so quite early, even before pci_enable_device is called for the IOMMU, i.e. when accessing its MMIO space is not guaranteed to work. On a Ryzen 4500U CPU, this actually breaks the test: the driver assumes the counters are not writable, and disables the functionality.

Moving init_iommu_perf_ctr just after iommu_flush_all_caches resolves the issue. This is the earliest point in amd_iommu_init_pci where the call succeeds on my laptop.

Signed-off-by: Alexander Monakov
Cc: Joerg Roedel
Cc: Suravee Suthikulpanit
Cc: iommu@lists.linux-foundation.org
---
PS. I'm seeing another hiccup with IOMMU probing on my system:

pci :00:00.2: can't derive routing for PCI INT A
pci :00:00.2: PCI INT A: not connected

Hopefully I can figure it out, but I'd appreciate hints.

I guess it’s a firmware bug, but I contacted the linux-pci folks [1]. Unfortunately, it’s still present in Linux 5.11.

 drivers/iommu/amd_iommu_init.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 5b81fd16f5fa..1b7ec6b6a282 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1788,8 +1788,6 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
 	if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE))
 		amd_iommu_np_cache = true;
 
-	init_iommu_perf_ctr(iommu);
-
 	if (is_rd890_iommu(iommu->dev)) {
 		int i, j;
@@ -1891,8 +1889,10 @@ static int __init amd_iommu_init_pci(void)
 	init_device_table_dma();
 
-	for_each_iommu(iommu)
+	for_each_iommu(iommu) {
 		iommu_flush_all_caches(iommu);
+		init_iommu_perf_ctr(iommu);
+	}
 
 	if (!ret)
 		print_iommu_info();

base-commit: 75caf310d16cc5e2f851c048cd597f5437013368

Thank you very much for fixing this issue, which is almost two years old for me.

Tested-by: Paul Menzel (MSI MS-7A37/B350M MORTAR with AMD Ryzen 3 2200G)

Link: https://lore.kernel.org/linux-iommu/20180727102710.ga6...@8bytes.org/

Just a small note that I am applying your patch, but it looks like there is still some timing issue. At least today, I noticed it during one boot with Linux 5.11. (Before, I never noticed it again in the several years, but I am not always paying attention and do not save the logs.)

Kind regards,
Paul

[1]: https://lore.kernel.org/linux-pci/8579bd14-e369-1141-917b-204d20cff...@molgen.mpg.de/
Re: [PATCH] iommu/amd: Fix event counter availability check
Dear Suravee,

On 17.09.20 at 19:55, Alexander Monakov wrote:

On Tue, 16 Jun 2020, Suravee Suthikulpanit wrote:

Instead of blindly moving the code around to a spot that would just work, I am trying to understand what might be required here. In this case, the init_device_table_dma() should not be needed. I suspect it's the IOMMU invalidate all command that's also needed here. I'm also checking with the HW and BIOS team. Meanwhile, could you please give the following change a try:

Hello. Can you give any update please?

[…]

Sorry for late reply. I have a reproducer and am working with the HW team to understand the issue. I should be able to provide an update with a solution by the end of this week.

Hello, hope you are doing well. Has this investigation found anything?

I am wondering the same. It’d be great to have this fixed in the upstream Linux kernel.

Kind regards,
Paul
Re: [PATCH] iommu/amd: Fix event counter availability check
Dear Suravee,

Thank you for your reply.

On 22.02.21 at 18:59, Suravee Suthikulpanit wrote:

This fix has been accepted in the upstream recently.
https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?h=x86/amd

Indeed. Linus also pulled this.

Could you please give this a try?

Yes, I did give it a try, but, unfortunately, the problem is not fixed. I commented on the Linux Bugzilla bug report #201753 [1].

Kind regards,
Paul

PS: It’d be great if you didn’t top-post, and used interleaved style for responses.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=201753 "AMD-Vi: Unable to write to IOMMU perf counter"
Re: [PATCH] iommu/amd: Fix event counter availability check
[cc: +suravee, +jörg]

Dear Alex, dear Shuah, dear Suravee, dear Jörg,

On 03.06.20 at 08:54, Alexander Monakov wrote:

On Tue, 2 Jun 2020, Shuah Khan wrote:

I changed the logic to read config to get max banks and counters before checking if counters are writable and tried writing to all. The result is the same and all of them aren't writable. However, when I disable the writable check and assume they are, I can run

[snip]

This is similar to what I did. I also noticed that counters can be successfully used with perf if the initial check is ignored. I was considering sending a patch to remove the check and adjust the event counting logic to use counters as read-only, but after a bit more investigation I've noticed how late pci_enable_device is done, and came up with this patch. It's a path of less resistance: I'd expect maintainers to be more averse to removing the check rather than fixing it so it works as intended (even though I think the check should not be there in the first place).

However: The ability to modify the counters is needed only for sampling the events (getting an interrupt when a counter overflows). There's no code to do that for these AMD IOMMU counters. A solution I would prefer is to not write to those counters at all. It would simplify or even remove a bunch of code. I can submit a corresponding patch if there's general agreement this path is ok. What do you think?

I like this idea. Suravee, Jörg, what do you think?

Commit 6778ff5b21b ("iommu/amd: Fix performance counter initialization") [1] delays the boot by up to 100 ms, which is over 20 % on fast systems; it is also just a workaround, and does not always seem to work. The delay is also not mentioned in the commit message.

Kind regards,
Paul

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6778ff5b21bd8e78c8bd547fd66437cf2657fd9b
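(Editorial note on "counters can be successfully used with perf": on systems where the driver does register the counters, they show up as a PMU, and something like the following can be used to read them. The PMU instance name `amd_iommu_0` and the event name `mem_trans_total` are assumptions to be checked against the `events` directory in sysfs.)

```
$ ls /sys/bus/event_source/devices/amd_iommu_0/events
$ perf stat -e 'amd_iommu_0/mem_trans_total/' -a sleep 1
```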
[PATCH] Revert "iommu/amd: Fix performance counter initialization"
This reverts commit 6778ff5b21bd8e78c8bd547fd66437cf2657fd9b.

The commit adds up to 100 ms to the boot process, which is not mentioned in the commit message, and is making up more than 20 % on current systems, where the Linux kernel takes 500 ms.

[0.00] Linux version 5.11.0-10281-g19b4f3edd5c9 (root@a2ab663d937e) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1) #138 SMP Wed Feb 24 11:28:17 UTC 2021
[…]
[0.106422] smpboot: CPU0: AMD Ryzen 3 2200G with Radeon Vega Graphics (family: 0x17, model: 0x11, stepping: 0x0)
[…]
[0.291257] pci :00:00.2: AMD-Vi: Unable to read/write to IOMMU perf counter.
[…]

Also, it does not fix the problem on an MSI B350M MORTAR with AMD Ryzen 3 2200G (even with ten retries, resulting in a 200 ms time-out).

[0.401152] pci :00:00.2: AMD-Vi: Unable to read/write to IOMMU perf counter.

Additionally, alternative proposed solutions [1] were not considered or discussed.

[1]: https://lore.kernel.org/linux-iommu/alpine.lnx.2.20.13.2006030935570.3...@monopod.intra.ispras.ru/

Cc: Suravee Suthikulpanit
Cc: Tj (Elloe Linux)
Cc: Shuah Khan
Cc: Alexander Monakov
Cc: David Coe
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Paul Menzel
---
 drivers/iommu/amd/init.c | 45 ++--
 1 file changed, 11 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 9126efcbaf2c..af195f11d254 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -12,7 +12,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -257,8 +256,6 @@ static enum iommu_init_state init_state = IOMMU_START_STATE;
 static int amd_iommu_enable_interrupts(void);
 static int __init iommu_go_to_state(enum iommu_init_state state);
 static void init_device_table_dma(void);
-static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
-				u8 fxn, u64 *value, bool is_write);
 
 static bool amd_iommu_pre_enabled = true;
 
@@ -1717,11 +1714,13 @@ static int __init init_iommu_all(struct acpi_table_header *table)
 	return 0;
 }
 
-static void __init init_iommu_perf_ctr(struct amd_iommu *iommu)
+static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
+				u8 fxn, u64 *value, bool is_write);
+
+static void init_iommu_perf_ctr(struct amd_iommu *iommu)
 {
-	int retry;
 	struct pci_dev *pdev = iommu->dev;
-	u64 val = 0xabcd, val2 = 0, save_reg, save_src;
+	u64 val = 0xabcd, val2 = 0, save_reg = 0;
 
 	if (!iommu_feature(iommu, FEATURE_PC))
 		return;
@@ -1729,39 +1728,17 @@ static void __init init_iommu_perf_ctr(struct amd_iommu *iommu)
 	amd_iommu_pc_present = true;
 
 	/* save the value to restore, if writable */
-	if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false) ||
-	    iommu_pc_get_set_reg(iommu, 0, 0, 8, &save_src, false))
-		goto pc_false;
-
-	/*
-	 * Disable power gating by programing the performance counter
-	 * source to 20 (i.e. counts the reads and writes from/to IOMMU
-	 * Reserved Register [MMIO Offset 1FF8h] that are ignored.),
-	 * which never get incremented during this init phase.
-	 * (Note: The event is also deprecated.)
-	 */
-	val = 20;
-	if (iommu_pc_get_set_reg(iommu, 0, 0, 8, &val, true))
+	if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false))
 		goto pc_false;
 
 	/* Check if the performance counters can be written to */
-	val = 0xabcd;
-	for (retry = 5; retry; retry--) {
-		if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true) ||
-		    iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false) ||
-		    val2)
-			break;
-
-		/* Wait about 20 msec for power gating to disable and retry. */
-		msleep(20);
-	}
-
-	/* restore */
-	if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true) ||
-	    iommu_pc_get_set_reg(iommu, 0, 0, 8, &save_src, true))
+	if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true)) ||
+	    (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false)) ||
+	    (val != val2))
 		goto pc_false;
 
-	if (val != val2)
+	/* restore */
+	if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true))
 		goto pc_false;
 
 	pci_info(pdev, "IOMMU performance counters supported\n");
-- 
2.30.1
Re: [PATCH] Revert "iommu/amd: Fix performance counter initialization"
Dear Jörg, dear Suravee,

On 03.03.21 at 15:10, Alexander Monakov wrote:

On Wed, 3 Mar 2021, Suravee Suthikulpanit wrote:

Additionally, alternative proposed solutions [1] were not considered or discussed.

[1]: https://lore.kernel.org/linux-iommu/alpine.lnx.2.20.13.2006030935570.3...@monopod.intra.ispras.ru/

This check has been introduced early on to detect a HW issue for certain platforms in the past, where the performance counters are not accessible and would result in silent failure when trying to use the counters. This is considered legacy code, and can be removed if we decide to no longer provide a sanity check for such a case.

Which platforms? There is no such information in the code or the commit messages that introduced this.

According to AMD's documentation, presence of performance counters is indicated by the "PCSup" bit in the "EFR" register. I don't think the driver should second-guess that. If there were platforms where the CPU or the firmware lied to the OS (EFR[PCSup] was 1, but counters were not present), I think that should have been handled in a more explicit manner, e.g. via matching broken CPUs by cpuid.

Suravee, could you please answer the questions?

Jörg, I know you are probably busy, but the patch was applied to the stable series (v5.11.7). There are still too many questions open regarding the patch, and Suravee has not yet addressed the comments. It’d be great if you could revert it.

Kind regards,
Paul
Re: [PATCH] iommu/amd: Fix extended features logging
Dear Alexander,

On 10.04.21 at 23:11, Alexander Monakov wrote:

print_iommu_info prints the EFR register and then the decoded list of features on a separate line:

pci :00:00.2: AMD-Vi: Extended features (0x206d73ef22254ade):
PPR X2APIC NX GT IA GA PC GA_vAPIC

The second line is emitted via 'pr_cont', which causes it to have a different ('warn') loglevel compared to the previous line ('info').

Commit 9a295ff0ffc9 attempted to rectify this by removing the newline from the pci_info format string, but this doesn't work, as pci_info calls implicitly append a newline anyway.

Hmm, did I really screw that up during my testing? I am sorry about that. I tried to wrap my head around where the newline is implicitly appended, and only found the definitions below.

include/linux/pci.h:#define pci_info(pdev, fmt, arg...)	dev_info(&(pdev)->dev, fmt, ##arg)
include/linux/dev_printk.h:#define dev_info(dev, fmt, ...) \
include/linux/dev_printk.h:	_dev_info(dev, dev_fmt(fmt), ##__VA_ARGS__)
include/linux/dev_printk.h:__printf(2, 3) __cold
include/linux/dev_printk.h:void _dev_info(const struct device *dev, const char *fmt, ...);
include/linux/compiler_attributes.h:#define __printf(a, b) __attribute__((__format__(printf, a, b)))

Restore the newline, and call pr_info with an empty format string to set the loglevel for subsequent pr_cont calls. The same solution is used in the EFI and uvesafb drivers.

Thank you for fixing this.

Fixes: 9a295ff0ffc9 ("iommu/amd: Print extended features in one line to fix divergent log levels")
Signed-off-by: Alexander Monakov
Cc: Paul Menzel
Cc: Joerg Roedel
Cc: Suravee Suthikulpanit
Cc: iommu@lists.linux-foundation.org
---
 drivers/iommu/amd/init.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 596d0c413473..a25e241eff1c 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1929,8 +1929,11 @@ static void print_iommu_info(void)
 		pci_info(pdev, "Found IOMMU cap 0x%hx\n", iommu->cap_ptr);
 
 		if (iommu->cap & (1 << IOMMU_CAP_EFR)) {
-			pci_info(pdev, "Extended features (%#llx):",
+			pci_info(pdev, "Extended features (%#llx):\n",
 				 iommu->features);
+
+			pr_info("");
+
 			for (i = 0; i < ARRAY_SIZE(feat_str); ++i) {
 				if (iommu_feature(iommu, (1ULL << i)))
 					pr_cont(" %s", feat_str[i]);

In the discussion *smpboot: CPU numbers printed as warning* [1], John wrote:

	It is supported to provide loglevels for CONT messages. The loglevel is
	then only used if the append fails:

	pr_cont(KERN_INFO "message part");

	I don't know if we want to go down that path. But it is supported.

Kind regards,
Paul

[1]: https://lkml.org/lkml/2021/2/16/191
[PATCH] iommu/amd: Put newline after closing bracket in warning
Currently, on the Dell OptiPlex 5055 the EFR mismatch warning looks like below.

[1.479774] smpboot: CPU0: AMD Ryzen 5 PRO 1500 Quad-Core Processor (family: 0x17, model: 0x1, stepping: 0x1)
[…]
[2.507370] AMD-Vi: [Firmware Warn]: EFR mismatch. Use IVHD EFR (0xf77ef22294ada : 0x400f77ef22294ada
).

Put the newline after the closing `).`, so the warning is logged on one line.

Fixes: a44092e326d4 ("iommu/amd: Use IVHD EFR for early initialization of IOMMU features")
Cc: iommu@lists.linux-foundation.org
Cc: Suravee Suthikulpanit
Cc: Brijesh Singh
Cc: Robert Richter
Signed-off-by: Paul Menzel
---
 drivers/iommu/amd/init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 321f5906e6ed..f7e31018cd0b 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1837,7 +1837,7 @@ static void __init late_iommu_features_init(struct amd_iommu *iommu)
 	 * IVHD and MMIO conflict.
 	 */
 	if (features != iommu->features)
-		pr_warn(FW_WARN "EFR mismatch. Use IVHD EFR (%#llx : %#llx\n).",
+		pr_warn(FW_WARN "EFR mismatch. Use IVHD EFR (%#llx : %#llx).\n",
 			features, iommu->features);
 }
-- 
2.31.0
AMD-Vi: [Firmware Warn]: EFR mismatch. Use IVHD EFR (0xf77ef22294ada : 0x400f77ef22294ada).
Dear Linux folks,

On the Dell OptiPlex 5055, Linux warns about an EFR mismatch in the firmware (the warning was added in commit a44092e326d4 [1]).

```
[…]
[0.00] DMI: Dell Inc. OptiPlex 5055 Ryzen CPU/0P03DX, BIOS 1.1.20 05/31/2019
[…]
[1.479774] smpboot: CPU0: AMD Ryzen 5 PRO 1500 Quad-Core Processor (family: 0x17, model: 0x1, stepping: 0x1)
[…]
[2.507370] AMD-Vi: [Firmware Warn]: EFR mismatch. Use IVHD EFR (0xf77ef22294ada : 0x400f77ef22294ada ).
[2.507381] pci :00:00.2: AMD-Vi: IOMMU performance counters supported
[2.525221] pci :00:00.2: can't derive routing for PCI INT A
[2.531240] pci :00:00.2: PCI INT A: not connected
[2.536415] pci :00:01.0: Adding to iommu group 0
[2.541485] pci :00:01.3: Adding to iommu group 1
[…]
```

The difference in the MMIO value is a prepended 0x400. Can that be explained somehow? If not, it’d be great if you could give more details about the firmware issue, so I can contact the Dell support to fix the firmware.

Kind regards,
Paul

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a44092e326d403c7878018ba532369f84d31dbfa
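(A quick editorial sanity check of the question above, not from the thread: XORing the two reported EFR values shows that exactly one bit differs, bit 62. What that bit encodes would have to be looked up in the EFR definition of the AMD IOMMU specification.)

```c
#include <stdio.h>

int main(void)
{
	/* EFR as reported by the IVRS/IVHD ACPI table and by the MMIO register. */
	unsigned long long ivhd = 0x400f77ef22294adaULL;
	unsigned long long mmio = 0x000f77ef22294adaULL;
	unsigned long long diff = ivhd ^ mmio;

	/* Exactly one bit remains set; __builtin_ctzll() yields its index. */
	printf("diff = %#llx (bit %d)\n", diff, __builtin_ctzll(diff));
	return 0;
}
```

This prints `diff = 0x4000000000000000 (bit 62)`.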
Re: [RESEND PATCH v2] iommu/amd: Fix extended features logging
On 04.05.21 at 12:22, Alexander Monakov wrote:

print_iommu_info prints the EFR register and then the decoded list of features on a separate line:

pci :00:00.2: AMD-Vi: Extended features (0x206d73ef22254ade):
PPR X2APIC NX GT IA GA PC GA_vAPIC

The second line is emitted via 'pr_cont', which causes it to have a different ('warn') loglevel compared to the previous line ('info').

Commit 9a295ff0ffc9 attempted to rectify this by removing the newline from the pci_info format string, but this doesn't work, as pci_info calls implicitly append a newline anyway.

Printing the decoded features on the same line would make it quite long. Instead, change pci_info() to pr_info() to omit the PCI bus location info, which is also shown in the preceding message. This results in:

pci :00:00.2: AMD-Vi: Found IOMMU cap 0x40
AMD-Vi: Extended features (0x206d73ef22254ade): PPR X2APIC NX GT IA GA PC GA_vAPIC
AMD-Vi: Interrupt remapping enabled

Fixes: 9a295ff0ffc9 ("iommu/amd: Print extended features in one line to fix divergent log levels")
Link: https://lore.kernel.org/lkml/alpine.lnx.2.20.13.2104112326460.11...@monopod.intra.ispras.ru
Signed-off-by: Alexander Monakov
Cc: Paul Menzel
Cc: Joerg Roedel
Cc: Suravee Suthikulpanit
Cc: iommu@lists.linux-foundation.org
---
 drivers/iommu/amd/init.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 429a4baa3aee..8f0eb865119a 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1954,8 +1954,8 @@ static void print_iommu_info(void)
 		pci_info(pdev, "Found IOMMU cap 0x%x\n", iommu->cap_ptr);
 
 		if (iommu->cap & (1 << IOMMU_CAP_EFR)) {
-			pci_info(pdev, "Extended features (%#llx):",
-				 iommu->features);
+			pr_info("Extended features (%#llx):", iommu->features);
+
 			for (i = 0; i < ARRAY_SIZE(feat_str); ++i) {
 				if (iommu_feature(iommu, (1ULL << i)))
 					pr_cont(" %s", feat_str[i]);

base-commit: 9f4ad9e425a1d3b6a34617b8ea226d56a119a717

Reviewed-by: Paul Menzel

Kind regards,
Paul
Re: AMD-Vi: [Firmware Warn]: EFR mismatch. Use IVHD EFR (0xf77ef22294ada : 0x400f77ef22294ada).
[Cc: +Dell Linux kernel client team]

Dear Linux folks,

On 12.04.21 at 20:07, Paul Menzel wrote:

On the Dell OptiPlex 5055, Linux warns about an EFR mismatch in the firmware.

```
[…]
[ 0.00] DMI: Dell Inc. OptiPlex 5055 Ryzen CPU/0P03DX, BIOS 1.1.20 05/31/2019
[…]
[ 1.479774] smpboot: CPU0: AMD Ryzen 5 PRO 1500 Quad-Core Processor (family: 0x17, model: 0x1, stepping: 0x1)
[…]
[ 2.507370] AMD-Vi: [Firmware Warn]: EFR mismatch. Use IVHD EFR (0xf77ef22294ada : 0x400f77ef22294ada).
[ 2.507381] pci :00:00.2: AMD-Vi: IOMMU performance counters supported
[ 2.525221] pci :00:00.2: can't derive routing for PCI INT A
[ 2.531240] pci :00:00.2: PCI INT A: not connected
[ 2.536415] pci :00:01.0: Adding to iommu group 0
[ 2.541485] pci :00:01.3: Adding to iommu group 1
[…]
```

The difference in the MMIO value is a prepended 0x400. Can that be explained somehow? If not, it’d be great if you could give more details about the firmware issue, so I can contact the Dell support to fix the firmware.

Linux 5.15-rc1 still warns about that (also with the latest system firmware 1.1.50).

Kind regards,
Paul

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a44092e326d403c7878018ba532369f84d31dbfa
Re: AMD-Vi: [Firmware Warn]: EFR mismatch. Use IVHD EFR (0xf77ef22294ada : 0x400f77ef22294ada).
[Cc: +Mario from AMD]

Dear Jörg,

On 14.09.21 at 14:09, Jörg Rödel wrote:

On Tue, Sep 14, 2021 at 11:10:57AM +0200, Paul Menzel wrote:
Linux 5.15-rc1 still warns about that (also with latest system firmware 1.1.50).

The reason is most likely that the latest firmware still reports a different EFR feature set in the IVRS table than the IOMMU reports in its EFR MMIO register.

What do you mean exactly? Only 0x400 is prepended. The rest of the string is identical. What feature set would the 0x400 in the beginning be?

Anyway, it’d be great if AMD and Dell could take a look.

Kind regards,
Paul
Re: AMD-Vi: [Firmware Warn]: EFR mismatch. Use IVHD EFR (0xf77ef22294ada : 0x400f77ef22294ada).
[Use Mario’s current address]

On 15.09.21 at 00:15, Paul Menzel wrote:

[Cc: +Mario from AMD]

Dear Jörg,

On 14.09.21 at 14:09, Jörg Rödel wrote:

On Tue, Sep 14, 2021 at 11:10:57AM +0200, Paul Menzel wrote:
Linux 5.15-rc1 still warns about that (also with latest system firmware 1.1.50).

The reason is most likely that the latest firmware still reports a different EFR feature set in the IVRS table than the IOMMU reports in its EFR MMIO register.

What do you mean exactly? Only 0x400 is prepended. The rest of the string is identical. What feature set would the 0x400 in the beginning be?

Anyway, it’d be great if AMD and Dell could take a look.

Kind regards,
Paul
Re: AMD-Vi: [Firmware Warn]: EFR mismatch. Use IVHD EFR (0xf77ef22294ada : 0x400f77ef22294ada).
Dear Jörg,

On 15.09.21 at 10:30, Jörg Rödel wrote:

Mainly DELL should look at this, because it is their BIOS which is responsible for this inconsistency.

I am not so sure about that, as today’s (x86) firmware often consists of platform initialization (PI) code or a firmware support package (FSP), provided by the chipset/SoC vendors – like AGESA for AMD – which the ODMs just integrate. If only Dell systems are affected, that would of course point to a bug in the Dell firmware.

Kind regards,
Paul
Re: AMD-Vi: [Firmware Warn]: EFR mismatch. Use IVHD EFR (0xf77ef22294ada : 0x400f77ef22294ada).
Dear Linux folks, Am 15.09.21 um 00:17 schrieb Paul Menzel: [Use Mario’s current address] Am 15.09.21 um 00:15 schrieb Paul Menzel: [Cc: +Mario from AMD] Am 14.09.21 um 14:09 schrieb Jörg Rödel: On Tue, Sep 14, 2021 at 11:10:57AM +0200, Paul Menzel wrote: Linux 5.15-rc1 still warns about that (also with latest system firmware 1.1.50). The reason is most likely that the latest firmware still reports a different EFR feature set in the IVRS table than the IOMMU reports in its EFR MMIO register. What do you mean exactly? Only 0x400 is prepended. The rest of the string is identical. What feature set would the 0x400 in the beginning be? ACPICA 20200326 is able to deassemble IVRS subtable type 0x11. The incorrect(?) value can be seen there. $ sudo iasl -d -p IVRS.dsl /sys/firmware/acpi/tables/IVRS […] $ grep EFR IVRS.dsl [090h 0144 8]EFR Image : 400F77EF22294ADA Anyway, it’d be great if AMD and Dell could take a look. Dell client kernel team, please confirm, that you received the report. Kind regards, Paul [1]: https://acpica.org/node/178 /* * Intel ACPI Component Architecture * AML/ASL+ Disassembler version 20200925 (64-bit version) * Copyright (c) 2000 - 2020 Intel Corporation * * Disassembly of IVRS, Wed Sep 15 11:48:42 2021 * * ACPI Data Table [IVRS] * * Format: [HexOffset DecimalOffset ByteLength] FieldName : FieldValue */ [000h 4]Signature : "IVRS"[I/O Virtualization Reporting Structure] [004h 0004 4] Table Length : 00D0 [008h 0008 1] Revision : 02 [009h 0009 1] Checksum : EE [00Ah 0010 6] Oem ID : "AMD" [010h 0016 8] Oem Table ID : "MYRTLE" [018h 0024 4] Oem Revision : 0001 [01Ch 0028 4] Asl Compiler ID : "ACPI" [020h 0032 4]Asl Compiler Revision : 0004 [024h 0036 4] Virtualization Info : 00203041 [028h 0040 8] Reserved : [030h 0048 1]Subtable Type : 10 [Hardware Definition Block] [031h 0049 1]Flags : B0 [032h 0050 2] Length : 0048 [034h 0052 2] DeviceId : 0002 [036h 0054 2]Capability Offset : 0040 [038h 0056 8] Base Address : FC00 [040h 0064 2]PCI Segment Group : [042h 0066 2] Virtualization Info : [044h 0068 4]Feature Reporting : 80048F6E [048h 0072 1] Entry Type : 03 [049h 0073 2]Device ID : 0008 [04Bh 0075 1] Data Setting : 00 [04Ch 0076 1] Entry Type : 04 [04Dh 0077 2]Device ID : FFFE [04Fh 0079 1] Data Setting : 00 [050h 0080 1] Entry Type : 43 [051h 0081 2]Device ID : FF00 [053h 0083 1] Data Setting : 00 [054h 0084 1] Reserved : 00 [055h 0085 2]Source Used Device ID : 00A4 [057h 0087 1] Reserved : 00 [058h 0088 1] Entry Type : 04 [059h 0089 2]Device ID : [05Bh 0091 1] Data Setting : 00 [05Ch 0092 1] Entry Type : 00 [05Dh 0093 2]Device ID : [05Fh 0095 1] Data Setting : 00 [060h 0096 1] Entry Type : 48 [061h 0097 2]Device ID : [063h 0099 1] Data Setting : 00 [064h 0100 1] Handle : 00 [065h 0101 2]Source Used Device ID : 00A0 [067h 0103 1] Variety : 02 [068h 0104 1] Entry Type : 48 [069h 0105 2]Device ID : [06Bh 0107 1] Data Setting : D7 [06Ch 0108 1] Handle : 20 [06Dh 0109 2]Source Used Device ID : 00A0 [06Fh 0111 1] Variety : 01 [070h 0112 1] Entry Type : 48 [071h 0113 2]Device ID : [073h 0115 1] Data Setting : 00 [074h 0116 1] Handle : 21 [075h 0117 2]Source Used Device ID : 0001 [077h 0119 1] Variety : 01 [078h 0120 1]Subtable Type : 11 [Hardware Definition Block] [079h 0121 1]Flags : B0 [07Ah 0122 2] Length : 0058 [07Ch 0124 2] DeviceId : 0002 [07Eh 0126 2]Capability Offset : 0040 [080h 0128 8] Base Address : FC00 [088h 0136 2]PCI Segment Group : [08Ah 0138 2] Virtualization Info : [08Ch 0140 4] Attributes : 00040200 [090h 0144 8]
I got an IOMMU IO page fault. What to do now?
Dear Linux folks,

On a Dell OptiPlex 5055, Linux 5.10.24 logged the IOMMU messages below. (GPU hang in amdgpu issue #1762 [1] might be related.)

$ lspci -nn -s 05:00.0
05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Oland [Radeon HD 8570 / R7 240/340 OEM] [1002:6611] (rev 87)
$ dmesg
[…]
[6318399.745242] amdgpu :05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfff0c0 flags=0x0020]
[6318399.757283] amdgpu :05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfff7c0 flags=0x0020]
[6318399.769154] amdgpu :05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffe0c0 flags=0x0020]
[6318399.780913] amdgpu :05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffec0 flags=0x0020]
[6318399.792734] amdgpu :05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffe5c0 flags=0x0020]
[6318399.804309] amdgpu :05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffd0c0 flags=0x0020]
[6318399.816091] amdgpu :05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffecc0 flags=0x0020]
[6318399.827407] amdgpu :05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffd3c0 flags=0x0020]
[6318399.838708] amdgpu :05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffc0c0 flags=0x0020]
[6318399.850029] amdgpu :05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffdac0 flags=0x0020]
[6318399.861311] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffc1c0 flags=0x0020]
[6318399.872044] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffc8c0 flags=0x0020]
[6318399.882797] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffb0c0 flags=0x0020]
[6318399.893655] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffcfc0 flags=0x0020]
[6318399.904445] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffb6c0 flags=0x0020]
[6318399.915222] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffa0c0 flags=0x0020]
[6318399.925931] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffbdc0 flags=0x0020]
[6318399.936691] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffa4c0 flags=0x0020]
[6318399.947479] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xff90c0 flags=0x0020]
[6318399.958270] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffabc0 flags=0x0020]

As this is not reproducible, how would debugging go? (The system was rebooted in the meantime.) What options should be enabled, so that next time the required information is logged, or what commands should I execute while the system is still in that state, so the bug (driver, userspace, …) can be pinpointed and fixed?

Kind regards,
Paul

[1]: https://gitlab.freedesktop.org/drm/amd/-/issues/1762 "Oland [Radeon HD 8570 / R7 240/340 OEM]: GPU hang"
Re: I got an IOMMU IO page fault. What to do now?
Dear Christian,

Thank you for your reply.

On 25.10.21 13:23, Christian König wrote:

not sure how the IOMMU gives out addresses, but the printed ones look suspicious to me. Something like we are using an invalid address like -1 or similar.

Can you try that on an up to date kernel as well? E.g. ideally bleeding edge amd-staging-drm-next from Alex' repository.

These are production desktops, so I’d need to talk to the user. Currently, Linux 5.10.70 is running.

Kind regards,
Paul
Re: I got an IOMMU IO page fault. What to do now?
Dear Robin,

On 25.10.21 18:01, Robin Murphy wrote:

On 2021-10-25 12:23, Christian König wrote:

not sure how the IOMMU gives out addresses, but the printed ones look suspicious to me. Something like we are using an invalid address like -1 or similar.

FWIW those look like believable DMA addresses to me, assuming that the DMA mapping APIs are being backed by iommu_dma_ops and the device has a 40-bit DMA mask, since the IOVA allocator works top-down.

Likely causes are either a race where the dma_unmap_*() call happens before the hardware has really stopped accessing the relevant addresses, or the device's DMA mask has been set larger than it should be, and thus the upper bits have been truncated in the round-trip through the hardware. Given the addresses involved, my suspicions would initially lean towards the latter case – the faults are in the very topmost pages, which imply they're the first things mapped in that range.

The other contributing factor being the trick that the IOVA allocator plays for PCI devices, where it tries to prefer 32-bit addresses. Thus you're only likely to see this happen once you already have ~3.5-4GB of live DMA-mapped memory to exhaust the 32-bit IOVA space (minus some reserved areas) and start allocating from the full DMA mask. You should be able to check that with a 5.13 or newer kernel by booting with "iommu.forcedac=1" and seeing if it breaks immediately (unfortunately with an older kernel you'd have to manually hack iommu_dma_alloc_iova() to the same effect).

I booted Linux 5.15-rc7 with `iommu.forcedac=1` and the system booted, and I could log in remotely over SSH. Please find the Linux kernel messages attached. (The system logs say lightdm failed to start, but it might be some other issue due to a change in the operating system.)

Can you try that on an up to date kernel as well? E.g. ideally bleeding edge amd-staging-drm-next from Alex' repository.

Kind regards,
Paul

[0.00] Linux version 5.15.0-rc7.mx64.407 (r...@theinternet.molgen.mpg.de) (gcc (GCC) 7.5.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Oct 26 04:00:49 CEST 2021
[0.00] Command line: BOOT_IMAGE=/boot/bzImage-5.15.0-rc7.mx64.407 root=LABEL=root ro crashkernel=256M console=ttyS0,115200n8 console=tty0 init=/bin/systemd audit=0 random.trust_cpu=on iommu.forcedac=1
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
[0.00] signal: max sigframe size: 1776
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x00087fff] usable
[0.00] BIOS-e820: [mem 0x00088000-0x00088fff] reserved
[0.00] BIOS-e820: [mem 0x00089000-0x0009] usable
[0.00] BIOS-e820: [mem 0x000a-0x000b] reserved
[0.00] BIOS-e820: [mem 0x0010-0x09cf] usable
[0.00] BIOS-e820: [mem 0x09d0-0x09e6] reserved
[0.00] BIOS-e820: [mem 0x09e7-0xdadbefff] usable
[0.00] BIOS-e820: [mem 0xdadbf000-0xdafbefff] type 20
[0.00] BIOS-e820: [mem 0xdafbf000-0xdcfbefff] reserved
[0.00] BIOS-e820: [mem 0xdcfbf000-0xdefbefff] ACPI NVS
[0.00] BIOS-e820: [mem 0xdefbf000-0xdeffefff] ACPI data
[0.00] BIOS-e820: [mem 0xdefff000-0xdeff] usable
[0.00] BIOS-e820: [mem 0xdf00-0xdfff] reserved
[0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[0.00] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved
[0.00] BIOS-e820: [mem 0xfed8-0xfed80fff] reserved
[0.00] BIOS-e820: [mem 0xff00-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00041eff] usable
[0.00] Forcing DAC for PCI devices
[0.00] NX (Execute Disable) protection: active
[0.00] efi: EFI v2.50 by EDK II
[0.00] efi: TPMFinalLog=0xdefa2000 SMBIOS=0xdbeb5000 SMBIOS 3.0=0xdbeb3000 ACPI 2.0=0xdeffe014 ESRT=0xdbe9f298 MEMATTR=0xd8b6f018
[0.00] SMBIOS 3.0.0 present.
[0.00] DMI: Dell Inc. OptiPlex 5055 Ryzen CPU/0P03DX, BIOS 1.1.50 07/28/2021
[0.00] tsc: Fast TSC calibration using PIT
[0.00] tsc: Detected 3493.349 MHz processor
[0.001974] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.001976] e820: remove [mem 0x000a-0x000f] usable
[0.001981] las
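(Editorial illustration of the DMA-mask side of Robin's explanation; the function name and the 40-bit value are just examples, not from the thread.)

```c
#include <linux/dma-mapping.h>
#include <linux/pci.h>

/* Hypothetical probe callback of a PCI driver. */
static int example_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	/*
	 * Declare how many address bits the device can drive. If a driver
	 * claims more bits than the hardware actually wires up, the IOVA
	 * allocator (which works top-down from this mask) will eventually
	 * hand out addresses whose upper bits the device truncates on the
	 * bus - the failure mode suspected in this thread.
	 */
	return dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(40));
}
```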
How to reduce PCI initialization from 5 s (1.5 s adding them to IOMMU groups)
Dear Linux folks,

On a PowerEdge T440/021KCD, BIOS 2.11.2 04/22/2021, Linux 5.10.70 takes almost five seconds to initialize PCI. According to the timestamps, 1.5 s are from assigning the PCI devices to the 142 IOMMU groups.

```
$ lspci | wc -l
281
$ dmesg
[…]
[2.918411] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[2.933841] ACPI: Enabled 5 GPEs in block 00 to 7F
[2.973739] ACPI: PCI Root Bridge [PC00] (domain  [bus 00-16])
[2.980398] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
[2.989457] acpi PNP0A08:00: _OSC: platform does not support [LTR]
[2.995451] acpi PNP0A08:00: _OSC: OS now controls [PME PCIeCapability]
[3.001394] acpi PNP0A08:00: FADT indicates ASPM is unsupported, using BIOS configuration
[3.010511] PCI host bridge to bus :00
[…]
[6.233508] system 00:05: [io 0x1000-0x10fe] has been reserved
[6.239420] system 00:05: Plug and Play ACPI device, IDs PNP0c02 (active)
[6.239906] pnp: PnP ACPI: found 6 devices
[…]
[6.989016] pci :d7:05.0: disabled boot interrupts on device [8086:2034]
[6.996063] PCI: CLS 0 bytes, default 64
[7.08] Trying to unpack rootfs image as initramfs...
[7.065281] Freeing initrd memory: 5136K
[…]
[7.079098] DMAR: dmar7: Using Queued invalidation
[7.083983] pci :00:00.0: Adding to iommu group 0
[…]
[8.537808] pci :d7:17.1: Adding to iommu group 141
[8.571191] DMAR: Intel(R) Virtualization Technology for Directed I/O
[8.577618] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[…]
```

Is there anything that could be done to reduce the time?

Kind regards,
Paul
Re: How to reduce PCI initialization from 5 s (1.5 s adding them to IOMMU groups)
Dear Linux folks,

On 05.11.21 at 12:56, Paul Menzel wrote:

On a PowerEdge T440/021KCD, BIOS 2.11.2 04/22/2021, Linux 5.10.70 takes almost five seconds to initialize PCI. According to the timestamps, 1.5 s are from assigning the PCI devices to the 142 IOMMU groups.

```
$ lspci | wc -l
281
$ dmesg
[…]
[ 2.918411] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[ 2.933841] ACPI: Enabled 5 GPEs in block 00 to 7F
[ 2.973739] ACPI: PCI Root Bridge [PC00] (domain  [bus 00-16])
[ 2.980398] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
[ 2.989457] acpi PNP0A08:00: _OSC: platform does not support [LTR]
[ 2.995451] acpi PNP0A08:00: _OSC: OS now controls [PME PCIeCapability]
[ 3.001394] acpi PNP0A08:00: FADT indicates ASPM is unsupported, using BIOS configuration
[ 3.010511] PCI host bridge to bus :00
[…]
[ 6.233508] system 00:05: [io 0x1000-0x10fe] has been reserved
[ 6.239420] system 00:05: Plug and Play ACPI device, IDs PNP0c02 (active)
[ 6.239906] pnp: PnP ACPI: found 6 devices
[…]
[ 6.989016] pci :d7:05.0: disabled boot interrupts on device [8086:2034]
[ 6.996063] PCI: CLS 0 bytes, default 64
[ 7.08] Trying to unpack rootfs image as initramfs...
[ 7.065281] Freeing initrd memory: 5136K
[…]
[ 7.079098] DMAR: dmar7: Using Queued invalidation
[ 7.083983] pci :00:00.0: Adding to iommu group 0
[…]
[ 8.537808] pci :d7:17.1: Adding to iommu group 141
[ 8.571191] DMAR: Intel(R) Virtualization Technology for Directed I/O
[ 8.577618] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[…]
```

Is there anything that could be done to reduce the time?

I created an issue at the Kernel.org Bugzilla, and attached the output of `dmesg` there [1].

Kind regards,
Paul

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=214953
Re: How to reduce PCI initialization from 5 s (1.5 s adding them to IOMMU groups)
Dear Bjorn, Thank you for your quick reply. Am 05.11.21 um 19:53 schrieb Bjorn Helgaas: On Fri, Nov 05, 2021 at 12:56:09PM +0100, Paul Menzel wrote: On a PowerEdge T440/021KCD, BIOS 2.11.2 04/22/2021, Linux 5.10.70 takes almost five seconds to initialize PCI. According to the timestamps, 1.5 s are from assigning the PCI devices to the 142 IOMMU groups. ``` $ lspci | wc -l 281 $ dmesg […] [2.918411] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug [2.933841] ACPI: Enabled 5 GPEs in block 00 to 7F [2.973739] ACPI: PCI Root Bridge [PC00] (domain [bus 00-16]) [2.980398] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3] [2.989457] acpi PNP0A08:00: _OSC: platform does not support [LTR] [2.995451] acpi PNP0A08:00: _OSC: OS now controls [PME PCIeCapability] [3.001394] acpi PNP0A08:00: FADT indicates ASPM is unsupported, using BIOS configuration [3.010511] PCI host bridge to bus :00 […] [6.233508] system 00:05: [io 0x1000-0x10fe] has been reserved [6.239420] system 00:05: Plug and Play ACPI device, IDs PNP0c02 (active) [6.239906] pnp: PnP ACPI: found 6 devices For ~280 PCI devices, (6.24-2.92)/280 = 0.012 s/dev. On my laptop I have about (.66-.37)/36 = 0.008 s/dev (on v5.4), so about the same ballpark. Though if it was on average 0.008 s/dev here, around a second could be saved. The integrated Matrox G200eW3 graphics controller (102b:0536) and the two Broadcom NetXtreme BCM5720 2-port Gigabit Ethernet PCIe cards (14e4:165f) take 150 ms to be initialized. [3.454409] pci :03:00.0: [102b:0536] type 00 class 0x03 [3.460411] pci :03:00.0: reg 0x10: [mem 0x9100-0x91ff pref] [3.467403] pci :03:00.0: reg 0x14: [mem 0x92808000-0x9280bfff] [3.473402] pci :03:00.0: reg 0x18: [mem 0x9200-0x927f] [3.479437] pci :03:00.0: BAR 0: assigned to efifb The timestamp in each line differs by around 6 ms. Could printing the messages to the console (VGA) hold this up (line 373 to line 911 makes (6.24 s-2.92 s)/(538 lines) = (3.32 s)/(538 lines) = 6 ms)? [3.484480] pci :02:00.0: PCI bridge to [bus 03] [3.489401] pci :02:00.0: bridge window [mem 0x9200-0x928f] [3.496398] pci :02:00.0: bridge window [mem 0x9100-0x91ff 64bit pref] [3.504446] pci :04:00.0: [14e4:165f] type 00 class 0x02 [3.510415] pci :04:00.0: reg 0x10: [mem 0x92e3-0x92e3 64bit pref] [3.517408] pci :04:00.0: reg 0x18: [mem 0x92e4-0x92e4 64bit pref] [3.524407] pci :04:00.0: reg 0x20: [mem 0x92e5-0x92e5 64bit pref] [3.532402] pci :04:00.0: reg 0x30: [mem 0xfffc-0x pref] [3.538483] pci :04:00.0: PME# supported from D0 D3hot D3cold [3.544437] pci :04:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at :00:1c.5 (capable of 8.000 Gb/s with 5.0 GT/s PCIe x2 link) [3.559493] pci :04:00.1: [14e4:165f] type 00 class 0x02 Here is a 15 ms delay. [3.565415] pci :04:00.1: reg 0x10: [mem 0x92e0-0x92e0 64bit pref] [3.573407] pci :04:00.1: reg 0x18: [mem 0x92e1-0x92e1 64bit pref] [3.580407] pci :04:00.1: reg 0x20: [mem 0x92e2-0x92e2 64bit pref] [3.587402] pci :04:00.1: reg 0x30: [mem 0xfffc-0x pref] [3.594483] pci :04:00.1: PME# supported from D0 D3hot D3cold [3.600502] pci :00:1c.5: PCI bridge to [bus 04] Can the 6 ms – also from your system – be explained by the PCI specification? Seeing how fast PCI nowadays is, 6 ms sounds like a long time. ;-) Faster would always be better, of course. I assume this is not really a regression? Correct, as far as I know of, this is no regression. 
```
[    6.989016] pci 0000:d7:05.0: disabled boot interrupts on device [8086:2034]
[    6.996063] PCI: CLS 0 bytes, default 64
[    7.08] Trying to unpack rootfs image as initramfs...
[    7.065281] Freeing initrd memory: 5136K
```

The PCI resource assignment(?) also seems to take 670 ms:

```
[    6.319656] pci 0000:04:00.0: can't claim BAR 6 [mem 0xfffc-0x pref]: no compatible bridge window
[…]
[    6.989016] pci 0000:d7:05.0: disabled boot interrupts on device [8086:2034]
[…]
[    7.079098] DMAR: dmar7: Using Queued invalidation
[    7.083983] pci 0000:00:00.0: Adding to iommu group 0
[…]
[    8.537808] pci 0000:d7:17.1: Adding to iommu group 141
```

I don't have this iommu stuff turned on and don't know what's happening here.

There is a lock in `iommu_group_add_device()` in `drivers/iommu/iommu.c`:

```
mutex_lock(&group->mutex);
list_add_tail(&device->list, &group->devices);
if (group->domain && !iommu
```
[…]
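One way to see how much time each call spends under that lock is to time the function itself with the function_graph tracer; a rough sketch, assuming CONFIG_FUNCTION_GRAPH_TRACER and a mounted tracefs (for boot-time numbers the equivalent `ftrace=` kernel parameters would be needed, and the device address below is only an example taken from the log above):

```
cd /sys/kernel/debug/tracing
echo iommu_group_add_device > set_graph_function    # only trace this function
echo function_graph > current_tracer
echo 1 > tracing_on
echo 1 > /sys/bus/pci/devices/0000:04:00.0/remove   # remove an example device ...
echo 1 > /sys/bus/pci/rescan                        # ... re-adding it re-runs the group setup
head -n 40 trace                                    # per-call durations appear here
```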
Re: How to reduce PCI initialization from 5 s (1.5 s adding them to IOMMU groups)
Dear Krzysztof,

Thank you for your reply.

On 08.11.21 18:18, Krzysztof Wilczyński wrote:

On a PowerEdge T440/021KCD, BIOS 2.11.2 04/22/2021, Linux 5.10.70 takes almost five seconds to initialize PCI. According to the timestamps, 1.5 s are from assigning the PCI devices to the 142 IOMMU groups.

[...]

Is there anything that could be done to reduce the time?

I am curious – why is this a problem? Are you power-cycling your servers so often that the cumulative time spent enumerating PCI devices and later adding them to IOMMU groups becomes a problem? I am simply wondering why you decided to single out the PCI enumeration as slow in particular, especially given that large server hardware tends to have (most of the time, in my experience) a rather long initialization time, whether powered on from off or power cycled. It can take a while before the actual operating system itself starts.

It's not a problem per se, and more a pet peeve of mine. Systems get faster and faster, yet boot time gets slower and slower. On desktop systems this matters much more, with firmware like coreboot taking less than one second to initialize the hardware and pass control to the payload/operating system. If we are lucky, we are going to have servers with FLOSS firmware too. But already now, using kexec to reboot a system avoids the problems you pointed out on servers, and being able to reboot a system as quickly as possible lowers the bar for people to reboot systems more often so that, for example, updates take effect.

We talked about this briefly with Bjorn, and there might be an option to perhaps add some caching, as we suspect that the culprit here is doing a PCI configuration space read for each device, which can be slow on some platforms. However, we would need to profile this to get some quantitative data to see whether doing anything would even be worthwhile. It would definitely help us understand better where the bottlenecks really are and of what magnitude. I personally don't have access to hardware as large as what you have access to, thus I was wondering whether you would have some time, and be willing, to profile this for us on the hardware you have. Let me know what you think.

Sounds good. I'd be willing to help. Note that I won't have time before Wednesday next week, though.

Kind regards,

Paul
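For a first low-effort profiling pass, the standard `initcall_debug` kernel parameter logs the duration of every initcall, which at least brackets the PCI and IOMMU setup; a sketch for summarizing its output (the message format is stable in this kernel era, but worth double-checking on the target kernel):

```
# Boot with: initcall_debug log_buf_len=16M
# Then list the slowest initcalls, duration in microseconds first:
dmesg | awk '/initcall .* returned/ { print $(NF-1), $4 }' | sort -rn | head
```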
Re: How to reduce PCI initialization from 5 s (1.5 s adding them to IOMMU groups)
Dear Robin,

Thank you for your reply.

On 09.11.21 16:31, Robin Murphy wrote:

On 2021-11-06 10:42, Paul Menzel wrote:

On 05.11.21 19:53, Bjorn Helgaas wrote:

[…]
[PATCH] iommu/amd: Fix typo in *glues … together* in comment
Signed-off-by: Paul Menzel
---
 drivers/iommu/amd/init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 1eacd43cb436..29d55a99c39f 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1523,7 +1523,7 @@ static void amd_iommu_ats_write_check_workaround(struct amd_iommu *iommu)
 }
 
 /*
- * This function clues the initialization function for one IOMMU
+ * This function glues the initialization function for one IOMMU
  * together and also allocates the command buffer and programs the
  * hardware. It does NOT enable the IOMMU. This is done afterwards.
  */
-- 
2.34.1
nvme: IO_PAGE_FAULT logged with Intel SSDPEKKF512G8
Dear Linux folks,

On a Dell OptiPlex 5055 with an Intel SSDPEKKF512G8, Linux 5.10.82 reported an IO_PAGE_FAULT error. This is the first and only time this has happened.

$ dmesg --level=err
[    4.194306] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xc080 flags=0x0050]
[    4.206970] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xc000 flags=0x0050]
[    7.327820] kfd kfd: VERDE not supported in kfd

$ lspci -nn -s 01:00.0
01:00.0 Non-Volatile memory controller [0108]: Intel Corporation SSD Pro 7600p/760p/E 6100p Series [8086:f1a6] (rev 03)

$ sudo ./nvme list
Node     SN                Model                             Namespace  Usage                  Format       FW Rev
-------- ----------------- --------------------------------- ---------- ---------------------- ------------ ------
nvme0n1  BTHH82250YQK512D  SSDPEKKF512G8 NVMe INTEL 512GB    1          512.11 GB / 512.11 GB  512 B + 0 B  D03N

Please find the output of `dmesg` attached.

Kind regards,

Paul

PS: Some more info:

$ lspci -tvn
-[0000:00]-+-00.0  1022:1450
           +-00.2  1022:1451
           +-01.0  1022:1452
           +-01.1-[01]----00.0  8086:f1a6
           +-01.3-[02-05]--+-00.0  1022:43bb
           |               +-00.1  1022:43b7
           |               \-00.2-[03-05]--+-00.0-[04]----00.0  14e4:1687
           |                               \-01.0-[05]--
           +-02.0  1022:1452
           +-03.0  1022:1452
           +-03.1-[06]--+-00.0  1002:682b
           |            \-00.1  1002:aab0
           +-04.0  1022:1452
           +-07.0  1022:1452
           +-07.1-[07]--+-00.0  1022:145a
           |            +-00.2  1022:1456
           |            \-00.3  1022:145c
           +-08.0  1022:1452
           +-08.1-[08]--+-00.0  1022:1455
           |            +-00.2  1022:7901
           |            \-00.3  1022:1457
           +-14.0  1022:790b
           +-14.3  1022:790e
           +-18.0  1022:1460
           +-18.1  1022:1461
           +-18.2  1022:1462
           +-18.3  1022:1463
           +-18.4  1022:1464
           +-18.5  1022:1465
           +-18.6  1022:1466
           \-18.7  1022:1467

[    0.000000] Linux version 5.10.82.mx64.414 (r...@invidia.molgen.mpg.de) (gcc (GCC) 7.5.0, GNU ld (GNU Binutils) 2.37) #1 SMP Mon Nov 29 14:15:19 CET 2021
[    0.000000] Command line: BOOT_IMAGE=/boot/bzImage.x86_64 root=LABEL=root ro crashkernel=64G-:256M console=ttyS0,115200n8 console=tty0 init=/bin/systemd audit=0 random.trust_cpu=on
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x-0x00087fff] usable
[    0.000000] BIOS-e820: [mem 0x00088000-0x00088fff] reserved
[    0.000000] BIOS-e820: [mem 0x00089000-0x0009efff] usable
[    0.000000] BIOS-e820: [mem 0x0009f000-0x000b] reserved
[    0.000000] BIOS-e820: [mem 0x0010-0x09cf] usable
[    0.000000] BIOS-e820: [mem 0x09d0-0x09e6] reserved
[    0.000000] BIOS-e820: [mem 0x09e7-0x7afb5fff] usable
[    0.000000] BIOS-e820: [mem 0x7afb6000-0x7afb6fff] reserved
[    0.000000] BIOS-e820: [mem 0x7afb7000-0x7afbbfff] usable
[    0.000000] BIOS-e820: [mem 0x7afbc000-0x7afbcfff] reserved
[    0.000000] BIOS-e820: [mem 0x7afbd000-0xdadbefff] usable
[    0.000000] BIOS-e820: [mem 0xdadbf000-0xdafbefff] type 20
[    0.000000] BIOS-e820: [mem 0xdafbf000-0xdcfbefff] reserved
[    0.000000] BIOS-e820: [mem 0xdcfbf000-0xdefbefff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0xdefbf000-0xdeffefff] ACPI data
[    0.000000] BIOS-e820: [mem 0xdefff000-0xdeff] usable
[    0.000000] BIOS-e820: [mem 0xdf00-0xdfff] reserved
[    0.000000] BIOS-e820: [mem 0xf800-0xfbff] reserved
[    0.000000] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[    0.000000] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved
[    0.000000] BIOS-e820: [mem 0xfed8-0xfed80fff] reserved
[    0.000000] BIOS-e820: [mem 0xff00-0x] reserved
[    0.000000] BIOS-e8
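Which IOMMU group, and hence which translation domain, the controller ended up in can be read from sysfs; a small sketch, using the PCI address from the log above and the standard sysfs layout:

```
# The group number is the last component of the symlink target.
readlink /sys/bus/pci/devices/0000:01:00.0/iommu_group

# Other devices in that group share its mappings.
ls /sys/bus/pci/devices/0000:01:00.0/iommu_group/devices
```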
Re: nvme: IO_PAGE_FAULT logged with Intel SSDPEKKF512G8
Dear Keith,

Thank you for your quick response.

On 18.01.22 17:53, Keith Busch wrote:

On Tue, Jan 18, 2022 at 03:32:45PM +0100, Paul Menzel wrote:

On a Dell OptiPlex 5055 with an Intel SSDPEKKF512G8, Linux 5.10.82 reported an IO_PAGE_FAULT error. This is the first and only time this has happened.

$ dmesg --level=err
[    4.194306] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xc080 flags=0x0050]
[    4.206970] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xc000 flags=0x0050]
[    7.327820] kfd kfd: VERDE not supported in kfd

$ lspci -nn -s 01:00.0
01:00.0 Non-Volatile memory controller [0108]: Intel Corporation SSD Pro 7600p/760p/E 6100p Series [8086:f1a6] (rev 03)

I think it's a bug with the iommu implementation.

That would surprise me, but I am adding Jörg and Suravee to the recipient list. The last time I saw an IO_PAGE_FAULT, it was a bug in the amdgpu driver.

If it causes problems, you can typically work around it with the kernel parameter "iommu=soft".

I have not noticed any problems yet.

Kind regards,

Paul

PS: No idea if useful, but I include the content of `/proc/iomem`:

$ sudo more /proc/iomem
-0fff : Reserved
1000-00087fff : System RAM
00088000-00088fff : Reserved
00089000-0009efff : System RAM
0009f000-000b : Reserved
000a-000b : PCI Bus 0000:00
000c-000c3fff : PCI Bus 0000:00
000c4000-000c7fff : PCI Bus 0000:00
000c8000-000cbfff : PCI Bus 0000:00
000cc000-000c : PCI Bus 0000:00
000d-000d3fff : PCI Bus 0000:00
000d4000-000d7fff : PCI Bus 0000:00
000d8000-000dbfff : PCI Bus 0000:00
000dc000-000d : PCI Bus 0000:00
000e-000e3fff : PCI Bus 0000:00
000e4000-000e7fff : PCI Bus 0000:00
000e8000-000ebfff : PCI Bus 0000:00
000ec000-000e : PCI Bus 0000:00
000f-000f : System ROM
0010-09cf : System RAM
  0500-05e03316 : Kernel code
  0600-063a8fff : Kernel rodata
  0640-06762eff : Kernel data
  06d31000-06ff : Kernel bss
09d0-09e6 : Reserved
09e7-7afb5fff : System RAM
7afb6000-7afb6fff : Reserved
7afb7000-7afbbfff : System RAM
7afbc000-7afbcfff : Reserved
7afbd000-dadbefff : System RAM
dadbf000-dafbefff : Unknown E820 type
dafbf000-dcfbefff : Reserved
dcfbf000-defbefff : ACPI Non-volatile Storage
defbf000-deffefff : ACPI Tables
defff000-deff : System RAM
df00-dfff : Reserved
e000-f7ff : PCI Bus 0000:00
  e000-efff : PCI Bus 0000:06
    e000-efff : 0000:06:00.0
  f000-f00f : PCI Bus 0000:02
    f000-f00f : PCI Bus 0000:03
      f000-f00f : PCI Bus 0000:04
        f000-f000 : 0000:04:00.0
          f000-f000 : tg3
        f001-f001 : 0000:04:00.0
          f001-f001 : tg3
        f002-f002 : 0000:04:00.0
          f002-f002 : tg3
  f010-f01f : PCI Bus 0000:08
    f010-f0107fff : 0000:08:00.3
      f010-f0107fff : ICH HD audio
    f0108000-f0108fff : 0000:08:00.2
      f0108000-f0108fff : ahci
  f020-f04f : PCI Bus 0000:07
    f020-f02f : 0000:07:00.3
      f020-f02f : xhci-hcd
    f030-f03f : 0000:07:00.2
    f040-f0401fff : 0000:07:00.2
  f050-f05f : PCI Bus 0000:06
    f050-f053 : 0000:06:00.0
    f054-f0543fff : 0000:06:00.1
      f054-f0543fff : ICH HD audio
    f056-f057 : 0000:06:00.0
  f060-f06f : PCI Bus 0000:02
    f060-f061 : 0000:02:00.1
      f060-f061 : ahci
    f062-f0627fff : 0000:02:00.0
      f062-f0627fff : xhci-hcd
    f068-f06f : 0000:02:00.1
  f070-f07f : PCI Bus 0000:01
    f070-f0703fff : 0000:01:00.0
      f070-f0703fff : nvme
f800-fbff : PCI MMCONFIG [bus 00-3f]
  f800-fbff : Reserved
fc00-feaf : PCI Bus 0000:00
  fc00-fc07 : amd_iommu
fdf0-fdff : pnp 00:00
fec0-fec00fff : Reserved
  fec0-fec003ff : IOAPIC 0
fec01000-fec013ff : IOAPIC 1
fec1-fec10fff : Reserved
fec3-fec30fff : AMDIF030:00
fed0-fed003ff : HPET 0
  fed0-fed003ff : PNP0103:00
fed4-fed44fff : MSFT0101:00
fed8-fed80fff : Reserved
  fed81500-fed818ff : AMDI0030:00
fee0-fee00fff : Local APIC
  fee0-fee00fff : pnp 00:00
ff00- : Reserved
  ff00- : pnp 00:03
1-81eff : System RAM
81f00-81fff : RAM buffer
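Whether the swiotlb fallback (which is what `iommu=soft` selects) is actually in use can be read back from the running system; a sketch, with the exact message wording varying between kernel versions:

```
cat /proc/cmdline                    # is iommu=soft actually set?
dmesg | grep -iE 'AMD-Vi|swiotlb'    # which DMA remapping path came up at boot
```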
Re: MSI B350M MORTAR: `AMD-Vi: Unable to write to IOMMU perf counter.` and `pci 0000:00:00.2: can't derive routing for PCI INT A`
Dear Jörg,

On 07/20/18 14:31, Jörg Rödel wrote:
> On Tue, Jul 17, 2018 at 06:02:07PM +0200, Paul Menzel wrote:
>> $ dmesg
>> […]
>> [0.145696] calling pci_iommu_init+0x0/0x3f @ 1
>> [0.145719] AMD-Vi: Unable to write to IOMMU perf counter.
>
> This is likely a firmware issue. Either the IVRS ACPI table is incorrect

Please find the output from

    sudo iasl -d -p /dev/shm/ivrs.dsl /sys/firmware/acpi/tables/IVRS

attached.

[    0.000000] ACPI: IVRS 0x9D3D8588 D0 (v02 AMD AMD IVRS 0001 AMD )

> or the BIOS did not enable the performance counter feature in the IOMMU
> hardware.

Is it possible to check that from the OS side?

> Are you running on the latest BIOS?

Yes, I am even using a “beta“ one from [1].

DMI: MSI MS-7A37/B350M MORTAR (MS-7A37), BIOS 1.G1 05/17/2018

Kind regards,

Paul

[1]: http://msi-ftp.de:8080/main.html?download&weblink=99df4e98c25ca3dcf0f6e7f8366cc1c7&realfilename=7A37_1g1.zip

/*
 * Intel ACPI Component Architecture
 * AML/ASL+ Disassembler version 20180629 (64-bit version)
 * Copyright (c) 2000 - 2018 Intel Corporation
 *
 * Disassembly of /sys/firmware/acpi/tables/IVRS, Mon Jul 23 12:07:58 2018
 *
 * ACPI Data Table [IVRS]
 *
 * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue
 */

[000h 0000   4]                 Signature : "IVRS"    [I/O Virtualization Reporting Structure]
[004h 0004   4]              Table Length : 00D0
[008h 0008   1]                  Revision : 02
[009h 0009   1]                  Checksum : 7C
[00Ah 0010   6]                    Oem ID : "AMD "
[010h 0016   8]              Oem Table ID : "AMD IVRS"
[018h 0024   4]              Oem Revision : 0001
[01Ch 0028   4]           Asl Compiler ID : "AMD "
[020h 0032   4]     Asl Compiler Revision :
[024h 0036   4]       Virtualization Info : 00203041
[028h 0040   8]                  Reserved :

[030h 0048   1]             Subtable Type : 10    [Hardware Definition Block]
[031h 0049   1]                     Flags : B0
[032h 0050   2]                    Length : 0048
[034h 0052   2]                  DeviceId : 0002
[036h 0054   2]         Capability Offset : 0040
[038h 0056   8]              Base Address : FEB8
[040h 0064   2]         PCI Segment Group :
[042h 0066   2]       Virtualization Info :
[044h 0068   4]                  Reserved : 80048F6E

[048h 0072   1]                Entry Type : 03
[049h 0073   2]                 Device ID : 0008
[04Bh 0075   1]              Data Setting : 00

[04Ch 0076   1]                Entry Type : 04
[04Dh 0077   2]                 Device ID : FFFE
[04Fh 0079   1]              Data Setting : 00

[050h 0080   1]                Entry Type : 43
[051h 0081   2]                 Device ID : FF00
[053h 0083   1]              Data Setting : 00
[054h 0084   1]                  Reserved : 00
[055h 0085   2]     Source Used Device ID : 00A4
[057h 0087   1]                  Reserved : 00

[058h 0088   1]                Entry Type : 04
[059h 0089   2]                 Device ID :
[05Bh 0091   1]              Data Setting : 00

[05Ch 0092   1]                Entry Type : 00
[05Dh 0093   2]                 Device ID :
[05Fh 0095   1]              Data Setting : 00

[060h 0096   1]                Entry Type : 48
[061h 0097   2]                 Device ID :
[063h 0099   1]              Data Setting : 00
[064h 0100   1]                    Handle : 00
[065h 0101   2]     Source Used Device ID : 00A0
[067h 0103   1]                   Variety : 02

[068h 0104   1]                Entry Type : 48
[069h 0105   2]                 Device ID :
[06Bh 0107   1]              Data Setting : D7
[06Ch 0108   1]                    Handle : 05
[06Dh 0109   2]     Source Used Device ID : 00A0
[06Fh 0111   1]                   Variety : 01

[070h 0112   1]                Entry Type : 48
[071h 0113   2]                 Device ID :
[073h 0115   1]              Data Setting : 00
[074h 0116   1]                    Handle : 06
[075h 0117   2]     Source Used Device ID : 0001
[077h 0119   1]                   Variety : 01

[078h 0120   1]             Subtable Type : 11    [Unknown Subtable Type]
[079h 0121   1]                     Flags : B0
[07Ah 0122   2]                    Length : 0058
[07Ch 0124   2]                  DeviceId : 0002

Unknown IVRS subtable type 0x11

Raw Table Data: Length 208 (0xD0)

0000: 49 56 52 53 D0 00 00 00 02 7C 41 4D 44 20 20 00  // IVRS.|AMD .
0010: 41 4D 44 20 49 56 52 53 01 00 00 00 41 4D 44 20  // AMD IVRSAMD
0020: 00 00 00 00 41 30 20 00 00 00 00 00 00 00 00 00  // A0 .
0030: 10 B0 48 00 02 00 40 00 00 00 B8 FE 00 00 00 00  // ..H...@...
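One indirect check from the OS side: when the kernel initializes the counters successfully, it registers a perf PMU for the IOMMU, so a missing PMU is a hint that the write test above failed. A sketch, assuming perf is installed (the PMU name differs between kernel versions, e.g. `amd_iommu` or `amd_iommu_0`):

```
# Look for an IOMMU PMU registered with the perf subsystem.
ls /sys/bus/event_source/devices/ | grep -i amd_iommu

# List its events, if present.
perf list 2>/dev/null | grep -i amd_iommu | head
```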
Re: MSI B350M MORTAR: `AMD-Vi: Unable to write to IOMMU perf counter.` and `pci 0000:00:00.2: can't derive routing for PCI INT A`
Dear Jörg,

On 07/27/18 10:27, Jörg Rödel wrote:
> On Mon, Jul 23, 2018 at 12:09:37PM +0200, Paul Menzel wrote:
>>> or the BIOS did not enable the performance counter feature in the IOMMU
>>> hardware.
>>
>> Is it possible to check that from the OS side?
>
> It would be if we had the NB documentation, but I guess those details
> are not publicly available.

On the AMD site [1] I see the three family 0x17 documents below.

1. Open-Source Register Reference for AMD Family 17h Processors (PUB)
2. Processor Programming Reference (PPR) for AMD Family 17h Models 00h-0Fh Processors (PUB)
3. Software Optimization Guide for AMD Family 17h Processors (PUB)

Which document is required? The BKDG?

>>> Are you running on the latest BIOS?
>>
>> Yes, I am even using a “beta“ one from [1].
>>
>> DMI: MSI MS-7A37/B350M MORTAR (MS-7A37), BIOS 1.G1 05/17/2018
>
> I think the best you can do is to report this as a bug to your BIOS
> vendor and hope they'll fix it.

I contacted them with the problem description. Let’s see what the result will be.

Kind regards,

Paul

[1]: https://developer.amd.com/resources/developer-guides-manuals/