Launchpad has imported 14 comments from the remote bug at https://bugzilla.proxmox.com/show_bug.cgi?id=2969.
If you reply to an imported comment from within Launchpad, your comment will be sent to the remote bug automatically. Read more about Launchpad's inter-bugtracker facilities at https://help.launchpad.net/InterBugTracking. ------------------------------------------------------------------------ On 2020-08-28T06:29:13+00:00 Nick Bauer wrote: There exists a bug with Chelsio NICs that causes the following error: kvm: -device vfio- pci,host=0000:83:00.7,id=hostpci1.7,bus=pci.0,addr=0x11.7: vfio 0000:83:00.7: hardware reports invalid configuration, MSIX PBA outside of specified BAR This bug was fixed in later versions of Qemu, and is caused by vendor misconfigurations of their MSIX PBA. I know a catchall fix was implemented in recent versions of Qemu, as well as patches applied to hotfix it in earlier versions. I encountered this bug using a Chelsio T4 device, and I believe the patches are for T5 and newer. Here is an email chain that has a patch for this situation: https://patchwork.ozlabs.org/project/qemu-devel/patch/1435777545-32152-1-git-send-email-glau...@chelsio.com/ I'd appreciate it if anyone could tell me what the best course of action to fix it on my system would be. I assume the solution is to either build Qemu with this patch applied, or update the version of Qemu in my Proxmox installation, but I do not know which is the better route to go. Reply at: https://bugs.launchpad.net/qemu/+bug/1894869/comments/0 ------------------------------------------------------------------------ On 2020-08-31T08:33:54+00:00 Stefan wrote: The patch you mention is already included in our QEMU builds, but as you correctly said it's only implemented for T5 devices. You'd have to go about patching your QEMU yourself if you want this to work, or message the upstream QEMU maintainers to include a fix (or even better: provide them with the fix :) ). In any case, a full 'lspci -nnkvv' output for your device (and any virtual functions thereof) would help. I've attached a QEMU patch for you to try, it has "0xNNNN" instead of the actual device ID of your T4, so change that before applying the patch. No liability of this working at all, here be dragons and if it breaks everything you're on your own, but I believe it's simple enough to work, provided the hardware quirk is the same on T4 as on T5. You can find our QEMU downstream at https://git.proxmox.com/?p=pve- qemu.git;a=summary, if you put it in debian/patches/pve and mention the file in debian/patches/series you should be able to build a pve-qemu against it. Check out our developer documentation (https://pve.proxmox.com/wiki/Developer_Documentation) as well. Reply at: https://bugs.launchpad.net/qemu/+bug/1894869/comments/1 ------------------------------------------------------------------------ On 2020-08-31T08:34:21+00:00 Stefan wrote: Created attachment 614 experimental T4 patch, change 0xNNNN to device id Reply at: https://bugs.launchpad.net/qemu/+bug/1894869/comments/2 ------------------------------------------------------------------------ On 2020-08-31T22:14:41+00:00 Nick Bauer wrote: Created attachment 615 Full output of lspci -nnkvv Reply at: https://bugs.launchpad.net/qemu/+bug/1894869/comments/3 ------------------------------------------------------------------------ On 2020-08-31T22:15:51+00:00 Nick Bauer wrote: Created attachment 616 Output of lspci -nnkvv with Chelsio devices only Reply at: https://bugs.launchpad.net/qemu/+bug/1894869/comments/4 ------------------------------------------------------------------------ On 2020-09-01T00:57:02+00:00 Nick Bauer wrote: Thank you so much for your reply! I have attached the lspci you requested. I think the most recent version of qemu actually has a fix for all devices that give this error, as there were reports of some HBA cards also causing it. I would like to try applying your patch, however for several days now my builds of pve-qemu have been getting stopped by a missing dependency called libproxmox-backup-qemu0-dev. I have seen other people on the forums mention that it exists in the repository, but every time I git clone pve-qemu.git and attempt to build I get the same error. I thought it would be taken care of by mk-build-deps, but even that gets stopped by the same missing dependency. Apt install isn't able to find it either. Would you be able to tell me where I can find this dependency? Reply at: https://bugs.launchpad.net/qemu/+bug/1894869/comments/5 ------------------------------------------------------------------------ On 2020-09-01T07:55:48+00:00 Stefan wrote: You need to configure our PBS repository to get the library: # echo "deb http://download.proxmox.com/debian/pbs buster pbstest" >> /etc/apt/sources.list.d/pbs.list # apt update # apt install libproxmox-backup-qemu0-dev > I think the most recent version of qemu actually has a fix for all devices that give this error, as there were reports of some HBA cards also causing it. Hm, not sure about that, the patch I added is against our 5.1 build from the repo. That said, 5.1 is newer than what's currently rolled out, so you can also try just building the repo version without any patches and see if that fixes it. That would be nice, since 5.1 will be rolled out soon-ish anyway :) Reply at: https://bugs.launchpad.net/qemu/+bug/1894869/comments/6 ------------------------------------------------------------------------ On 2020-09-02T21:09:43+00:00 Nick Bauer wrote: I managed to get the package installed. Apparently my sources.list was set to jessie instead of buster. Fixing this allowed me to download that package, however make still fails, but with new errors. Progress! I'll attach the errors, but I understand if helping me fix this is outside of what you're willing to help me with. As a side note, the machine that I am configuring this on is not deployed, does not have a deadline for deployment, and has no data stored on it at all. As such, I'm willing to make just about any changes to it that you think might help, or that you may want to test. Reply at: https://bugs.launchpad.net/qemu/+bug/1894869/comments/7 ------------------------------------------------------------------------ On 2020-09-02T21:10:29+00:00 Nick Bauer wrote: Created attachment 618 New errors Reply at: https://bugs.launchpad.net/qemu/+bug/1894869/comments/8 ------------------------------------------------------------------------ On 2020-09-03T07:39:49+00:00 Stefan wrote: Hm, it appears your linker isn't finding the library. Try installing the 'libproxmox-backup-qemu0' package as well, that should have been a dependency of the -dev package though... Make sure /usr/lib/libproxmox_backup_qemu.so.0 exists. If you use "make deb" it also might be necessary to run the build as root. Reply at: https://bugs.launchpad.net/qemu/+bug/1894869/comments/9 ------------------------------------------------------------------------ On 2020-09-03T22:37:13+00:00 Nick Bauer wrote: I ran into problems building it with the patch applied. I know how to correct those errors, but I decided to check if I could build without the patches and found that the build fails for other reasons, too. I have attached the new errors. I have attached the new output. Just so that I understand it correctly, does the value that PCI_VENDOR_ID_CHELSIO stores equal 1425? Since I have two of the same Chelsio NIC installed, would that mean that I have to insert both 8100 and 8300 as my device IDs for my two cards in the patch, and have it evaluate whether they are equal to the value at vdev->device_id for the if statement the same way you did? Or should I just be bale to do it with a single device ID? Reply at: https://bugs.launchpad.net/qemu/+bug/1894869/comments/10 ------------------------------------------------------------------------ On 2020-09-03T22:38:37+00:00 Nick Bauer wrote: Created attachment 620 New errors given by make after installing libproxmox-backup-qemu0 Reply at: https://bugs.launchpad.net/qemu/+bug/1894869/comments/11 ------------------------------------------------------------------------ On 2020-09-07T09:38:59+00:00 Stefan wrote: There's no relevant error in the output you posted? You should have two files 'pve-qemu-kvm_5.1.0-1_amd64.deb' and 'pve-qemu-kvm- dbg_5.1.0-1_amd64.deb' in the repository root now, which you can install with 'apt install ./*.deb' or similar. If not, you might need a 'make clean' before the 'make deb'. > Just so that I understand it correctly, does the value that > PCI_VENDOR_ID_CHELSIO stores equal 1425? Since I have two of the same > Chelsio NIC installed, would that mean that I have to insert both 8100 and > 8300 as my device IDs for my two cards in the patch, and have it evaluate > whether they are equal to the value at vdev->device_id for the if statement > the same way you did? Or should I just be bale to do it with a single device > ID? # rg "PCI_VENDOR_ID_CHELSIO" include/hw/pci/pci_ids.h 219:#define PCI_VENDOR_ID_CHELSIO 0x1425 Yes. And also yes, if you need two different device IDs you need to add more clauses to the 'or', e.g.: ((vdev->device_id & 0xff00) == 0x5800 || (vdev->device_id & 0xff00) == 0x8100) || (vdev->device_id & 0xff00) == 0x8300)) { Reply at: https://bugs.launchpad.net/qemu/+bug/1894869/comments/12 ------------------------------------------------------------------------ On 2020-09-08T15:55:01+00:00 Nick Bauer wrote: Yes, you were right, I thought the warnings being set to evaluate as errors would stop the build, but I completely missed where it said it built the .deb packages. I got it built and installed this time, but I still get the same error when I attempt to boot a vm with the Chelsio cards. I have started a bug report with the upstream qemu devs. Reply at: https://bugs.launchpad.net/qemu/+bug/1894869/comments/15 ** Changed in: debian Status: Unknown => In Progress -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1894869 Title: Chelsio T4 has old MSIX PBA offset bug Status in QEMU: New Status in Debian: In Progress Bug description: There exists a bug with Chelsio NICs T4 that causes the following error: kvm: -device vfio- pci,host=0000:83:00.7,id=hostpci1.7,bus=pci.0,addr=0x11.7: vfio 0000:83:00.7: hardware reports invalid configuration, MSIX PBA outside of specified BAR I discovered this bug on a Proxmox system, and I was working with a downstream Proxmox developer to try to fix this issue. They provided me with the following change to make from line 1484 of hw/vfio/pci.c: static void vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp) * is 0x1000, so we hard code that here. */ if (vdev->vendor_id == PCI_VENDOR_ID_CHELSIO && - (vdev->device_id & 0xff00) == 0x5800) { + ((vdev->device_id & 0xff00) == 0x5800 || + (vdev->device_id & 0xff00) == 0x1425)) { msix->pba_offset = 0x1000; } else if (vdev->msix_relo == OFF_AUTOPCIBAR_OFF) { error_setg(errp, "hardware reports invalid configuration, " However, I found that this did not fix the issue, so the bug appears to work differently than the one that was present on the T5 NICs which has already been patched. I have attached the output of my lspci -nnkvv To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1894869/+subscriptions