Hi, I have been struggling for a few months now to achieve full-bandwidth PCIe P2P in a QEMU virtual machine. I am working with a number of PCIe endpoints (NVIDIA A100 GPUs and Mellanox ConnectX-7 InfiniBand NICs) behind a PCIe switch. In every configuration I have tried, P2P traffic gets routed back to the root complex. Does anyone have guidance on whether full-bandwidth PCIe P2P is even supported by QEMU?
Through my research, I have found two main approaches to solve this.

[1] ATS came up frequently in my research. Unfortunately, I do not believe that all of my PCIe endpoints support the use of ATS for P2P traffic. At the very least, toggling the DirectTrans flag on my PCIe switch didn't have any effect on bandwidth on either the host or the guest. I think that this might be a dead end.

[2] Another potential option is to disable ACS on the PCIe switch and pass all devices on the same switch to a virtual machine. Based on everything that I have read, this "should" work. When toggling the RequestRedir flag on the PCIe switch using "setpci", P2P bandwidth increased and decreased as expected on the host. However, the P2P bandwidth did not increase or decrease in the guest.

I can't really explain the behavior that I am seeing in approach [2]. Should disabling the RequestRedir flag on a PCIe switch enable P2P traffic between different IOMMU groups? If not, why?

Best,
Thomas
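
For reference, the flags mentioned above are bits in the 16-bit ACS Control register of each switch downstream port. The sketch below is only an illustration, assuming the standard ACS Control layout and the flag names that lspci prints (ReqRedir is the flag called RequestRedir above). It decodes a control word and computes the value to write back with a flag cleared. The helper names and the example value 0x001d are hypothetical and were not read from the hardware described in this message.

```python
# Bit positions in the ACS Control register (offset 0x06 of the ACS
# extended capability), using the flag names that lspci prints.
ACS_CTRL_BITS = {
    "SrcValid": 0,
    "TransBlk": 1,
    "ReqRedir": 2,
    "CmpltRedir": 3,
    "UpstreamFwd": 4,
    "EgressCtrl": 5,
    "DirectTrans": 6,
}


def decode_acs_ctrl(value):
    """Return a dict mapping each ACS flag name to True/False."""
    return {name: bool((value >> bit) & 1) for name, bit in ACS_CTRL_BITS.items()}


def clear_flags(value, *names):
    """Return the control word with the named flags cleared."""
    for name in names:
        value &= ~(1 << ACS_CTRL_BITS[name])
    return value & 0xFFFF


if __name__ == "__main__":
    # Hypothetical control word, e.g. as read with
    #   setpci -s <bdf> ECAP_ACS+6.w
    ctrl = 0x001D
    for name, enabled in decode_acs_ctrl(ctrl).items():
        print(f"{name}{'+' if enabled else '-'}")
    # Value to write back with request redirection disabled.
    print(f"{clear_flags(ctrl, 'ReqRedir'):04x}")
```

The value has to be read and written separately on every downstream port of the switch, since each port carries its own ACS capability.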