Hello community experts,

I am testing the performance of a passthrough GPU by measuring host-to-device and device-to-host memory copy bandwidth. The GPU under test is an NVIDIA T4. The benchmark I am using is the one from https://developer.nvidia.com/blog/how-optimize-data-transfers-cuda-cc/.

On the bare-metal machine, the result is:

[image: image.png]

In the virtual machine, the result is:

[image: image.png]

My question is: what could be causing the degradation, and is there anything I can do to improve it?

Thank you very much for the help.

--
Best Regards,
Jiatong Shen
