RE: [PATCH v1 0/4] vfio: report NUMA nodes for device memory

Vikram Sethi Wed, 27 Sep 2023 08:28:49 -0700

> From: Alex Williamson <alex.william...@redhat.com>
> Sent: Wednesday, September 27, 2023 9:25 AM
> To: Jason Gunthorpe <j...@nvidia.com>
> Cc: Jonathan Cameron <jonathan.came...@huawei.com>; Ankit Agrawal
> <ank...@nvidia.com>; David Hildenbrand <da...@redhat.com>; Cédric Le
> Goater <c...@redhat.com>; shannon.zha...@gmail.com;
> peter.mayd...@linaro.org; a...@anisinha.ca; Aniket Agashe
> <anik...@nvidia.com>; Neo Jia <c...@nvidia.com>; Kirti Wankhede
> <kwankh...@nvidia.com>; Tarun Gupta (SW-GPU) <targu...@nvidia.com>;
> Vikram Sethi <vse...@nvidia.com>; Andy Currid <acur...@nvidia.com>;
> qemu-...@nongnu.org; qemu-devel@nongnu.org; Gavin Shan
> <gs...@redhat.com>; ira.we...@intel.com; navneet.si...@intel.com
> Subject: Re: [PATCH v1 0/4] vfio: report NUMA nodes for device memory
> 
> 
> On Wed, 27 Sep 2023 10:53:36 -0300
> Jason Gunthorpe <j...@nvidia.com> wrote:
> 
> > On Wed, Sep 27, 2023 at 12:33:18PM +0100, Jonathan Cameron wrote:
> >
> > > CXL accelerators / GPUs etc are a different question but who has one
> > > of those anyway? :)
> >
> > That's exactly what I mean when I say CXL will need it too. I keep
> > describing this current Grace & Hopper as pre-CXL HW. You can easially
> > imagine draping CXL around it. CXL doesn't solve the problem that
> > motivates all this hackying - Linux can't dynamically create NUMA
> > nodes.
> 
> Why is that and why aren't we pushing towards a solution of removing that
> barrier so that we don't require the machine topology to be configured to
> support this use case and guest OS limitations?  Thanks,
>


Even if Linux could create NUMA nodes dynamically for coherent CXL or CXL type
devices, there is additional information that FW knows and the kernel doesn't,
for example the distance/latency between the CPU and the device NUMA node.
While CXL devices present a CDAT table that gives latency-type attributes
within the device, the kernel would still need some guesswork as to what the
end-to-end latency/distance is. It's probably not the best outcome to just
consider this generically "far memory", because whether it is further than
inter-socket memory access or not matters.

Pre-CXL devices, such as the ones this patchset targets, don't even have CDAT,
so the kernel by itself has no idea whether this latency/distance is less than
or more than inter-socket memory access latency.

So, especially for devices present at boot, FW knows this information and
should provide it. Similarly, QEMU should pass this information along to VMs
for the best outcomes.
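
To illustrate why the relative distance matters, here is a minimal Python
sketch (not kernel or QEMU code). It assumes a SLIT-style distance matrix in
which 10 means "local"; the node layout, the numbers, and the helper
fallback_order() are all made up for illustration.

```python
# Illustrative SLIT-style distance matrix; the values are invented for this
# sketch. By ACPI convention, 10 denotes the local node.
# Node 0: CPU socket 0, node 1: CPU socket 1, node 2: device memory.
DISTANCES = [
    [10, 40, 20],
    [40, 10, 80],
    [20, 80, 10],
]


def fallback_order(distances, cpu_node):
    """Return the non-local nodes sorted by distance from cpu_node, nearest first."""
    others = [n for n in range(len(distances)) if n != cpu_node]
    return sorted(others, key=lambda n: distances[cpu_node][n])


# From socket 0 the device node is closer than the other socket.
print(fallback_order(DISTANCES, 0))
# From socket 1 the other socket is closer than the device node.
print(fallback_order(DISTANCES, 1))
```

With these hypothetical numbers, device memory ranks ahead of inter-socket
memory from socket 0 but behind it from socket 1. A generic "far memory" label
cannot express that difference, so something has to supply the actual values.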

Thanks
> Alex
