On Tue, Mar 25, 2025 at 08:14:13PM +0530, Cavitt, Jonathan wrote:
> From: Jadav, Raag <raag.ja...@intel.com>
> > On Tue, Mar 25, 2025 at 03:01:27AM +0530, Cavitt, Jonathan wrote:
> > > From: Jadav, Raag <raag.ja...@intel.com>
> > > > On Mon, Mar 24, 2025 at 10:27:08PM +0530, Cavitt, Jonathan wrote:
> > > > > From: Jadav, Raag <raag.ja...@intel.com>
> > > > > > On Thu, Mar 20, 2025 at 03:26:15PM +0000, Jonathan Cavitt wrote:
> > > > > > > Add support for userspace to request a list of observed faults
> > > > > > > from a specified VM.
> > > > > >
> > > > > > ...
> > > > > >
> > > > > > > +static int xe_vm_get_property_size(struct xe_vm *vm, u32 property)
> > > > > > > +{
> > > > > > > +	int size = -EINVAL;
> > > > > >
> > > > > > Mixing size and error codes is usually received with mixed feelings.
> > > > > >
> > > > > > > +
> > > > > > > +	switch (property) {
> > > > > > > +	case DRM_XE_VM_GET_PROPERTY_FAULTS:
> > > > > > > +		spin_lock(&vm->faults.lock);
> > > > > > > +		size = vm->faults.len * sizeof(struct xe_vm_fault);
> > > > > >
> > > > > > size_mul() and,
> > > > > > [1] perhaps fill it up into the pointer passed by the caller here?
> > > > >
> > > > > "The pointer passed by the caller". You mean the args pointer?
> > > > >
> > > > > We'd still need to check that the args->size value is empty here
> > > > > before overwriting it, and we'd also still need to return the size
> > > > > to the ioctl so we can verify it's acceptable later in
> > > > > xe_vm_get_property_verify_size.
> > > > >
> > > > > Unless you want to merge those two processes together into here?
> > > >
> > > > The semantics are a bit fuzzy to me. Why do we have a single ioctl for
> > > > two different processes? Shouldn't they be handled separately?
> > >
> > > No. Sorry. Let me clarify.
> > > "two different processes" = getting the size + verifying the size.
> >
> > Yes, which seems like they should be handled with _FAULT_NUM and
> > _FAULT_DATA ioctls but I guess we're way past it now.
>
> The current implementation mirrors xe_query. Should we have separate
> queries for getting the size of the query data and getting the data itself
> in xe_query?

Let's not break a well established API.

> And just to preempt the question: this cannot be an xe_query because
> the size of the returned data depends on the target VM, which cannot
> be passed to the xe_query structure on the first pass when calculating
> the size. And just reporting the maximum possible size was rejected
> separately.

Sure, makes sense.

> > I'm also not much informed about the history here. Is there a real
> > usecase behind exposing them? What is the user expected to do with
> > this information?
>
> This is a request from Vulkan, and is necessary to satisfy the requirements
> for one of their interfaces. Specifically,
> https://registry.khronos.org/vulkan/specs/latest/man/html/VK_EXT_device_fault.html

It says this should be a subsequence of device lost. What is the criteria
for it wrt xe? A big enough fault will probably result in a coredump. So
why not just reuse it?

Raag
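
For reference, the review points above (keeping the size out of the error
return, saturating the multiplication the way size_mul() does, and the
xe_query-style "size first, data second" handshake) can be sketched in plain
userspace C. This is only an illustration under assumptions, not the xe
driver's code: struct fault_record, struct vm_state, MAX_FAULTS,
sat_size_mul(), vm_fault_bytes() and vm_get_faults() are hypothetical names,
locking is omitted, and __builtin_mul_overflow is a GCC/Clang builtin standing
in for the kernel helper.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_FAULTS 4	/* hypothetical cap, for the sketch only */

/* Hypothetical stand-ins for the structures discussed in the thread. */
struct fault_record {
	uint64_t address;
	uint32_t address_type;
	uint32_t address_precision;
};

struct vm_state {
	size_t fault_count;
	struct fault_record faults[MAX_FAULTS];
};

/* Userspace stand-in for the kernel's size_mul(): saturate at SIZE_MAX. */
static size_t sat_size_mul(size_t a, size_t b)
{
	size_t out;

	if (__builtin_mul_overflow(a, b, &out))
		return SIZE_MAX;
	return out;
}

/* Returns 0 or a negative errno; the size only travels through *size. */
static int vm_fault_bytes(const struct vm_state *vm, size_t *size)
{
	if (!vm || !size || vm->fault_count > MAX_FAULTS)
		return -EINVAL;

	*size = sat_size_mul(vm->fault_count, sizeof(struct fault_record));
	return 0;
}

/*
 * Two-pass query: *size == 0 asks for the required size; otherwise
 * *size must match the required size and buf receives the data.
 */
static int vm_get_faults(const struct vm_state *vm, void *buf, size_t *size)
{
	size_t needed;
	int err;

	if (!size)
		return -EINVAL;

	err = vm_fault_bytes(vm, &needed);
	if (err)
		return err;

	if (*size == 0) {
		*size = needed;	/* first pass: report the size only */
		return 0;
	}

	if (*size != needed || !buf)
		return -EINVAL;

	memcpy(buf, vm->faults, needed);	/* second pass: copy the data */
	return 0;
}
```

A caller would invoke vm_get_faults() once with *size set to 0 to learn the
required size, allocate a buffer of that size, and call it again with the
same size to fetch the records. A mismatched size on the second call yields
-EINVAL rather than a short copy.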