-----Original Message-----
From: Landwerlin, Lionel G <lionel.g.landwer...@intel.com>
Sent: Monday, March 31, 2025 1:18 AM
To: Cavitt, Jonathan <jonathan.cav...@intel.com>; intel...@lists.freedesktop.org
Cc: Gupta, saurabhg <saurabhg.gu...@intel.com>; Zuo, Alex <alex....@intel.com>; joonas.lahti...@linux.intel.com; Brost, Matthew <matthew.br...@intel.com>; Zhang, Jianxun <jianxun.zh...@intel.com>; Lin, Shuicheng <shuicheng....@intel.com>; dri-devel@lists.freedesktop.org; Wajdeczko, Michal <michal.wajdec...@intel.com>; Mrozek, Michal <michal.mro...@intel.com>; Jadav, Raag <raag.ja...@intel.com>; Harrison, John C <john.c.harri...@intel.com>
Subject: Re: [PATCH v15 0/6] drm/xe/xe_vm: Implement xe_vm_get_property_ioctl
>
> Hi Jonathan,
>
> Are the pagefaults reported for any unit in the GPU (including command
> streamer?) or is it limited to execution units?

Currently, the only faults that are reported are pagefaults that are handled
by the XE pagefault handler (pf_queue_work_func), and that are reported on a
userspace-visible engine class (i.e. not "reserved").

So, I think that means only execution unit pagefaults are visible?
-Jonathan Cavitt

>
> Thanks,
>
> -Lionel
>
> On 28/03/2025 22:40, Jonathan Cavitt wrote:
> > Add additional information to each VM so they can report up to the first
> > 50 seen faults. Only pagefaults are saved this way currently, though in
> > the future, all faults should be tracked by the VM for future reporting.
> >
> > Additionally, of the pagefaults reported, only failed pagefaults are
> > saved this way, as successful pagefaults should recover silently and not
> > need to be reported to userspace.
> >
> > To allow userspace to access these faults, a new ioctl -
> > xe_vm_get_property_ioctl - was created.
> >
> > v2: (Matt Brost)
> > - Break full ban list request into a separate property.
> > - Reformat drm_xe_vm_get_property struct.
> > - Remove need for drm_xe_faults helper struct.
> > - Separate data pointer and scalar return value in ioctl.
> > - Get address type on pagefault report and save it to the pagefault.
> > - Correctly reject writes to read-only VMAs.
> > - Miscellaneous formatting fixes.
> >
> > v3: (Matt Brost)
> > - Only allow querying of failed pagefaults
> >
> > v4:
> > - Remove unnecessary size parameter from helper function, as it
> >   is a property of the arguments. (jcavitt)
> > - Remove unnecessary copy_from_user (Jianxun)
> > - Set address_precision to 1 (Jianxun)
> > - Report max size instead of dynamic size for memory allocation
> >   purposes. Total memory usage is reported separately.
> >
> > v5:
> > - Return int from xe_vm_get_property_size (Shuicheng)
> > - Fix memory leak (Shuicheng)
> > - Remove unnecessary size variable (jcavitt)
> >
> > v6:
> > - Free vm after use (Shuicheng)
> > - Compress pf copy logic (Shuicheng)
> > - Update fault_unsuccessful before storing (Shuicheng)
> > - Fix old struct name in comments (Shuicheng)
> > - Keep first 50 pagefaults instead of last 50 (Jianxun)
> > - Rename ioctl to xe_vm_get_faults_ioctl (jcavitt)
> >
> > v7:
> > - Avoid unnecessary execution by checking MAX_PFS earlier (jcavitt)
> > - Fix double-locking error (jcavitt)
> > - Assert kmemdump is successful (Shuicheng)
> > - Repair and move fill_faults break condition (Dan Carpenter)
> > - Free vm after use (jcavitt)
> > - Combine assertions (jcavitt)
> > - Expand size check in xe_vm_get_faults_ioctl (jcavitt)
> > - Remove return mask from fill_faults, as return is already -EFAULT or 0
> >   (jcavitt)
> >
> > v8:
> > - Revert back to using drm_xe_vm_get_property_ioctl
> > - s/Migrate/Move (Michal)
> > - s/xe_pagefault/xe_gt_pagefault (Michal)
> > - Create new header file, xe_gt_pagefault_types.h (Michal)
> > - Add and fix kernel docs (Michal)
> > - Rename xe_vm.pfs to xe_vm.faults (jcavitt)
> > - Store fault data and not pagefault in xe_vm faults list (jcavitt)
> > - Store address, address type, and address precision per fault (jcavitt)
> > - Store engine class and instance data per fault (Jianxun)
> > - Properly handle kzalloc error (Michal W)
> > - s/MAX_PFS/MAX_FAULTS_SAVED_PER_VM (Michal W)
> > - Store fault level per fault (Michal M)
> > - Apply better copy_to_user logic (jcavitt)
> >
> > v9:
> > - More kernel doc fixes (Michal W, Jianxun)
> > - Better error handling (jcavitt)
> >
> > v10:
> > - Convert enums to defines in regs folder (Michal W)
> > - Move xe_guc_pagefault_desc to regs folder (Michal W)
> > - Future-proof size logic for zero-size properties (jcavitt)
> > - Replace address type extern with access type (Jianxun)
> > - Add fault type to xe_drm_fault (Jianxun)
> >
> > v11:
> > - Remove unnecessary switch case logic (Raag)
> > - Compress size get, size validation, and property fill functions into a
> >   single helper function (jcavitt)
> > - Assert valid size (jcavitt)
> > - Store pagefaults in non-fault-mode VMs as well (Jianxun)
> >
> > v12:
> > - Remove unnecessary else condition
> > - Correct backwards helper function size logic (jcavitt)
> > - Fix kernel docs and comments (Michal W)
> >
> > v13:
> > - Move xe and user engine class mapping arrays to header (John H)
> >
> > v14:
> > - Fix double locking issue (Jianxun)
> > - Use size_t instead of int (Raag)
> > - Remove unnecessary includes (jcavitt)
> >
> > v15:
> > - Do not report faults from reserved engines (Jianxun)
> >
> > Signed-off-by: Jonathan Cavitt <joanthan.cav...@intel.com>
> > Suggested-by: Joonas Lahtinen <joonas.lahti...@linux.intel.com>
> > Suggested-by: Matthew Brost <matthew.br...@intel.com>
> > Cc: Zhang Jianxun <jianxun.zh...@intel.com>
> > Cc: Shuicheng Lin <shuicheng....@intel.com>
> > Cc: Michal Wajdeczko <michal.wajdec...@intel.com>
> > Cc: Michal Mrozek <michal.mro...@intel.com>
> > Cc: Raag Jadav <raag.ja...@intel.com>
> > Cc: John Harrison <john.c.harri...@intel.com>
> >
> > Jonathan Cavitt (6):
> >   drm/xe/xe_hw_engine: Map xe and user engine class in header
> >   drm/xe/xe_gt_pagefault: Disallow writes to read-only VMAs
> >   drm/xe/xe_gt_pagefault: Move pagefault struct to header
> >   drm/xe/uapi: Define drm_xe_vm_get_property
> >   drm/xe/xe_vm: Add per VM fault info
> >   drm/xe/xe_vm: Implement xe_vm_get_property_ioctl
> >
> >  drivers/gpu/drm/xe/regs/xe_pagefault_desc.h |  50 ++++++
> >  drivers/gpu/drm/xe/xe_device.c              |   3 +
> >  drivers/gpu/drm/xe/xe_gt_pagefault.c        |  72 ++++----
> >  drivers/gpu/drm/xe/xe_gt_pagefault_types.h  |  42 +++++
> >  drivers/gpu/drm/xe/xe_guc_fwif.h            |  28 ----
> >  drivers/gpu/drm/xe/xe_hw_engine.c           |  24 ++-
> >  drivers/gpu/drm/xe/xe_hw_engine_types.h     |   3 +
> >  drivers/gpu/drm/xe/xe_query.c               |  18 +-
> >  drivers/gpu/drm/xe/xe_vm.c                  | 177 ++++++++++++++++++++
> >  drivers/gpu/drm/xe/xe_vm.h                  |  11 ++
> >  drivers/gpu/drm/xe/xe_vm_types.h            |  32 ++++
> >  include/uapi/drm/xe_drm.h                   |  79 +++++++++
> >  12 files changed, 453 insertions(+), 86 deletions(-)
> >  create mode 100644 drivers/gpu/drm/xe/regs/xe_pagefault_desc.h
> >  create mode 100644 drivers/gpu/drm/xe/xe_gt_pagefault_types.h
> >
> >
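
For illustration only (this is not code from the series): a minimal userspace sketch of the policy discussed above, where only failed pagefaults from non-reserved engine classes are saved, and only the first 50 per VM are kept. Every name here (SKETCH_MAX_FAULTS, struct sketch_fault, struct sketch_vm, sketch_vm_save_fault) is hypothetical and does not correspond to the actual xe structures or helpers; locking and allocation are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap, mirroring the "first 50 seen faults" in the cover letter. */
#define SKETCH_MAX_FAULTS 50

/* Hypothetical per-fault record; the field names are illustrative only. */
struct sketch_fault {
	uint64_t address;
	uint8_t engine_class;
	uint8_t engine_instance;
};

/* Hypothetical per-VM fault list with a fixed capacity. */
struct sketch_vm {
	struct sketch_fault faults[SKETCH_MAX_FAULTS];
	size_t count;
};

/*
 * Try to save a fault on the VM. Returns true if it was stored.
 *
 * A fault is dropped when:
 *  - it was recovered successfully (only failed pagefaults are reported),
 *  - it came from a reserved (not userspace-visible) engine class, or
 *  - the VM already holds SKETCH_MAX_FAULTS entries (keep the first 50,
 *    not the last 50).
 */
static bool sketch_vm_save_fault(struct sketch_vm *vm,
				 const struct sketch_fault *fault,
				 bool recovered, bool reserved_engine)
{
	assert(vm && fault);

	if (recovered || reserved_engine)
		return false;

	if (vm->count >= SKETCH_MAX_FAULTS)
		return false;

	vm->faults[vm->count++] = *fault;
	return true;
}
```

Checking the capacity before storing means later faults are discarded once the list is full, which is one straightforward way to get the "keep first 50 instead of last 50" behaviour from the v6 changelog.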