On Thu, Apr 24, 2025 at 02:38:05PM +0000, Jonathan Cavitt wrote:
> Add additional information to each VM so they can report up to the first
> 50 seen faults.  Only pagefaults are saved this way currently, though in
> the future, all faults should be tracked by the VM for future reporting.
>
> Additionally, of the pagefaults reported, only failed pagefaults are
> saved this way, as successful pagefaults should recover silently and not
> need to be reported to userspace.
>
> To allow userspace to access these faults, a new ioctl -
> xe_vm_get_property_ioctl - was created.
>
> v2: (Matt Brost)
> - Break full ban list request into a separate property.
> - Reformat drm_xe_vm_get_property struct.
> - Remove need for drm_xe_faults helper struct.
> - Separate data pointer and scalar return value in ioctl.
> - Get address type on pagefault report and save it to the pagefault.
> - Correctly reject writes to read-only VMAs.
> - Miscellaneous formatting fixes.
>
> v3: (Matt Brost)
> - Only allow querying of failed pagefaults
>
> v4:
> - Remove unnecessary size parameter from helper function, as it
>   is a property of the arguments. (jcavitt)
> - Remove unnecessary copy_from_user (Jianxun)
> - Set address_precision to 1 (Jianxun)
> - Report max size instead of dynamic size for memory allocation
>   purposes.  Total memory usage is reported separately.
>
> v5:
> - Return int from xe_vm_get_property_size (Shuicheng)
> - Fix memory leak (Shuicheng)
> - Remove unnecessary size variable (jcavitt)
>
> v6:
> - Free vm after use (Shuicheng)
> - Compress pf copy logic (Shuicheng)
> - Update fault_unsuccessful before storing (Shuicheng)
> - Fix old struct name in comments (Shuicheng)
> - Keep first 50 pagefaults instead of last 50 (Jianxun)
> - Rename ioctl to xe_vm_get_faults_ioctl (jcavitt)
>
> v7:
> - Avoid unnecessary execution by checking MAX_PFS earlier (jcavitt)
> - Fix double-locking error (jcavitt)
> - Assert kmemdump is successful (Shuicheng)
> - Repair and move fill_faults break condition (Dan Carpenter)
> - Free vm after use (jcavitt)
> - Combine assertions (jcavitt)
> - Expand size check in xe_vm_get_faults_ioctl (jcavitt)
> - Remove return mask from fill_faults, as return is already -EFAULT or 0
>   (jcavitt)
>
> v8:
> - Revert back to using drm_xe_vm_get_property_ioctl
> - s/Migrate/Move (Michal)
> - s/xe_pagefault/xe_gt_pagefault (Michal)
> - Create new header file, xe_gt_pagefault_types.h (Michal)
> - Add and fix kernel docs (Michal)
> - Rename xe_vm.pfs to xe_vm.faults (jcavitt)
> - Store fault data and not pagefault in xe_vm faults list (jcavitt)
> - Store address, address type, and address precision per fault (jcavitt)
> - Store engine class and instance data per fault (Jianxun)
> - Properly handle kzalloc error (Michal W)
> - s/MAX_PFS/MAX_FAULTS_SAVED_PER_VM (Michal W)
> - Store fault level per fault (Michal M)
> - Apply better copy_to_user logic (jcavitt)
>
> v9:
> - More kernel doc fixes (Michal W, Jianxun)
> - Better error handling (jcavitt)
>
> v10:
> - Convert enums to defines in regs folder (Michal W)
> - Move xe_guc_pagefault_desc to regs folder (Michal W)
> - Future-proof size logic for zero-size properties (jcavitt)
> - Replace address type extern with access type (Jianxun)
> - Add fault type to xe_drm_fault (Jianxun)
>
> v11:
> - Remove unnecessary switch case logic (Raag)
> - Compress size get, size validation, and property fill functions into a
>   single helper function (jcavitt)
> - Assert valid size (jcavitt)
> - Store pagefaults in non-fault-mode VMs as well (Jianxun)
>
> v12:
> - Remove unnecessary else condition
> - Correct backwards helper function size logic (jcavitt)
> - Fix kernel docs and comments (Michal W)
>
> v13:
> - Move xe and user engine class mapping arrays to header (John H)
>
> v14:
> - Fix double locking issue (Jianxun)
> - Use size_t instead of int (Raag)
> - Remove unnecessary includes (jcavitt)
>
> v15:
> - Do not report faults from reserved engines (Jianxun)
>
> v16:
> - Remove engine class and instance (Ivan)
>
> v17:
> - Map access type, fault type, and fault level to user macros (Matt
>   Brost, Ivan)
>
> v18:
> - Add uAPI merge request to this cover letter
>
> v19:
> - Perform kzalloc outside of lock (Auld)
>
> v20:
> - Fix inconsistent use of whitespace in defines
>
> v21:
> - Remove unnecessary size assertion (jcavitt)
>
> v22:
> - Fix xe_vm_fault_entry kernel docs (Shuicheng)
>
> uAPI: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32987
> Signed-off-by: Jonathan Cavitt <joanthan.cav...@intel.com>
> Suggested-by: Joonas Lahtinen <joonas.lahti...@linux.intel.com>
> Suggested-by: Matthew Brost <matthew.br...@intel.com>

Scanned the series, had some nits on the last patch, but overall LGTM.

Consider all patches:
Acked-by: Matthew Brost <matthew.br...@intel.com>

> Cc: Zhang Jianxun <jianxun.zh...@intel.com>
> Cc: Shuicheng Lin <shuicheng....@intel.com>
> Cc: Michal Wajdeczko <michal.wajdec...@intel.com>
> Cc: Michal Mrozek <michal.mro...@intel.com>
> Cc: Raag Jadav <raag.ja...@intel.com>
> Cc: John Harrison <john.c.harri...@intel.com>
> Cc: Ivan Briano <ivan.bri...@intel.com>
> Cc: Matthew Auld <matthew.a...@intel.com>
>
> Jonathan Cavitt (5):
>   drm/xe/xe_gt_pagefault: Disallow writes to read-only VMAs
>   drm/xe/xe_gt_pagefault: Move pagefault struct to header
>   drm/xe/uapi: Define drm_xe_vm_get_property
>   drm/xe/xe_vm: Add per VM fault info
>   drm/xe/xe_vm: Implement xe_vm_get_property_ioctl
>
>  drivers/gpu/drm/xe/regs/xe_pagefault_desc.h |  49 +++++
>  drivers/gpu/drm/xe/xe_device.c              |   3 +
>  drivers/gpu/drm/xe/xe_gt_pagefault.c        |  72 ++++----
>  drivers/gpu/drm/xe/xe_gt_pagefault_types.h  |  42 +++++
>  drivers/gpu/drm/xe/xe_guc_fwif.h            |  28 ---
>  drivers/gpu/drm/xe/xe_vm.c                  | 195 ++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_vm.h                  |  11 ++
>  drivers/gpu/drm/xe/xe_vm_types.h            |  29 +++
>  include/uapi/drm/xe_drm.h                   |  86 +++++++++
>  9 files changed, 454 insertions(+), 61 deletions(-)
>  create mode 100644 drivers/gpu/drm/xe/regs/xe_pagefault_desc.h
>  create mode 100644 drivers/gpu/drm/xe/xe_gt_pagefault_types.h
>
> --
> 2.43.0
>
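
For anyone skimming the thread, the behaviour the cover letter describes (each VM keeps only the first 50 failed pagefaults and drops the rest) can be sketched in plain userspace C. This is an illustration only, not code from the series: every name below (sketch_vm, sketch_fault, sketch_vm_save_fault, sketch_vm_copy_faults, SKETCH_MAX_FAULTS_PER_VM) is hypothetical, the record layout is an assumption rather than the real fault entry, and the locking the driver would need is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap; the cover letter says "the first 50 seen faults". */
#define SKETCH_MAX_FAULTS_PER_VM 50

/* Hypothetical per-fault record; NOT the layout used by the series. */
struct sketch_fault {
	uint64_t address;
	uint32_t access_type;
	uint32_t fault_type;
	uint32_t fault_level;
};

/* Hypothetical per-VM store: a fixed array plus a fill count. */
struct sketch_vm {
	struct sketch_fault faults[SKETCH_MAX_FAULTS_PER_VM];
	size_t len;
};

/*
 * Save one fault. Successful faults are ignored, and once the store is
 * full later faults are dropped, so the first 50 failures are kept.
 * Returns true only if the fault was stored.
 */
bool sketch_vm_save_fault(struct sketch_vm *vm, const struct sketch_fault *f,
			  bool fault_unsuccessful)
{
	assert(vm && f);

	if (!fault_unsuccessful)
		return false;
	if (vm->len >= SKETCH_MAX_FAULTS_PER_VM)
		return false;

	vm->faults[vm->len++] = *f;
	return true;
}

/*
 * Copy at most max saved faults into out, oldest first.
 * Returns the number of entries copied.
 */
size_t sketch_vm_copy_faults(const struct sketch_vm *vm,
			     struct sketch_fault *out, size_t max)
{
	size_t n, i;

	assert(vm && out);

	n = vm->len < max ? vm->len : max;
	for (i = 0; i < n; i++)
		out[i] = vm->faults[i];
	return n;
}
```

A caller would zero-initialise a struct sketch_vm, call sketch_vm_save_fault() from its fault path, and read entries back with sketch_vm_copy_faults(). Checking the cap before storing is what gives "first 50" rather than "last 50" semantics, matching the v6 changelog entry.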