The GuC-Error-Capture is currently reaching into the xe_devcoredump structure to store its own placeholder snapshot pointer, in order to work around the race between the G2H-Error-Capture-Notification and the Drm-Scheduler triggering GuC-Submission exec-queue timeout/kill.
From a subsystem layering perspective, this isn't scalable, as GuC should not be manipulating the contents of a global structure it does not own when responding to an unrelated thread / callstack.

Also, part of the workaround mentioned above has GuC-Error-Capture taking on one of the front-end functions for xe_hw_engine_snapshot generation, because of an orthogonal debugfs caller requesting raw dumps of engine registers without a job. That request is better handled by GuC-Error-Capture, since there is a lot to manage for reading and printing engine register lists and we want to avoid duplicate code or tables. However, logically speaking, the GuC-Error-Capture output node is really a subset of xe_hw_engine_snapshot. This holds regardless of the fact that the majority of an engine snapshot is the register dumps that only GuC-Error-Capture can do.

That said, this series intends to refactor the plumbing between GuC-Error-Capture and xe_devcoredump (including xe_hw_engine_snapshot) to fix the layering for future maintenance and scalability. This is done without changing any functionality or IP-locality (i.e. GuC-Error-Capture still owns the single point of engine register list definition and printing). This series ensures 'xe_devcoredump_snapshot' owns 'xe_hw_engine_snapshot' generation, and the latter owns 'xe_guc_capture_snapshot' retrieval (with GuC-Error-Capture as its helper).
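As a rough illustration only, the intended ownership chain can be sketched in plain userspace C. All struct and function names below are hypothetical stand-ins invented for this sketch; they are not the definitions or signatures used in the xe driver or in these patches. The sketch assumes a single engine and a boolean that says whether a GuC capture node is available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins, NOT the real xe driver types. */
struct guc_capture_snapshot {
	unsigned int reg_count;			/* registers dumped by the capture helper */
};

struct hw_engine_snapshot {
	const char *name;
	struct guc_capture_snapshot *capture;	/* subset node, NULL if none matched */
};

struct devcoredump_snapshot {
	struct hw_engine_snapshot *engine;	/* owned by the coredump snapshot */
};

/* Capture helper: only hands a node back, never touches the coredump. */
static struct guc_capture_snapshot *guc_capture_get(bool have_node)
{
	struct guc_capture_snapshot *c;

	if (!have_node)
		return NULL;
	c = calloc(1, sizeof(*c));
	if (c)
		c->reg_count = 4;		/* arbitrary value for the sketch */
	return c;
}

/* Engine layer: creates its own snapshot and pulls the capture node. */
static struct hw_engine_snapshot *
hw_engine_snapshot_capture(const char *name, bool have_node)
{
	struct hw_engine_snapshot *s;

	assert(name);
	s = calloc(1, sizeof(*s));
	if (!s)
		return NULL;
	s->name = name;
	s->capture = guc_capture_get(have_node);
	return s;
}

/* Engine layer frees what it owns, including the capture node. */
static void hw_engine_snapshot_free(struct hw_engine_snapshot *s)
{
	if (!s)
		return;
	free(s->capture);
	free(s);
}

/* Coredump layer: top-level owner, drives engine snapshot generation. */
static void devcoredump_capture(struct devcoredump_snapshot *ss,
				const char *name, bool have_node)
{
	ss->engine = hw_engine_snapshot_capture(name, have_node);
}

static void devcoredump_free(struct devcoredump_snapshot *ss)
{
	hw_engine_snapshot_free(ss->engine);
	ss->engine = NULL;
}
```

In this sketch, calls only flow downward (coredump, then engine, then capture helper), and the capture helper never writes into the coredump structure, which is the layering property the series is after.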

Alan Previn (6):
  drm/xe/guc: Rename __guc_capture_parsed_output
  drm/xe/guc: Don't store capture nodes in xe_devcoredump_snapshot
  drm/xe/guc: Split engine state print between xe_hw_engine vs xe_guc_capture
  drm/xe/guc: Move xe_hw_engine_snapshot creation back to xe_hw_engine.c
  drm/xe/xe_hw_engine: Update hw_engine_snapshot_capture for debugfs
  drm/xe/guc: Update comments on GuC-Err-Capture flows

 drivers/gpu/drm/xe/xe_devcoredump.c           |   3 -
 drivers/gpu/drm/xe/xe_devcoredump_types.h     |   6 -
 drivers/gpu/drm/xe/xe_guc_capture.c           | 365 ++++++++----------
 drivers/gpu/drm/xe/xe_guc_capture.h           |  16 +-
 .../drm/xe/xe_guc_capture_snapshot_types.h    |  53 +++
 drivers/gpu/drm/xe/xe_guc_submit.c            |  12 +-
 drivers/gpu/drm/xe/xe_hw_engine.c             | 111 ++++--
 drivers/gpu/drm/xe/xe_hw_engine.h             |   4 +-
 drivers/gpu/drm/xe/xe_hw_engine_types.h       |  13 +-
 9 files changed, 319 insertions(+), 264 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/xe_guc_capture_snapshot_types.h

base-commit: 8b47c9cdb6a78364fe68f8af0abfd6f265577001
-- 
2.34.1