Hi Matthew and Thomas,
I'm exploring the use of drm_gpusvm for multi-GPU shared virtual memory
scenarios and have some questions about potential synchronization issues.
The drm_gpusvm design is per-device, so in a multi-GPU setup each GPU
would have its own drm_gpusvm instance, each registering its own MMU
notifiers against the same mm_struct (rough sketch of the setup below).
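For concreteness, this is roughly the bind path I have in mind. It is
only a sketch: the drm_gpusvm_init() argument list follows my reading
of the kernel-doc example, and gpu_a_svm_ops/gpu_b_svm_ops, the chunk
sizes and the bind_two_gpus() helper are made up for illustration.

#include <linux/kernel.h>
#include <linux/sizes.h>
#include <drm/drm_gpusvm.h>

/* Hypothetical driver-provided ops (invalidate callback etc.). */
extern const struct drm_gpusvm_ops gpu_a_svm_ops;
extern const struct drm_gpusvm_ops gpu_b_svm_ops;

static const unsigned long chunk_sizes[] = { SZ_2M, SZ_64K, SZ_4K };

/* One drm_gpusvm instance per device, both registered against the same
 * mm_struct, so each device ends up with its own independent MMU
 * notifiers for the shared address space.
 */
static int bind_two_gpus(struct drm_device *drm_a, struct drm_device *drm_b,
			 struct mm_struct *mm,
			 struct drm_gpusvm *svm_a, struct drm_gpusvm *svm_b)
{
	int err;

	err = drm_gpusvm_init(svm_a, "GPU-A SVM", drm_a, mm, NULL,
			      0, mm->task_size, SZ_512M, &gpu_a_svm_ops,
			      chunk_sizes, ARRAY_SIZE(chunk_sizes));
	if (err)
		return err;

	err = drm_gpusvm_init(svm_b, "GPU-B SVM", drm_b, mm, NULL,
			      0, mm->task_size, SZ_512M, &gpu_b_svm_ops,
			      chunk_sizes, ARRAY_SIZE(chunk_sizes));
	if (err)
		drm_gpusvm_fini(svm_a);

	return err;
}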
When multiple drm_gpusvm instances share the same process address space,
I'm concerned about the following synchronization issues:
1. MMU notifier ordering: When the CPU modifies the address space
(e.g., munmap), each GPU's notifier callback is invoked independently.
Is there any guarantee on ordering or atomicity across GPUs? Could
this lead to transiently inconsistent states between the GPUs?
2. Range state consistency: If GPU-A and GPU-B both have ranges
covering the same virtual address, and an invalidation occurs, how
should we ensure both GPUs see a consistent view before allowing
new GPU accesses?
3. Concurrent fault handling: If GPU-A and GPU-B fault on the same
address simultaneously, is there potential for races in
drm_gpusvm_range_find_or_insert()? (See the sketch after this list
for the fault flow I have in mind.)
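To make (3) concrete, this is the per-GPU fault path I am picturing,
loosely modelled on the drm_gpusvm kernel-doc fault example.
drm_gpusvm_range_find_or_insert() and drm_gpusvm_range_get_pages() are
the real entry points as far as I can tell; driver_svm_lock()/
driver_svm_unlock() and driver_bind_range() are hypothetical stand-ins
for the driver's own locking and page-table update, and the exact
arguments and retry handling may not match the current API.

#include <linux/err.h>
#include <drm/drm_gpusvm.h>

/* Hypothetical driver helpers: per-device SVM lock and page-table bind. */
void driver_svm_lock(struct drm_gpusvm *gpusvm);
void driver_svm_unlock(struct drm_gpusvm *gpusvm);
int driver_bind_range(struct drm_gpusvm *gpusvm,
		      struct drm_gpusvm_range *range);

/* GPU-A and GPU-B each run this independently, on their own drm_gpusvm
 * instance, possibly for the same fault_addr at the same time.
 */
static int gpu_fault(struct drm_gpusvm *gpusvm, unsigned long fault_addr,
		     unsigned long gpuva_start, unsigned long gpuva_end)
{
	struct drm_gpusvm_ctx ctx = {};
	struct drm_gpusvm_range *range;
	int err;

	driver_svm_lock(gpusvm);

	range = drm_gpusvm_range_find_or_insert(gpusvm, fault_addr,
						gpuva_start, gpuva_end, &ctx);
	if (IS_ERR(range)) {
		err = PTR_ERR(range);
		goto unlock;
	}

	/* Both GPUs can reach this point for the same VA: each grabs
	 * pages and binds its own page tables, with no cross-device
	 * ordering that I can see.
	 */
	err = drm_gpusvm_range_get_pages(gpusvm, range, &ctx);
	if (err)
		goto unlock;

	err = driver_bind_range(gpusvm, range);

unlock:
	driver_svm_unlock(gpusvm);
	return err;
}

If an invalidation triggered by one GPU lands between the other GPU's
get_pages and bind, I don't see what, if anything, serializes the two
devices' views - which is really what questions 1 and 2 are about.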
Is multi-GPU sharing of the same mm an intended use case for
drm_gpusvm? If so, are there recommended patterns for handling these
coordination issues?
Regards,
Honglei