On Fri, Nov 10, 2023 at 07:22:31PM +0100, Michal Wajdeczko wrote:
> The Single Root I/O Virtualization (SR-IOV) extension to the PCI
> Express (PCIe) specification suite is supported starting from 12th
> generation of Intel Graphics processors.
> 
> This RFC aims to explain how we want to add support for SR-IOV
> to the new Xe driver and to propose related additions to the sysfs.
> 
> Signed-off-by: Michal Wajdeczko <michal.wajdec...@intel.com>
> Cc: Oded Gabbay <ogab...@kernel.org>
> Cc: Rodrigo Vivi <rodrigo.v...@intel.com>
> Cc: Joonas Lahtinen <joonas.lahti...@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursu...@linux.intel.com>
> Cc: Daniel Vetter <dan...@ffwll.ch>
> ---
>  Documentation/gpu/rfc/index.rst             |   5 +
>  Documentation/gpu/rfc/sysfs-driver-xe-sriov | 501 ++++++++++++++++++++
>  Documentation/gpu/rfc/xe_sriov.rst          | 192 ++++++++
>  3 files changed, 698 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/sysfs-driver-xe-sriov
>  create mode 100644 Documentation/gpu/rfc/xe_sriov.rst
> 
> diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
> index e4f7b005138d..fc5bc447f30d 100644
> --- a/Documentation/gpu/rfc/index.rst
> +++ b/Documentation/gpu/rfc/index.rst
> @@ -35,3 +35,8 @@ host such documentation:
>  .. toctree::
>  
>     xe.rst
> +
> +.. toctree::
> +   :maxdepth: 1
> +
> +   xe_sriov.rst
> diff --git a/Documentation/gpu/rfc/sysfs-driver-xe-sriov b/Documentation/gpu/rfc/sysfs-driver-xe-sriov
> new file mode 100644
> index 000000000000..77748204dd83
> --- /dev/null
> +++ b/Documentation/gpu/rfc/sysfs-driver-xe-sriov
> @@ -0,0 +1,501 @@
> +.. Documentation/ABI/testing/sysfs-driver-xe-sriov
> +..
> +.. Intel Xe driver ABI (SR-IOV extensions)
> +..
> +    The Single Root I/O Virtualization (SR-IOV) extension to
> +    the PCI Express (PCIe) specification suite is supported
> +    starting from 12th generation of Intel Graphics processors.
> +
> +    This document describes Xe driver specific additions.
> +
> +    For description of generic SR-IOV sysfs attributes see
> +    "Documentation/ABI/testing/sysfs-bus-pci" document.
> +
> +    /sys/bus/pci/drivers/xe/BDF/
> +    ├── sriov_auto_provisioning
> +    │   ├── admin_mode
> +    │   ├── enabled
> +    │   ├── reset_defaults
> +    │   ├── resources
> +    │   │   ├── default_contexts_quota
> +    │   │   ├── default_doorbells_quota
> +    │   │   ├── default_ggtt_quota
> +    │   │   └── default_lmem_quota
> +    │   ├── scheduling
> +    │   │   ├── default_exec_quantum_ms
> +    │   │   └── default_preempt_timeout_us
> +    │   └── monitoring
> +    │       ├── default_cat_error_count
> +    │       ├── default_doorbell_time_us
> +    │       ├── default_engine_reset_count
> +    │       ├── default_h2g_time_us
> +    │       ├── default_irq_time_us
> +    │       └── default_page_fault_count
> +
> +    /sys/bus/pci/drivers/xe/BDF/
> +    ├── sriov_extensions
> +    │   ├── monitoring_period_ms
> +    │   ├── strict_scheduling_enabled
> +    │   ├── pf
> +    │   │   ├── device -> ../../../BDF
> +    │   │   ├── priority
> +    │   │   ├── tile0
> +    │   │   │   ├── gt0
> +    │   │   │   │   ├── exec_quantum_ms
> +    │   │   │   │   ├── preempt_timeout_us
> +    │   │   │   │   └── thresholds
> +    │   │   │   │       ├── cat_error_count
> +    │   │   │   │       ├── doorbell_time_us
> +    │   │   │   │       ├── engine_reset_count
> +    │   │   │   │       ├── h2g_time_us
> +    │   │   │   │       ├── irq_time_us
> +    │   │   │   │       └── page_fault_count
> +    │   │   │   └── gtX
> +    │   │   └── tileT
> +    │   ├── vf1
> +    │   │   ├── device -> ../../../BDF+1
> +    │   │   ├── stop
> +    │   │   ├── tile0
> +    │   │   │   ├── ggtt_quota
> +    │   │   │   ├── lmem_quota
> +    │   │   │   ├── gt0
> +    │   │   │   │   ├── contexts_quota
> +    │   │   │   │   ├── doorbells_quota
> +    │   │   │   │   ├── exec_quantum_ms
> +    │   │   │   │   ├── preempt_timeout_us
> +    │   │   │   │   └── thresholds
> +    │   │   │   │       ├── cat_error_count
> +    │   │   │   │       ├── doorbell_time_us
> +    │   │   │   │       ├── engine_reset_count
> +    │   │   │   │       ├── h2g_time_us
> +    │   │   │   │       ├── irq_time_us
> +    │   │   │   │       └── page_fault_count
> +    │   │   │   └── gtX
> +    │   │   └── tileT
> +    │   └── vfN
> +..
> +
> +
> +What:                /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/
> +Date:                2024
> +KernelVersion:       TBD
> +Contact:     intel...@lists.freedesktop.org
> +Description:
> +             This directory appears on the device when:
> +
> +              - device supports SR-IOV, and
> +              - device is a Physical Function (PF), and
> +              - xe driver supports SR-IOV PF on given device, and
> +              - xe driver supports automatic VFs provisioning.
> +
> +             This directory is used as a root for all attributes related to
> +             automatic provisioning of SR-IOV Physical Function (PF) and/or
> +             Virtual Functions (VFs).
> +
> +
> +What:                /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/enabled
> +Date:                2024
> +KernelVersion:       TBD
> +Contact:     intel...@lists.freedesktop.org
> +Description:
> +             (RW) bool (0, 1)
> +
> +             This file represents configuration flag for the automatic VFs
> +             (un)provisioning that could be performed by the PF.
> +
> +             The default value is 1 (true).
> +
> +             This flag can be set to false, unless manual provisioning is not
> +             applicable for given platform or it is not supported by current
> +             PF implementation. In such cases -EPERM will be returned.
> +
> +             This flag will be automatically set to false when there will be
> +             other attempts to change any of VF's resource provisioning.
> +             See "sriov_extensions" section for details.
> +
> +             This flag can be set back to true if and only if all VFs are
> +             fully unprovisioned, otherwise -EEXIST error will be returned.
> +
> +             false = "disabled"
> +                     When disabled, then PF will not attempt to do automatic
> +                     VFs provisioning when VFs are being enabled and will not
> +                     perform automatic unprovisioning of the VFs when VFs will
> +                     be disabled.
> +
> +             true = "enabled"
> +                     When enabled, then on VFs enabling PF will do automatic
> +                     VFs provisioning based on the default settings described
> +                     below.
> +
> +                     If automatic VFs provisioning fails due to some reasons,
> +                     then VFs will not be enabled.
> +
> +                     If enabled, all resources allocated during VFs enabling
> +                     will be released during VFs disabling (automatic unprovisioning).
> +
> +
> +What:                /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/admin_mode
> +Date:                2024
> +KernelVersion:       TBD
> +Contact:     intel...@lists.freedesktop.org
> +Description:
> +             (RW) bool (0, 1)
> +
> +             This file represents configuration flag for the automatic VFs
> +             provisioning that could be performed by the PF.
> +
> +             The default value depends on the platform type.
> +
> +             This flag can be changed any time, but will have no effect if
> +             VFs are already provisioned.
> +
> +             If enabled (default on discrete platforms) then the PF will
> +             retain only minimum hardcoded resources for its own use when
> +             doing VFs automatic provisioning and will not use any default
> +             values described below for its own configuration.
> +
> +             If disabled (default on integrated platforms) then the PF will
> +             treat itself like yet another additional VF in all fair resource
> +             allocations and will also try to apply default provisioning
> +             values described below for its own configuration.
> +
> +
> +What:                /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/reset_defaults
> +Date:                2024
> +KernelVersion:       TBD
> +Contact:     intel...@lists.freedesktop.org
> +Description:
> +             (WO) bool (1)
> +
> +             Writing to this file will reset all default provisioning parameters
> +             listed below to the default values.
> +
> +
> +What:                /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/resources/default_contexts_quota
> +What:                /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/resources/default_doorbells_quota
> +What:                /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/resources/default_ggtt_quota
> +What:                /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/resources/default_lmem_quota
> +What:                /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/scheduling/default_exec_quantum_ms
> +What:                /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/scheduling/default_preempt_timeout_us
> +What:                /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/monitoring/default_cat_error_count
> +What:                /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/monitoring/default_doorbell_time_us
> +What:                /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/monitoring/default_engine_reset_count
> +What:                /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/monitoring/default_h2g_time_us
> +What:                /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/monitoring/default_irq_time_us
> +What:                /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/monitoring/default_page_fault_count
> +Date:                2024
> +KernelVersion:       TBD
> +Contact:     intel...@lists.freedesktop.org
> +Description:
> +             These files represent default provisioning that should be used
> +             for VFs automatic provisioning.
> +
> +             These values can be changed any time, but will have no effect if
> +             VFs are already provisioned.
> +
> +             default_contexts_quota: (RW) integer 0..U32_MAX
> +                     The number of GuC context IDs to provide to the VF.
> +                     The default value is 0 (use fair allocations).
> +                     See "sriov_extensions/vfN/tileT/gtX/contexts_quota" for details.
> +
> +             default_doorbells_quota: (RW) integer 0..U32_MAX
> +                     The number of GuC doorbells to provide to the VF.
> +                     The default value is 0 (use fair allocations).
> +                     See "sriov_extensions/vfN/tileT/gtX/doorbells_quota" for details.
> +
> +             default_ggtt_quota: (RW) integer 0..U32_MAX
> +                     The size of the GGTT address space (in bytes) to provide to the VF.
> +                     The default value is 0 (use fair allocations).
> +                     See "sriov_extensions/vfN/tileT/ggtt_quota" for details.
> +
> +             default_lmem_quota: (RW) integer 0..U32_MAX
> +                     The size of the LMEM (in bytes) to provide to the VF.
> +                     The default value is 0 (use fair allocations).
> +                     See "sriov_extensions/vfN/tileT/lmem_quota" for details.
> +
> +             default_exec_quantum_ms: (RW) integer 0..U32_MAX
> +                     The GT execution quantum (in millisecs) assigned to the function.
> +                     The default value is 0 (infinity).
> +                     See "sriov_extensions/vfN/tileT/gtX/exec_quantum_ms" for details.
> +
> +             default_preempt_timeout_us: (RW) integer 0..U32_MAX
> +                     The GT preemption timeout (in microsecs) assigned to the function.
> +                     The default value is 0 (infinity).
> +                     See "sriov_extensions/vfN/tileT/gtX/preempt_timeout_us" for details.
> +
> +             default_cat_error_count: (RW) integer 0..U32_MAX
> +             default_doorbell_time_us: (RW) integer 0..U32_MAX
> +             default_engine_reset_count: (RW) integer 0..U32_MAX
> +             default_h2g_time_us: (RW) integer 0..U32_MAX
> +             default_irq_time_us: (RW) integer 0..U32_MAX
> +             default_page_fault_count: (RW) integer 0..U32_MAX
> +                     The monitoring threshold to be set for the function.
> +                     The default value is 0 (don't monitor).
> +                     See "sriov_extensions/vfN/tileT/gtX/thresholds" for details.
> +
> +
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/
> +Date:                2024
> +KernelVersion:       TBD
> +Contact:     intel...@lists.freedesktop.org
> +Description:
> +             This directory appears on Xe device when:
> +
> +              - device supports SR-IOV, and
> +              - device is a Physical Function (PF), and
> +              - driver is enabled to support SR-IOV PF on given device.
> +
> +             This directory is used as a root for all attributes required to
> +             manage both Physical Function (PF) and Virtual Functions (VFs).
> +
> +
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/strict_scheduling_enabled
> +Date:                2024
> +KernelVersion:       TBD
> +Contact:     intel...@lists.freedesktop.org
> +Description:
> +             (RW) bool
> +
> +             This file represents a flag used to determine if scheduling
> +             parameters should be respected even if there are no active
> +             workloads submitted by the PF or VFs.
> +
> +             This flag is disabled by default, unless strict scheduling is
> +             not applicable on given platform. In such case this file will
> +             be read-only.
> +
> +             The change to this file may have no effect if VFs are not yet enabled.
> +             If strict scheduling can't be enabled in GuC then write will fail with -EIO.
> +
> +
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/monitoring_period_ms
> +Date:                2024
> +KernelVersion:       TBD
> +Contact:     intel...@lists.freedesktop.org
> +Description:
> +             (RW) integer
> +
> +             This file represents the configuration knob used by adverse event
> +             monitoring. A value here is the period in millisecs during which
> +             events are counted and the total is checked against a threshold.
> +             See "sriov_extensions/vfN/tileT/gtX/thresholds" for more details.
> +
> +             Default is 0 (monitoring is disabled).
> +
> +             If monitoring capability is not available, then attempt to enable
> +             will fail with -EPERM error. If monitoring can't be enabled in
> +             GuC then write will fail with -EIO.
> +
> +
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/
> +Date:                2024
> +KernelVersion:       TBD
> +Contact:     intel...@lists.freedesktop.org
> +Description:
> +             This directory holds all attributes related to the SR-IOV
> +             Physical Function (PF).
> +
> +
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/
> +Date:                2024
> +KernelVersion:       TBD
> +Contact:     intel...@lists.freedesktop.org
> +Description:
> +             This directory holds all attributes related to the SR-IOV
> +             Virtual Function (VF).
> +
> +             Note that VF numbers (N) are 1-based as described in the PCI SR-IOV specification.
> +             The Xe driver implementation follows that naming scheme.
> +
> +             There will be "vf1", "vf2" up to "vfN" directories, where N matches
> +             the value of the PCI "sriov_totalvfs" attribute.
> +
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/
> +Date:                2024
> +KernelVersion:       TBD
> +Contact:     intel...@lists.freedesktop.org
> +Description:
> +             This directory holds all SR-IOV attributes related to the device tile.
> +             The tile numbers (T) start from 0.
> +
> +             There is at least one "tile0/" directory present.
> +
> +
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX
> +Date:                2024
> +KernelVersion:       TBD
> +Contact:     intel...@lists.freedesktop.org
> +Description:
> +             This directory holds all SR-IOV attributes related to the device GT.
> +             The GT numbers (X) start from 0.
> +
> +             There is at least one "gt0/" directory present.
> +
> +
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/device
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/device
> +Date:                2024
> +KernelVersion:       TBD
> +Contact:     intel...@lists.freedesktop.org
> +Description:
> +             (symbolic link)
> +
> +             Backlink to the PCI device entry representing given function.
> +             For PF this link is always present.
> +             For VF this link is present only for currently enabled VFs.
> +
> +
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/priority
> +Date:                2024
> +KernelVersion:       TBD
> +Contact:     intel...@lists.freedesktop.org
> +Description:
> +             (RW) string
> +
> +             This file represents a GuC Scheduler knob to override the default
> +             round-robin or FIFO scheduler policies implemented by the GuC.
> +
> +             The default value is "peer".
> +
> +             This flag can be changed, unless such change is not applicable
> +             for given platform or is not supported by current GuC firmware.
> +             In such case this file could be read-only or will return -EPERM
> +             on write attempt.
> +
> +             "immediate"
> +                     GuC will Schedule PF workloads immediately and PF
> +                     workloads only until the PF's work queues in GuC
> +                     are empty.
> +
> +             "lazy"
> +                     GuC will Schedule PF workloads at the next opportune
> +                     moment and PF workloads only until the PF work queues
> +                     in GuC are empty.
> +
> +             "peer"
> +                     GuC Scheduler will treat PF and VFs with equal priority.
> +
> +
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/stop
> +Date:                2024
> +KernelVersion:       TBD
> +Contact:     intel...@lists.freedesktop.org
> +Description:
> +             (WO) bool (1)
> +
> +             Writing to this file will force GuC to stop handling any requests from
> +             this VF, but without triggering an FLR.
> +             To recover, the full FLR must be issued using generic "device/reset".
> +
> +             This file allows implementing a custom policy mechanism when a VF is
> +             misbehaving and triggering adverse events above defined thresholds.
> +
> +
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX/exec_quantum_ms
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX/preempt_timeout_us
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/exec_quantum_ms
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/preempt_timeout_us
> +Date:                2024
> +KernelVersion:       TBD
> +Contact:     intel...@lists.freedesktop.org
> +Description:
> +             These files represent scheduling parameters of the functions.
> +
> +             These scheduling parameters can be changed even if VFs are enabled
> +             and running, unless such change is not applicable on given platform
> +             due to fixed hardware or firmware assignment.
> +
> +             exec_quantum_ms: (RW) integer 0..U32_MAX
> +                     The GT execution quantum in [ms] assigned to the function.
> +                     Requested quantum might be aligned per HW/FW requirements.
> +
> +                     Default is 0 (unlimited).
> +
> +             preempt_timeout_us: (RW) integer 0..U32_MAX
> +                     The GT preemption timeout in [us] assigned to the function.
> +                     Requested timeout might be aligned per HW/FW requirements.
> +
> +                     Default is 0 (unlimited).
> +
> +
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/ggtt_quota
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/lmem_quota
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/contexts_quota
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/doorbells_quota
> +Date:                2024
> +KernelVersion:       TBD
> +Contact:     intel...@lists.freedesktop.org
> +Description:
> +             These files represent shared resource assigned to the functions.
> +
> +             These resource parameters can be changed, unless VF is already running,
> +             or such change is not applicable on given platform due to fixed hardware
> +             or firmware assignment.
> +
> +             Writes to these attributes may fail with:
> +                     -EPERM if change is not applicable on given HW/FW.
> +                     -E2BIG if value is larger than HW/FW limit.
> +                     -EDQUOT if value is larger than maximum quota defined by the PF.
> +                     -ENOSPC if PF can't allocate required quota.
> +                     -EBUSY if the resource is currently in use by the VF.
> +                     -EIO if GuC refuses to change provisioning.
> +
> +             ggtt_quota: (RW) integer 0..U64_MAX
> +                     The size of the GGTT address space (in bytes) assigned to the VF.
> +                     The value might be aligned per HW/FW requirements.
> +
> +                     Default is 0 (unprovisioned).
> +
> +             lmem_quota: (RW) integer 0..U64_MAX
> +                     The size of the Local Memory (in bytes) assigned to the VF.
> +                     The value might be aligned per HW/FW requirements.
> +
> +                     This attribute is only available on discrete platforms.
> +
> +                     Default is 0 (unprovisioned).
> +
> +             contexts_quota: (RW) 0..U16_MAX
> +                     The number of GuC submission contexts assigned to the VF.
> +                     This value might be aligned per HW/FW requirements.
> +
> +                     Default is 0 (unprovisioned).
> +
> +             doorbells_quota: (RW) 0..U16_MAX
> +                     The number of GuC doorbells assigned to the VF.
> +                     This value might be aligned per HW/FW requirements.
> +
> +                     Default is 0 (unprovisioned).
> +
> +
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX/thresholds/cat_error_count
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX/thresholds/doorbell_time_us
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX/thresholds/engine_reset_count
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX/thresholds/h2g_time_us
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX/thresholds/irq_time_us
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX/thresholds/page_fault_count
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/thresholds/cat_error_count
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/thresholds/doorbell_time_us
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/thresholds/engine_reset_count
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/thresholds/h2g_time_us
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/thresholds/irq_time_us
> +What:                /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/thresholds/page_fault_count
> +Date:                2024
> +KernelVersion:       TBD
> +Contact:     intel...@lists.freedesktop.org
> +Description:
> +             These files represent threshold values used by the GuC to trigger
> +             security events if adverse event monitoring is enabled.
> +
> +             These thresholds are checked every "monitoring_period_ms".
> +             Refer to GuC ABI for details about each threshold category.
> +
> +             Default value for all thresholds is 0 (disabled).
> +
> +             cat_error_count: (RW) integer
> +             doorbell_time_us: (RW) integer
> +             engine_reset_count: (RW) integer
> +             h2g_time_us: (RW) integer
> +             irq_time_us: (RW) integer
> +             page_fault_count: (RW) integer
> diff --git a/Documentation/gpu/rfc/xe_sriov.rst b/Documentation/gpu/rfc/xe_sriov.rst
> new file mode 100644
> index 000000000000..574f6414eabb
> --- /dev/null
> +++ b/Documentation/gpu/rfc/xe_sriov.rst
> @@ -0,0 +1,192 @@
> +.. SPDX-License-Identifier: MIT
> +
> +========================
> +Xe – SR-IOV Support Plan
> +========================
> +
> +The Single Root I/O Virtualization (SR-IOV) extension to the PCI Express (PCIe)
> +specification suite is supported starting from 12th generation of Intel Graphics
> +processors.
> +
> +This document describes planned ABI of the new Xe driver (see xe.rst) that will
> +provide flexible configuration and management options related to the SR-IOV.
> +It will also highlight a few of the most important changes to the Xe driver
> +implementation to deal with Intel GPU SR-IOV specific requirements.
> +
> +
> +SR-IOV Capability
> +=================
> +
> +Due to SR-IOV complexity and required co-operation between hardware, firmware
> +and kernel drivers, not all Xe architecture platforms might have SR-IOV enabled
> +or fully functional.
> +
> +To control at the driver level which platform will provide support for SR-IOV,
> +as we can't just rely on the PCI configuration data exposed by the hardware,
> +we will introduce "has_sriov" flag to the struct xe_device_desc that describes
> +the device capabilities that the driver checks during the probe.
> +
> +Initially this flag will be set to disabled even on platforms that we plan to
> +support. We will enable this flag only once we finish merging all required
> +changes to the driver and related validated firmwares are also made available.
> +
> +
> +SR-IOV Platforms
> +================
> +
> +Initially we plan to add SR-IOV functionality to the following SDV platforms
> +already supported by the Xe driver:
> +
> + - TGL (up to 7 VFs)
> + - ADL (up to 7 VFs)
> + - MTL (up to 7 VFs)
> + - ATSM (up to 31 VFs)
> + - PVC (up to 63 VFs)
> +
> +Newer platforms will be supported later, but we hope that enabling will be
> +much faster, as majority of the driver changes are either platform agnostic
> +or are similar between earlier platforms (hence we start with SDVs).
> +
> +
> +PF Mode
> +=======
> +
> +Support in the driver for acting in Physical Function (PF) mode, i.e. mode
> +that allows configuration of VFs, depends on the CONFIG_PCI_IOV and will be
> +enabled by default.
> +
> +However, due to potentially conflicting requirements for SR-IOV and other mega
> +features, we might want to have an option to disable SR-IOV PF mode support at
> +the driver load time.

What about making SR-IOV support in Xe dependent on a separate build
option, such as CONFIG_DRM_XE_SRIOV? This would allow users to enable
SR-IOV with CONFIG_PCI_IOV to virtualize other devices, let's say a
network adapter, but to keep this feature compiled out of Xe.
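
As a rough, untested sketch of the idea: every name below is hypothetical,
including CONFIG_DRM_XE_SRIOV itself and the xe_sriov_sketch_* helper; only
the "has_sriov" flag comes from the RFC. With the option off, the SR-IOV
entry points could collapse to stubs:

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the driver's device structure. Only the
 * "has_sriov" capability flag is taken from the RFC text.
 */
struct xe_device_sketch {
	bool has_sriov;           /* platform is allowed to use SR-IOV */
	unsigned int hw_totalvfs; /* VF limit claimed by PCI config space */
};

#ifdef CONFIG_DRM_XE_SRIOV /* hypothetical Kconfig symbol */
/* SR-IOV compiled in: report the hardware limit on capable platforms. */
static inline unsigned int
xe_sriov_sketch_totalvfs(const struct xe_device_sketch *xe)
{
	return xe->has_sriov ? xe->hw_totalvfs : 0;
}
#else
/* SR-IOV compiled out: always native mode, no VFs are ever offered. */
static inline unsigned int
xe_sriov_sketch_totalvfs(const struct xe_device_sketch *xe)
{
	(void)xe;
	return 0;
}
#endif
```

Built without -DCONFIG_DRM_XE_SRIOV this always reports 0 VFs, which is the
same outcome the RFC describes for sriov_totalvfs=0, only decided at build
time instead of at driver load time.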

Francois

> +
> +Thus, we plan to use additional modparam named "sriov_totalvfs" which if set to
> +0 will force the driver to operate in the native (non-virtualized) mode.
> +The same modparam could be used to limit number of supported Virtual Functions
> +(VFs) by the driver compared to the hardware limit exposed in PCI configuration.
> +
> +The name of this modparam corresponds to the existing PCI sysfs attribute, that
> +by default exposes hardware capability.
> +
> +The default value of this param will allow to support all possible VFs as
> +claimed by the hardware.
> +
> +This modparam will have no effect if driver is running on the VF device.
> +
> +
> +VFs Enabling
> +============
> +
> +To enable or disable VFs we plan to rely on existing sysfs attribute exposed by
> +the PCI subsystem named "sriov_numvfs". We will provide all necessary tweaks to
> +provision VFs in our custom implementation of the "sriov_configure" hook from
> +the struct pci_driver.
> +
> +If for some reason, including explicit request to disable SR-IOV PF mode using
> +modparam, we will not be able to correctly support any VFs, driver will change
> +number of supported VFs, exposed to the userspace by "sriov_totalvfs" attribute,
> +to 0, thus preventing configuration of the VFs.
> +
> +
> +VF Mode
> +=======
> +
> +When driver is running on the VF device, then due to hardware enforcements,
> +access to the privileged registers is not possible. To avoid relying on these
> +registers, we plan to perform early detection if we are running on the VF
> +device using dedicated VF_CAP(0x1901f8) register and then use global macro
> +IS_SRIOV_VF(xe) to control the driver logic.
> +
> +To speed up merging of the required changes, we might first introduce dummy
> +macro that is always set to false, to prepare driver to avoid some code paths
> +before we finalize our VF mode detection and other VFs enabling changes.
> +
> +
> +Resources
> +=========
> +
> +Most of the hardware (or firmware) resources available on the Xe architecture,
> +like GGTT, LMEM, GuC context IDs, GuC doorbells, will be shared between PF and
> +VFs and will require some provisioning steps to assign those resources for use
> +by the VF.
> +
> +Until VFs are provisioned with resources, the PF driver will be able to use all
> +resources, in the same way as it would be running in non-virtualized mode.
> +
> +If some resource (or part or region of it) is assigned to specific VF, then PF
> +is not allowed to use that part or region of the resource, but can continue to
> +use whatever is left available.
> +
> +Those resources are usually fully virtualized, so they will not require any
> +special handling when used by the VF driver, except that VF driver must know
> +the assigned quota.
> +
> +The most notable exception is the GGTT address space, as on some platforms,
> +the VF driver must additionally know the real range that it can access.
> +
> +Once resources have been assigned to a VF and the VF driver has started,
> +changing that provisioning is not allowed, as it would break the VF driver.
> +To make changes, the VF driver that was using these resources must be
> +unloaded (or the VM terminated) and the VF device must be reset using
> +the FLR.
> +
> +
> +Scheduling
> +==========
> +
> +The workloads from PF driver and VF drivers must be submitted to the hardware
> +always by using the GuC submission mechanism. Unless VF has exclusive access
> +to the GT then submissions from different VFs are time-sliced and controlled
> +with additional "execution_quantum" and "preemption_timeout" parameters.
> +
> +In contrast to the resource provisioning, those scheduling parameters can be
> +changed even if VF drivers are already running and are active.
> +
> +
> +Automatic VFs Provisioning
> +==========================
> +
> +To provide out-of-the box experience when user will be enabling VFs using
> +generic "sriov_numvfs" attribute without requiring complex provisioning steps,
> +the SR-IOV PF driver will implement automatic VFs resource provisioning.
> +
> +By default, all VFs will be allocated with the fair amount of the mandatory
> +resources (like GGTT, GuC IDs) and with unrestricted scheduling parameters.
> +Such provisioning should be sufficient for most of the normal usages, when
> +no strict SLA is required.
> +
> +The PF driver will also expose some additional sysfs files to allow adjusting
> +this automatic VFs provisioning, like default values for most of the
> +provisioning parameters that PF will then apply for each enabled VF.
> +
> +    Details about those extensions can be found in
> +    :download:`Preliminary Xe driver ABI <sysfs-driver-xe-sriov>`.
> +
> +
> +Manual VFs Provisioning
> +=======================
> +
> +If automatic VFs provisioning, which applies same configuration to every VF,
> +is not sufficient or there is a need for advanced customization of some VF,
> +the PF driver will also provide extended sysfs interface which will allow
> +control of every provisioning attribute to the lowest feasible level.
> +
> +It is expected that these low-level attributes will be mostly used by the
> +advanced users or by the custom tools that will set up configurations that
> +meet predefined and validated SLA as required by the customers.
> +
> +    Details about those extensions can be found in
> +    :download:`Preliminary Xe driver ABI <sysfs-driver-xe-sriov>`.
> +
> +
> +VFs Monitoring
> +==============
> +
> +In addition to the resource provisioning or changing scheduling parameters,
> +the PF driver might also allow configuring some monitoring parameters, like
> +thresholds of adverse events or sample period, to track undesired behavior
> +of the VFs that could impact the whole system.
> +
> +Once those thresholds are set up and sampling period is defined, the GuC will
> +notify the PF driver about which VF is exceeding the threshold and then PF is
> +able to trigger the uevent to notify the administrator (or VMM) that could
> +take some action against the VF.
> -- 
> 2.25.1
> 
