From: Sourab Gupta
Cc: Robert Bragg,
Zhenyu Wang,
Jon Bloomfield,
Peter Zijlstra,
Jabin Wu,
Insoo Woo
This patch series adds support for capturing OA counter snapshots at
asynchronous points by inserting MI_REPORT_PERF_COUNT commands into CS,
and forwarding these
From: Sourab Gupta
This patch adds the mechanism for forwarding the asynchronous OA snapshots
through the perf event interface.
Each node of data collected is forwarded as a separate perf sample.
A single snapshot will have two fields. First is the raw report and second
field is a footer with
From: Sourab Gupta
The perf event framework supports periodic capture of OA counter snapshots. The
raw OA reports generated by HW are forwarded to userspace using perf apis. This
patch looks to extend the perf pmu introduced earlier to support the capture of
asynchronous OA snapshots (in
From: Sourab Gupta
The mode of asynchronous OA counter snapshot collection would need insertion
of MI_REPORT_PERF_COUNT commands into the ringbuffer. Therefore, during the
stop event call, we need to wait for GPU to complete processing the last
request for which MI_RPC command was inserted. We
From: Sourab Gupta
This patch introduces the routines which insert commands for capturing OA
snapshots, into the ringbuffer for RCS engine.
The command MI_REPORT_PERF_COUNT can be used to capture snapshots of OA
counters. The routines introduced in this patch can be called to insert these
From: Sourab Gupta
This patch enables collection of perfTag in the OA reports.
PerfTag is a mechanism, whereby the reports collected are marked with a
perfTag passed by userspace during the execbuffer call. This way the userspace
can identify the reports collected with the particular
From: Sourab Gupta
This patch inserts the commands in the ring for capturing OA snapshots across
batchbuffer boundaries. The data thus generated has batchbuffer
granularity. This data can be useful standalone for per-batchbuffer profiling
purposes. The issue of counter wraparound for
From: Sourab Gupta
Cc: Robert Bragg,
Zhenyu Wang,
Jon Bloomfield,
Peter Zijlstra,
Jabin Wu,
Insoo Woo
This patch series builds upon the initial patch set floated earlier which
extends the periodic OA sampling framework and adds handling asynchronous OA
counter data
From: Sourab Gupta
This patch adds the mechanism for forwarding the data snapshots
through the Gen PMU perf event interface.
In this particular case, the data type of timestamp data node introduced
earlier is being forwarded through the interface.
The samples will be forwarded in a workqueue
From: Sourab Gupta
To collect timestamps around any GPU workload, we need to insert
commands to capture them into the ringbuffer. Therefore, during the stop event
call, we need to wait for GPU to complete processing the last request for
which these commands were inserted.
We need to ensure this
From: Sourab Gupta
The current perf PMU driver is specific for collection of OA counter statistics
(which may be done in a periodic or asynchronous way). Since this enables us
(and limits us) to render ring, we have no means for collection of data
pertaining to other rings.
To overcome this
From: Sourab Gupta
This patch adds the routines through which one can insert commands in the
ringbuf for capturing timestamps. The routines to insert these commands can be
called at appropriate places during workload execution.
The snapshots thus captured for each batchbuffer are then forwarded
From: Sourab Gupta
This patch adds support for retrieving MMIO register values through the Gen Perf PMU
interface. Through this interface, userspace can now request up to 8 MMIO
register values to be dumped, along with the timestamp values which were dumped
earlier across the batchbuffer
From: Sourab Gupta
This patch registers the new PMU driver, whose purpose is to enable data
collection of non-OA counter data for all the rings, in a generic way.
The patch introduces routines for this PMU driver, which also include the
allocation routines for the buffer for collecting the data
From: Sourab Gupta
This patch introduces data structures for holding the timestamp data, which
can then be forwarded to userspace using Gen Perf PMU.
Each timestamp node will have the timestamp value, along with additional metadata
such as ctx_id, pid, ring.
Signed-off-by: Sourab
From: Sourab Gupta
This is an updated patch series (changes list at end), which adds support for
capturing OA counter snapshots for multiple contexts, by inserting
MI_REPORT_PERF_COUNT commands into CS, and forwarding these snapshots to
userspace using perf interface.
This work is based on
From: Sourab Gupta
Currently the context ids are specific to a drm file instance, as opposed
to being globally unique. Some use cases may require
globally unique context ids. For example, a system-level GPU profiler tool may
lean upon the context ids to associate the performance
From: Sourab Gupta
The periodic reports and the RCS based reports are collected in two
separate buffers. While forwarding to userspace, these have to be sent to
single perf event ringbuffer. From a userspace perspective, it is good to
have the reports in the single buffer in order to their
From: Sourab Gupta
In the periodic OA sampling mode, the event stop would stop forwarding
samples to userspace, and disables OA synchronously. The buffer is
destroyed eventually in event destroy callback. But when we have in flight
RPC commands scheduled on GPU (like in this case), the handling
From: Sourab Gupta
This patch introduces flags and adds support for having pid output with the
OA reports generated through the RCS commands.
When the userspace expresses its interest in listening to the pid through
an oa_attr field during event init, the OA reports generated would have an
From: Sourab Gupta
This patch enables userspace to specify tags (per workload), provided via
execbuffer ioctl, which could be added to OA reports, to help associate
reports with the corresponding workloads.
There may be multiple stages within a single context, from a userspace
perspective. An
From: Sourab Gupta
This patch introduces a mode of capturing OA counter reports belonging to
multiple contexts, which can be mapped back to individual contexts. The OA
reports captured in this way are synchronized with Render command stream.
There may be use cases wherein we need more than
From: Sourab Gupta
This patch adds the mechanism for forwarding the CS based OA snapshots
through the perf event interface.
The OA snapshots will be captured in a gem buffer object. The metadata
information (ctx_id right now) pertaining to snapshot is maintained in a
list, which has offsets
From: Sourab Gupta
This patch adds the routines which insert commands for capturing OA
snapshots into the ringbuffer of RCS engine.
The command MI_REPORT_PERF_COUNT can be used to capture snapshots of OA
counters, which is inserted at BB boundaries.
While inserting the commands, we keep a
From: Sourab Gupta
This is an updated patch set (changes list at end), which builds upon the
multi context OA patch set introduced earlier at:
http://lists.freedesktop.org/archives/intel-gfx/2015-July/071697.html
The OA unit, as such, is specific to render ring and can't cater to perfor
From: Sourab Gupta
This patch adds the mechanism for forwarding the timestamp data to
userspace using the Gen PMU perf event interface.
The timestamps will be captured in a gem buffer object. The metadata
information (ctx_id right now) pertaining to snapshot is maintained in a
list, whose each
From: Sourab Gupta
The current perf PMU driver is specific for collection of OA counter
statistics (which may be done in a periodic or asynchronous way). Since
this enables us (and limits us) to render ring, we have no means for
collection of data pertaining to other rings.
To overcome this
From: Sourab Gupta
This patch handles the event stop and destroy callbacks taking into account
the fact that there may be commands scheduled on GPU which may utilize the
destination buffer.
The event stop would just set the event state, and stop forwarding data to
userspace. From userspace
From: Sourab Gupta
This patch adds the routines through which one can insert commands in the
ringbuf for capturing timestamps, which are used to insert these commands
around the batchbuffer.
While inserting the commands, we keep a reference of associated request.
This will be released when we
From: Sourab Gupta
This patch adds support for retrieving MMIO register values along with
timestamps and forwarding them to userspace through perf.
Userspace can request up to 8 MMIO register values to be dumped.
The addresses of up to 8 MMIO registers can be passed through perf attr
config
From: Sourab Gupta
This patch introduces flags and adds support for having ring id output with
the timestamp samples and forwarding them through perf.
When the userspace expresses its interest in listening to the ring id
through a gen pmu attr field during event init, the samples generated
From: Sourab Gupta
This patch introduces flags and adds support for having pid output with the
timestamp samples and forwarding them through perf.
When the userspace expresses its interest in listening to the pid through a
gen pmu attr field during event init, the samples generated would have
From: Sourab Gupta
This patch enables userspace to specify tags (per workload), provided via
execbuffer ioctl, which could be added to timestamps samples, to help
associate samples with the corresponding workloads.
There may be multiple stages within a single context, from a userspace
From: Sourab Gupta
The periodic reports and the RCS based reports are collected in two
separate buffers. While forwarding to userspace, these have to be sent to
single perf event ringbuffer. From a userspace perspective, it is good to
have the reports in the single buffer in order to their
From: Sourab Gupta
This is the updated patch series (v3 - changes listed at end), which adds
support for capturing OA counter snapshots for multiple contexts, by inserting
MI_REPORT_PERF_COUNT commands into CS, and forwarding these snapshots to
userspace using perf interface.
This work is based
From: Sourab Gupta
The current context user handles are specific to a drm file instance.
Some use cases may require a global id for the contexts.
For example, a system-level GPU profiler tool may lean upon the global context
ids to associate the performance snapshots with individual
From: Sourab Gupta
This patch introduces a mode of capturing OA counter reports belonging to
multiple contexts, which can be mapped back to individual contexts. The OA
reports captured in this way are synchronized with Render command stream.
There may be use cases wherein we need more than
From: Sourab Gupta
This patch adds the mechanism for forwarding the CS based OA snapshots
through the perf event interface.
The OA snapshots will be captured in a gem buffer object. The metadata
information (ctx global id, as of now) pertaining to snapshot is maintained
in a list, which has
From: Sourab Gupta
This patch adds the routines which insert commands for capturing OA
snapshots into the ringbuffer of RCS engine.
The command MI_REPORT_PERF_COUNT can be used to capture snapshots of OA
counters, which is inserted at BB boundaries.
While inserting the commands, we keep a
From: Sourab Gupta
This patch introduces flags and adds support for having pid output with the
OA reports generated through the RCS commands.
When the userspace expresses its interest in listening to the pid through
an oa_attr field during event init, the OA reports generated would have an
From: Sourab Gupta
In the periodic OA sampling mode, the event stop would stop forwarding
samples to userspace, and disables OA synchronously. The buffer is
destroyed eventually in event destroy callback. But when we have in flight
RPC commands scheduled on GPU (like in this case), the handling
From: Sourab Gupta
This patch enables userspace to specify tags (per workload), provided via
execbuffer ioctl, which could be added to OA reports, to help associate
reports with the corresponding workloads.
There may be multiple stages within a single context, from a userspace
perspective. An
From: Sourab Gupta
This patch introduces flags and adds support for having pid output with the
timestamp samples and forwarding them through perf.
When the userspace expresses its interest in listening to the pid through a
gen pmu attr field during event init, the samples generated would have
From: Sourab Gupta
The current perf PMU driver is specific for collection of OA counter
statistics (which may be done in a periodic or asynchronous way). Since
this enables us (and limits us) to render ring, we have no means for
collection of data pertaining to other rings.
To overcome this
From: Sourab Gupta
This is an updated patch set (v3 - changes list at end), which builds upon the
multi context OA patch set introduced earlier at:
http://lists.freedesktop.org/archives/intel-gfx/2015-August/072949.html
The OA unit, as such, is specific to render ring and can't cat
From: Sourab Gupta
This patch handles the event stop and destroy callbacks taking into account
the fact that there may be commands scheduled on GPU which may utilize the
destination buffer.
The event stop would just set the event state, and stop forwarding data to
userspace. From userspace
From: Sourab Gupta
This patch adds the routines through which one can insert commands in the
ringbuf for capturing timestamps, which are used to insert these commands
around the batchbuffer.
While inserting the commands, we keep a reference of associated request.
This will be released when we
From: Sourab Gupta
This patch adds the mechanism for forwarding the timestamp data to
userspace using the Gen PMU perf event interface.
The timestamps will be captured in a gem buffer object. The metadata
information (ctx global id right now) pertaining to snapshot is maintained
in a list
From: Sourab Gupta
This patch introduces flags and adds support for having ring id output with
the timestamp samples and forwarding them through perf.
When the userspace expresses its interest in listening to the ring id
through a gen pmu attr field during event init, the samples generated
From: Sourab Gupta
This patch adds support for retrieving MMIO register values along with
timestamps and forwarding them to userspace through perf.
Userspace can request up to 8 MMIO register values to be dumped.
The addresses of up to 8 MMIO registers can be passed through perf attr
config
From: Sourab Gupta
This patch enables userspace to specify tags (per workload), provided via
execbuffer ioctl, which could be added to timestamps samples, to help
associate samples with the corresponding workloads.
There may be multiple stages within a single context, from a userspace
From: Sourab Gupta
Currently the Graphics Driver provides an interface through which
one can get a snapshot of the overall Graphics memory consumption.
Also there is an interface available, which provides information
about the several memory related attributes of every single Graphics
buffer