[Intel-gfx] [RFC 0/8] Introduce framework to forward asynchronous OA counter

2015-06-22 Thread sourab . gupta
From: Sourab Gupta Cc: Robert Bragg, Zhenyu Wang, Jon Bloomfield, Peter Zijlstra, Jabin Wu, Insoo Woo This patch series adds support for capturing OA counter snapshots at asynchronous points by inserting MI_REPORT_PERF_COUNT commands into CS, and forwarding these

[Intel-gfx] [RFC 4/8] drm/i915: Add mechanism for forwarding async OA counter snapshots through perf

2015-06-22 Thread sourab . gupta
From: Sourab Gupta This patch adds the mechanism for forwarding the asynchronous OA snapshots through the perf event interface. Each node of data collected is forwarded as a separate perf sample. A single snapshot will have two fields. First is the raw report and second field is a footer with

[Intel-gfx] [RFC 2/8] drm/i915: Introduce mode for asynchronous capture of OA counters

2015-06-22 Thread sourab . gupta
From: Sourab Gupta The perf event framework supports periodic capture of OA counter snapshots. The raw OA reports generated by HW are forwarded to userspace using perf apis. This patch looks to extend the perf pmu introduced earlier to support the capture of asynchronous OA snapshots (in

[Intel-gfx] [RFC 5/8] drm/i915: Wait for GPU to finish before event stop, in async OA counter mode

2015-06-22 Thread sourab . gupta
From: Sourab Gupta The mode of asynchronous OA counter snapshot collection would need insertion of MI_REPORT_PERF_COUNT commands into the ringbuffer. Therefore, during the stop event call, we need to wait for GPU to complete processing the last request for which MI_RPC command was inserted. We

[Intel-gfx] [RFC 6/8] drm/i915: Routines for inserting OA capture commands in the ringbuffer

2015-06-22 Thread sourab . gupta
From: Sourab Gupta This patch introduces the routines which insert commands for capturing OA snapshots, into the ringbuffer for RCS engine. The command MI_REPORT_PERF_COUNT can be used to capture snapshots of OA counters. The routines introduced in this patch can be called to insert these

[Intel-gfx] [RFC 8/8] drm/i915: Add perfTag support for OA counter reports

2015-06-22 Thread sourab . gupta
From: Sourab Gupta This patch enables collection of perfTag in the OA reports. PerfTag is a mechanism, whereby the reports collected are marked with a perfTag passed by userspace during the execbuffer call. This way the userspace can identify the reports collected with the particular

[Intel-gfx] [RFC 7/8] drm/i915: Add commands in ringbuf for OA snapshot capture across Batchbuffer boundaries

2015-06-22 Thread sourab . gupta
From: Sourab Gupta This patch inserts the commands in the ring for capturing OA snapshots across batchbuffer boundaries. The data generated thus, would be of Batchbuffer granularity. This data can be useful standalone for per batch buffer profiling purposes. The issue of counter wraparound for

[Intel-gfx] [RFC 0/7] Introduce framework for forwarding generic non-OA performance

2015-06-22 Thread sourab . gupta
From: Sourab Gupta Cc: Robert Bragg, Zhenyu Wang, Jon Bloomfield, Peter Zijlstra, Jabin Wu, Insoo Woo This patch series builds upon the initial patch set floated earlier which extends the periodic OA sampling framework and adds handling asynchronous OA counter data

[Intel-gfx] [RFC 4/7] drm/i915: Add mechanism for forwarding the data samples to userspace through Gen PMU perf interface

2015-06-22 Thread sourab . gupta
From: Sourab Gupta This patch adds the mechanism for forwarding the data snapshots through the Gen PMU perf event interface. In this particular case, the data type of timestamp data node introduced earlier is being forwarded through the interface. The samples will be forwarded in a workqueue

[Intel-gfx] [RFC 5/7] drm/i915: Wait for GPU to finish before event stop in Gen Perf PMU

2015-06-22 Thread sourab . gupta
From: Sourab Gupta To collect timestamps around any GPU workload, we need to insert commands to capture them into the ringbuffer. Therefore, during the stop event call, we need to wait for GPU to complete processing the last request for which these commands were inserted. We need to ensure this

[Intel-gfx] [RFC 1/7] drm/i915: Add a new PMU for handling non-OA counter data profiling requests

2015-06-22 Thread sourab . gupta
From: Sourab Gupta The current perf PMU driver is specific for collection of OA counter statistics (which may be done in a periodic or asynchronous way). Since this enables us (and limits us) to render ring, we have no means for collection of data pertaining to other rings. To overcome this

[Intel-gfx] [RFC 6/7] drm/i915: Add routines for inserting commands in the ringbuf for capturing timestamps

2015-06-22 Thread sourab . gupta
From: Sourab Gupta This patch adds the routines through which one can insert commands in the ringbuf for capturing timestamps. The routines to insert these commands can be called at appropriate places during workload execution. The snapshots thus captured for each batchbuffer are then forwarded

[Intel-gfx] [RFC 7/7] drm/i915: Add support for retrieving MMIO register values in Gen Perf PMU

2015-06-22 Thread sourab . gupta
From: Sourab Gupta This patch adds support for retrieving MMIO register values through Gen Perf PMU interface. Through this interface, the userspace can now request up to 8 MMIO register values to be dumped, along with the timestamp values which were dumped earlier across the batchbuffer

[Intel-gfx] [RFC 2/7] drm/i915: Register routines for Gen perf PMU driver

2015-06-22 Thread sourab . gupta
From: Sourab Gupta This patch registers the new PMU driver, whose purpose is to enable data collection of non-OA counter data for all the rings, in a generic way. The patch introduces routines for this PMU driver, which also include the allocation routines for the buffer for collecting the data

[Intel-gfx] [RFC 3/7] drm/i915: Introduce timestamp node for timestamp data collection

2015-06-22 Thread sourab . gupta
From: Sourab Gupta This patch introduces data structures for holding the timestamp data, which can then be forwarded to userspace using Gen Perf PMU. Each timestamp node will have the timestamp value, along with additional metadata information such as ctx_id, pid, ring. Signed-off-by: Sourab

[Intel-gfx] [RFC 0/8] Introduce framework to forward multi context OA snapshots

2015-07-15 Thread sourab . gupta
From: Sourab Gupta This is an updated patch series (changes list at end), which adds support for capturing OA counter snapshots for multiple contexts, by inserting MI_REPORT_PERF_COUNT commands into CS, and forwarding these snapshots to userspace using perf interface. This work is based on

[Intel-gfx] [RFC 1/8] drm/i915: Have globally unique context ids, as opposed to drm file specific

2015-07-15 Thread sourab . gupta
From: Sourab Gupta Currently the context ids are specific to a drm file instance, as opposed to being globally unique. There are some use cases which may require globally unique context ids. For example, a system level GPU profiler tool may lean upon the context ids to associate the performance

[Intel-gfx] [RFC 4/8] drm/i915: Forward periodic and CS based OA reports sorted acc to timestamps

2015-07-15 Thread sourab . gupta
From: Sourab Gupta The periodic reports and the RCS based reports are collected in two separate buffers. While forwarding to userspace, these have to be sent to a single perf event ringbuffer. From a userspace perspective, it is good to have the reports in the single buffer in order to their

[Intel-gfx] [RFC 5/8] drm/i915: Handle event stop and destroy for commands in flight

2015-07-15 Thread sourab . gupta
From: Sourab Gupta In the periodic OA sampling mode, the event stop would stop forwarding samples to userspace, and disable OA synchronously. The buffer is destroyed eventually in event destroy callback. But when we have in flight RPC commands scheduled on GPU (like in this case), the handling

[Intel-gfx] [RFC 7/8] drm/i915: Add support for having pid output with OA report

2015-07-15 Thread sourab . gupta
From: Sourab Gupta This patch introduces flags and adds support for having pid output with the OA reports generated through the RCS commands. When the userspace expresses its interest in listening to the pid through an oa_attr field during event init, the OA reports generated would have an

[Intel-gfx] [RFC 8/8] drm/i915: Add support to add execbuffer tags to OA counter reports

2015-07-15 Thread sourab . gupta
From: Sourab Gupta This patch enables userspace to specify tags (per workload), provided via execbuffer ioctl, which could be added to OA reports, to help associate reports with the corresponding workloads. There may be multiple stages within a single context, from a userspace perspective. An

[Intel-gfx] [RFC 2/8] drm/i915: Introduce mode for capture of multi ctx OA reports synchronized with RCS

2015-07-15 Thread sourab . gupta
From: Sourab Gupta This patch introduces a mode of capturing OA counter reports belonging to multiple contexts, which can be mapped back to individual contexts. The OA reports captured in this way are synchronized with Render command stream. There may be use cases wherein we need more than

[Intel-gfx] [RFC 3/8] drm/i915: Add mechanism for forwarding CS based OA counter snapshots through perf

2015-07-15 Thread sourab . gupta
From: Sourab Gupta This patch adds the mechanism for forwarding the CS based OA snapshots through the perf event interface. The OA snapshots will be captured in a gem buffer object. The metadata information (ctx_id right now) pertaining to snapshot is maintained in a list, which has offsets

[Intel-gfx] [RFC 6/8] drm/i915: Insert commands for capture of OA counters in the ring

2015-07-15 Thread sourab . gupta
From: Sourab Gupta This patch adds the routines which insert commands for capturing OA snapshots into the ringbuffer of RCS engine. The command MI_REPORT_PERF_COUNT can be used to capture snapshots of OA counters, which is inserted at BB boundaries. While inserting the commands, we keep a

[Intel-gfx] [RFC 0/8] Introduce framework for forwarding generic non-OA performance

2015-07-15 Thread sourab . gupta
From: Sourab Gupta This is an updated patch set (changes list at end), which builds upon the multi context OA patch set introduced earlier at: http://lists.freedesktop.org/archives/intel-gfx/2015-July/071697.html The OA unit, as such, is specific to render ring and can't cater to perfor

[Intel-gfx] [RFC 2/8] drm/i915: Add mechanism for forwarding the timestamp data through perf

2015-07-15 Thread sourab . gupta
From: Sourab Gupta This patch adds the mechanism for forwarding the timestamp data to userspace using the Gen PMU perf event interface. The timestamps will be captured in a gem buffer object. The metadata information (ctx_id right now) pertaining to snapshot is maintained in a list, whose each

[Intel-gfx] [RFC 1/8] drm/i915: Add a new PMU for handling non-OA counter data profiling requests

2015-07-15 Thread sourab . gupta
From: Sourab Gupta The current perf PMU driver is specific for collection of OA counter statistics (which may be done in a periodic or asynchronous way). Since this enables us (and limits us) to render ring, we have no means for collection of data pertaining to other rings. To overcome this

[Intel-gfx] [RFC 3/8] drm/i915: Handle event stop and destroy for GPU commands submitted

2015-07-15 Thread sourab . gupta
From: Sourab Gupta This patch handles the event stop and destroy callbacks taking into account the fact that there may be commands scheduled on GPU which may utilize the destination buffer. The event stop would just set the event state, and stop forwarding data to userspace. From userspace

[Intel-gfx] [RFC 4/8] drm/i915: Insert commands for capturing timestamps in the ring

2015-07-15 Thread sourab . gupta
From: Sourab Gupta This patch adds the routines through which one can insert commands in the ringbuf for capturing timestamps, which are used to insert these commands around the batchbuffer. While inserting the commands, we keep a reference of associated request. This will be released when we

[Intel-gfx] [RFC 8/8] drm/i915: Support for retrieving MMIO register values alongwith timestamps through perf

2015-07-15 Thread sourab . gupta
From: Sourab Gupta This patch adds support for retrieving MMIO register values along with timestamps and forwarding them to userspace through perf. The userspace can request up to 8 MMIO register values to be dumped. The addresses of up to 8 MMIO registers can be passed through perf attr config

[Intel-gfx] [RFC 5/8] drm/i915: Add support for forwarding ring id in sample metadata through perf

2015-07-15 Thread sourab . gupta
From: Sourab Gupta This patch introduces flags and adds support for having ring id output with the timestamp samples and forwarding them through perf. When the userspace expresses its interest in listening to the ring id through a gen pmu attr field during event init, the samples generated

[Intel-gfx] [RFC 6/8] drm/i915: Add support for forwarding pid in timestamp sample metadata through perf

2015-07-15 Thread sourab . gupta
From: Sourab Gupta This patch introduces flags and adds support for having pid output with the timestamp samples and forwarding them through perf. When the userspace expresses its interest in listening to the pid through a gen pmu attr field during event init, the samples generated would have

[Intel-gfx] [RFC 7/8] drm/i915: Add support for forwarding execbuffer tags in timestamp sample metadata

2015-07-15 Thread sourab . gupta
From: Sourab Gupta This patch enables userspace to specify tags (per workload), provided via execbuffer ioctl, which could be added to timestamp samples, to help associate samples with the corresponding workloads. There may be multiple stages within a single context, from a userspace

[Intel-gfx] [RFC 4/8] drm/i915: Forward periodic and CS based OA reports sorted acc to timestamps

2015-08-04 Thread sourab . gupta
From: Sourab Gupta The periodic reports and the RCS based reports are collected in two separate buffers. While forwarding to userspace, these have to be sent to a single perf event ringbuffer. From a userspace perspective, it is good to have the reports in the single buffer in order to their

[Intel-gfx] [RFC 0/8] Introduce framework to forward multi context OA snapshots

2015-08-04 Thread sourab . gupta
From: Sourab Gupta This is the updated patch series (v3 - changes listed at end), which adds support for capturing OA counter snapshots for multiple contexts, by inserting MI_REPORT_PERF_COUNT commands into CS, and forwarding these snapshots to userspace using perf interface. This work is based

[Intel-gfx] [RFC 1/8] drm/i915: Introduce global id for contexts

2015-08-04 Thread sourab . gupta
From: Sourab Gupta The current context user handles are specific to drm file instance. There are some use cases which may require a global id for the contexts. For example, a system level GPU profiler tool may lean upon the global context ids to associate the performance snapshots with individual

[Intel-gfx] [RFC 2/8] drm/i915: Introduce mode for capture of multi ctx OA reports synchronized with RCS

2015-08-04 Thread sourab . gupta
From: Sourab Gupta This patch introduces a mode of capturing OA counter reports belonging to multiple contexts, which can be mapped back to individual contexts. The OA reports captured in this way are synchronized with Render command stream. There may be use cases wherein we need more than

[Intel-gfx] [RFC 3/8] drm/i915: Add mechanism for forwarding CS based OA counter snapshots through perf

2015-08-04 Thread sourab . gupta
From: Sourab Gupta This patch adds the mechanism for forwarding the CS based OA snapshots through the perf event interface. The OA snapshots will be captured in a gem buffer object. The metadata information (ctx global id, as of now) pertaining to snapshot is maintained in a list, which has

[Intel-gfx] [RFC 6/8] drm/i915: Insert commands for capture of OA counters in the ring

2015-08-04 Thread sourab . gupta
From: Sourab Gupta This patch adds the routines which insert commands for capturing OA snapshots into the ringbuffer of RCS engine. The command MI_REPORT_PERF_COUNT can be used to capture snapshots of OA counters, which is inserted at BB boundaries. While inserting the commands, we keep a

[Intel-gfx] [RFC 7/8] drm/i915: Add support for having pid output with OA report

2015-08-04 Thread sourab . gupta
From: Sourab Gupta This patch introduces flags and adds support for having pid output with the OA reports generated through the RCS commands. When the userspace expresses its interest in listening to the pid through an oa_attr field during event init, the OA reports generated would have an

[Intel-gfx] [RFC 5/8] drm/i915: Handle event stop and destroy for commands in flight

2015-08-04 Thread sourab . gupta
From: Sourab Gupta In the periodic OA sampling mode, the event stop would stop forwarding samples to userspace, and disable OA synchronously. The buffer is destroyed eventually in event destroy callback. But when we have in flight RPC commands scheduled on GPU (like in this case), the handling

[Intel-gfx] [RFC 8/8] drm/i915: Add support to add execbuffer tags to OA counter reports

2015-08-04 Thread sourab . gupta
From: Sourab Gupta This patch enables userspace to specify tags (per workload), provided via execbuffer ioctl, which could be added to OA reports, to help associate reports with the corresponding workloads. There may be multiple stages within a single context, from a userspace perspective. An

[Intel-gfx] [RFC 6/8] drm/i915: Add support for forwarding pid in timestamp sample metadata through perf

2015-08-04 Thread sourab . gupta
From: Sourab Gupta This patch introduces flags and adds support for having pid output with the timestamp samples and forwarding them through perf. When the userspace expresses its interest in listening to the pid through a gen pmu attr field during event init, the samples generated would have

[Intel-gfx] [RFC 1/8] drm/i915: Add a new PMU for handling non-OA counter data profiling requests

2015-08-04 Thread sourab . gupta
From: Sourab Gupta The current perf PMU driver is specific for collection of OA counter statistics (which may be done in a periodic or asynchronous way). Since this enables us (and limits us) to render ring, we have no means for collection of data pertaining to other rings. To overcome this

[Intel-gfx] [RFC 0/8] Introduce framework for forwarding generic non-OA performance

2015-08-04 Thread sourab . gupta
From: Sourab Gupta This is an updated patch set (v3 - changes list at end), which builds upon the multi context OA patch set introduced earlier at: http://lists.freedesktop.org/archives/intel-gfx/2015-August/072949.html The OA unit, as such, is specific to render ring and can't cat

[Intel-gfx] [RFC 3/8] drm/i915: Handle event stop and destroy for GPU commands submitted

2015-08-04 Thread sourab . gupta
From: Sourab Gupta This patch handles the event stop and destroy callbacks taking into account the fact that there may be commands scheduled on GPU which may utilize the destination buffer. The event stop would just set the event state, and stop forwarding data to userspace. From userspace

[Intel-gfx] [RFC 4/8] drm/i915: Insert commands for capturing timestamps in the ring

2015-08-04 Thread sourab . gupta
From: Sourab Gupta This patch adds the routines through which one can insert commands in the ringbuf for capturing timestamps, which are used to insert these commands around the batchbuffer. While inserting the commands, we keep a reference of associated request. This will be released when we

[Intel-gfx] [RFC 2/8] drm/i915: Add mechanism for forwarding the timestamp data through perf

2015-08-04 Thread sourab . gupta
From: Sourab Gupta This patch adds the mechanism for forwarding the timestamp data to userspace using the Gen PMU perf event interface. The timestamps will be captured in a gem buffer object. The metadata information (ctx global id right now) pertaining to snapshot is maintained in a list

[Intel-gfx] [RFC 5/8] drm/i915: Add support for forwarding ring id in sample metadata through perf

2015-08-04 Thread sourab . gupta
From: Sourab Gupta This patch introduces flags and adds support for having ring id output with the timestamp samples and forwarding them through perf. When the userspace expresses its interest in listening to the ring id through a gen pmu attr field during event init, the samples generated

[Intel-gfx] [RFC 8/8] drm/i915: Support for retrieving MMIO register values alongwith timestamps through perf

2015-08-04 Thread sourab . gupta
From: Sourab Gupta This patch adds support for retrieving MMIO register values along with timestamps and forwarding them to userspace through perf. The userspace can request up to 8 MMIO register values to be dumped. The addresses of up to 8 MMIO registers can be passed through perf attr config

[Intel-gfx] [RFC 7/8] drm/i915: Add support for forwarding execbuffer tags in timestamp sample metadata

2015-08-04 Thread sourab . gupta
From: Sourab Gupta This patch enables userspace to specify tags (per workload), provided via execbuffer ioctl, which could be added to timestamp samples, to help associate samples with the corresponding workloads. There may be multiple stages within a single context, from a userspace

[Intel-gfx] [PATCH v2] drm/i915: Sysfs interface to get GFX shmem usage stats per process

2014-09-03 Thread sourab . gupta
From: Sourab Gupta Currently the Graphics Driver provides an interface through which one can get a snapshot of the overall Graphics memory consumption. Also there is an interface available, which provides information about the several memory related attributes of every single Graphics buffer
