Re: [Intel-gfx] [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events
On Tue, Dec 5, 2017 at 2:16 PM, Lionel Landwerlin <lionel.g.landwer...@intel.com> wrote:

> Hey Sagar,
>
> Sorry for the delay looking into this series.
> I've done some userspace/UI work in GPUTop to try to correlate perf
> samples/tracepoints with i915 perf reports.
>
> I wanted to avoid having to add too much logic into the kernel and tried
> to sample both cpu clocks & gpu timestamps from userspace.
> So far that's not working. People more knowledgeable than I would have
> realized that the kernel can sneak in work into syscalls.
> So result is that 2 syscalls (one to get the cpu clock, one for the gpu
> timestamp) back to back from the same thread leads to time differences of
> anywhere from a few microseconds to in some cases close to 1 millisecond. So
> it's basically unworkable.
> Anyway the UI work won't go to waste :)
>
> I'm thinking to go with your approach.
> From my experiment with gputop, it seems we might want to use a different
> cpu clock source though or make it configurable.
> The perf infrastructure allows you to choose what clock you want to use.
> Since we want to avoid time adjustments on that clock (because we're adding
> deltas), a clock monotonic raw would make most sense.

I would guess the most generally useful clock domain to correlate with the
largest number of interesting events would surely be CLOCK_MONOTONIC, not
_MONOTONIC_RAW.

E.g. here's some discussion around why vblank events use CLOCK_MONOTONIC:
https://lists.freedesktop.org/archives/dri-devel/2012-October/028878.html

Br,
- Robert

> I'll look at adding some tests for this too.
>
> Thanks,
>
> -
> Lionel
>
> On 15/11/17 12:13, Sagar Arun Kamble wrote:
>
>> We can compute system time corresponding to GPU timestamp by taking a
>> reference point (CPU monotonic time, GPU timestamp) and then adding
>> delta time computed using timecounter/cyclecounter support in kernel.
>> We have to configure cyclecounter with the GPU timestamp frequency.
>> Earlier approach that was based on cross-timestamp is not needed. It
>> was being used to approximate the frequency based on invalid assumptions
>> (possibly drift was being seen in the time due to precision issue).
>> The precision of time from GPU clocks is already in ns and timecounter
>> takes care of it as verified over variable durations.
>>
>> This series adds base timecounter/cyclecounter changes and changes to
>> get GPU and CPU timestamps in OA samples.
>>
>> Sagar Arun Kamble (1):
>>   drm/i915/perf: Add support to correlate GPU timestamp with system time
>>
>> Sourab Gupta (3):
>>   drm/i915/perf: Add support for collecting 64 bit timestamps with OA reports
>>   drm/i915/perf: Extract raw GPU timestamps from OA reports
>>   drm/i915/perf: Send system clock monotonic time in perf samples
>>
>>  drivers/gpu/drm/i915/i915_drv.h  |  11
>>  drivers/gpu/drm/i915/i915_perf.c | 124 ++-
>>  drivers/gpu/drm/i915/i915_reg.h  |   6 ++
>>  include/uapi/drm/i915_drm.h      |  14 +
>>  4 files changed, 154 insertions(+), 1 deletion(-)

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
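The reference-point scheme described in the quoted cover letter can be sketched in plain C as follows. This only illustrates the arithmetic and is not the kernel's timecounter/cyclecounter code: `struct ts_ref`, `gpu_ticks_to_cpu_ns()` and the caller-supplied frequency are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference point: one CPU time and one GPU timestamp that
 * were sampled (as close together as possible) at the same moment. */
struct ts_ref {
	uint64_t cpu_ns;    /* CPU clock value at the reference point, in ns */
	uint64_t gpu_ticks; /* raw GPU timestamp at the reference point */
};

/* Map a later raw GPU timestamp onto the CPU clock by adding the elapsed
 * GPU time to the reference CPU time. gpu_hz is the assumed GPU timestamp
 * frequency; any error in it shows up as drift in the result. */
uint64_t gpu_ticks_to_cpu_ns(const struct ts_ref *ref, uint64_t gpu_ticks,
			     uint64_t gpu_hz)
{
	uint64_t delta, secs, rem;

	assert(gpu_hz != 0);

	delta = gpu_ticks - ref->gpu_ticks;

	/* Split into whole seconds and a remainder so the multiplication
	 * by 1e9 cannot overflow for realistic GPU clock frequencies. */
	secs = delta / gpu_hz;
	rem = delta % gpu_hz;

	return ref->cpu_ns + secs * 1000000000ull + rem * 1000000000ull / gpu_hz;
}
```

With an assumed 12.5 MHz timestamp clock (80ns per tick), a timestamp 125 ticks after the reference maps to 10000ns after the reference CPU time. Which CPU clock the reference is taken from (CLOCK_MONOTONIC or CLOCK_MONOTONIC_RAW) is exactly the open question in this thread; the sketch is neutral on that.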
Re: [Intel-gfx] [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events
On Wed, Nov 15, 2017 at 12:13 PM, Sagar Arun Kamble <sagar.a.kam...@intel.com> wrote:

> We can compute system time corresponding to GPU timestamp by taking a
> reference point (CPU monotonic time, GPU timestamp) and then adding
> delta time computed using timecounter/cyclecounter support in kernel.
> We have to configure cyclecounter with the GPU timestamp frequency.
> Earlier approach that was based on cross-timestamp is not needed. It
> was being used to approximate the frequency based on invalid assumptions
> (possibly drift was being seen in the time due to precision issue).
> The precision of time from GPU clocks is already in ns and timecounter
> takes care of it as verified over variable durations.

Hi Sagar,

I have some doubts about this analysis...

The intent behind Sourab's original approach was to be able to determine
the frequency at runtime empirically because the constants we have aren't
particularly accurate. Without a perfectly stable frequency that's known
very precisely, an interpolated correlation will inevitably drift. I think
the nature of HW implies we can't expect to have either of those. Then the
general idea had been to try and use existing kernel infrastructure for a
problem which isn't unique to GPU clocks.

That's not to say that a more limited, simpler solution based on frequent
re-correlation wouldn't be more than welcome if tracking an accurate
frequency is too awkward for now, but I think some things need to be
considered in that case:

- It would be good to quantify the kind of drift seen in practice to know
  how frequently it's necessary to re-synchronize. It sounds like you've
  done this ("as verified over variable durations") so I'm curious what
  kind of drift you saw. I'd imagine you would see a significant drift
  over, say, one second and it might not take much longer for the drift to
  even become clearly visible to the user when plotted in a UI.

  For reference I once updated the arb_timer_query test in piglit to give
  some insight into this drift
  (https://lists.freedesktop.org/archives/piglit/2016-September/020673.html)
  and at least from what I wrote back then it looks like I was seeing a
  drift of a few milliseconds per second on SKL. I vaguely recall it being
  much worse given the frequency constants we had for Haswell.

- What guarantees will be promised about monotonicity of correlated system
  timestamps? Will it be guaranteed that sequential reports must have
  monotonically increasing timestamps? That might be fiddly if the gpu +
  system clock are periodically re-correlated, so it might be good to be
  clear in documentation that the correlation is best-effort only for the
  sake of implementation simplicity. That would still be good for a lot of
  UIs I think and there's freedom for the driver to start simple and
  potentially improve later by measuring the gpu clock frequency
  empirically.

Currently only one correlated pair of timestamps is read when enabling the
stream and so a relatively long time is likely to pass before the stream is
disabled (seconds, minutes while a user is running a system profiler). It
seems very likely to me that these clocks are going to drift significantly
without introducing some form of periodic re-synchronization based on some
understanding of the drift that's seen.

Br,
- Robert

> This series adds base timecounter/cyclecounter changes and changes to
> get GPU and CPU timestamps in OA samples.
>
> Sagar Arun Kamble (1):
>   drm/i915/perf: Add support to correlate GPU timestamp with system time
>
> Sourab Gupta (3):
>   drm/i915/perf: Add support for collecting 64 bit timestamps with OA reports
>   drm/i915/perf: Extract raw GPU timestamps from OA reports
>   drm/i915/perf: Send system clock monotonic time in perf samples
>
>  drivers/gpu/drm/i915/i915_drv.h  |  11
>  drivers/gpu/drm/i915/i915_perf.c | 124 ++-
>  drivers/gpu/drm/i915/i915_reg.h  |   6 ++
>  include/uapi/drm/i915_drm.h      |  14 +
>  4 files changed, 154 insertions(+), 1 deletion(-)
>
> --
> 1.9.1
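To make the drift concern above concrete, here is a small sketch of how two correlated (CPU, GPU) timestamp pairs could be used both to estimate the GPU timestamp frequency empirically and to measure how far a nominal frequency constant drifts. The names `struct ts_pair`, `measured_gpu_hz()` and `drift_ns()` are hypothetical, and the numbers in the usage note are made up for illustration rather than measured on hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical correlated sample: CPU time and raw GPU timestamp read at
 * (approximately) the same instant. */
struct ts_pair {
	uint64_t cpu_ns;
	uint64_t gpu_ticks;
};

/* Effective GPU timestamp frequency in Hz observed between two pairs.
 * The intermediate product limits this to intervals of a few minutes at
 * MHz-range clocks, which is enough for a sketch. */
uint64_t measured_gpu_hz(const struct ts_pair *a, const struct ts_pair *b)
{
	uint64_t cpu_delta = b->cpu_ns - a->cpu_ns;
	uint64_t gpu_delta = b->gpu_ticks - a->gpu_ticks;

	assert(cpu_delta != 0);

	return gpu_delta * 1000000000ull / cpu_delta;
}

/* Signed error, in ns, accumulated between the two pairs when GPU ticks
 * are converted to time with nominal_hz instead of the real frequency.
 * Positive means the converted GPU time runs ahead of the CPU clock. */
int64_t drift_ns(const struct ts_pair *a, const struct ts_pair *b,
		 uint64_t nominal_hz)
{
	uint64_t cpu_delta = b->cpu_ns - a->cpu_ns;
	uint64_t gpu_delta = b->gpu_ticks - a->gpu_ticks;
	uint64_t predicted_ns;

	assert(nominal_hz != 0);

	predicted_ns = gpu_delta * 1000000000ull / nominal_hz;

	return (int64_t)predicted_ns - (int64_t)cpu_delta;
}
```

For example, if 12525000 ticks elapse during one second of CPU time while the driver assumes 12500000 Hz, the nominal constant is off by 0.2% and the correlated time drifts by 2ms per second, which is the order of magnitude under discussion here. Dividing an acceptable error by the measured drift rate gives a rough upper bound on the re-correlation period.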
Re: [Intel-gfx] [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events
On Thu, Dec 7, 2017 at 12:48 AM, Robert Bragg wrote:

> at least from what I wrote back then it looks like I was seeing a drift of
> a few milliseconds per second on SKL. I vaguely recall it being much worse
> given the frequency constants we had for Haswell.

Sorry I didn't actually re-read my own message properly before referencing
it :)

Apparently the 2ms per second drift was for Haswell, so presumably not
quite so bad for SKL.

- Robert
[Intel-gfx] [PATCH v3 09/11] drm/i915: add oa_event_min_timer_exponent sysctl
The minimal sampling period is now configurable via a
dev.i915.oa_min_timer_exponent sysctl parameter.

Following the precedent set by perf, the default is the minimum that won't
(on its own) exceed the kernel.perf_event_max_sample_rate default of
100000 samples/s.

Signed-off-by: Robert Bragg
---
 drivers/gpu/drm/i915/i915_perf.c | 42 ++++++++++++++++++++++++++++++------------
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index c9e7104..18cb651 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -74,6 +74,23 @@ static u32 i915_perf_stream_paranoid = true;
  */
 #define OA_EXPONENT_MAX 31
 
+/* for sysctl proc_dointvec_minmax of i915_oa_min_timer_exponent */
+static int zero;
+static int oa_exponent_max = OA_EXPONENT_MAX;
+
+/* Theoretically we can program the OA unit to sample every 160ns but don't
+ * allow that by default unless root...
+ *
+ * The period is derived from the exponent as:
+ *
+ *   period = 80ns * 2^(exponent + 1)
+ *
+ * Referring to perf's kernel.perf_event_max_sample_rate for a precedent
+ * (100000 by default); with an OA exponent of 6 we get a period of 10.240
+ * microseconds - just under 100000Hz
+ */
+static u32 i915_oa_min_timer_exponent = 6;
+
 /* XXX: beware if future OA HW adds new report formats that the current
  * code assumes all reports have a power-of-two size and ~(size - 1) can
  * be used as a mask to align the OA tail pointer.
@@ -1266,21 +1283,13 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 				return -EINVAL;
 			}
 
-			/* NB: The exponent represents a period as follows:
-			 *
-			 *   80ns * 2^(period_exponent + 1)
-			 *
-			 * Theoretically we can program the OA unit to sample
+			/* Theoretically we can program the OA unit to sample
 			 * every 160ns but don't allow that by default unless
 			 * root.
-			 *
-			 * Referring to perf's
-			 * kernel.perf_event_max_sample_rate for a precedent
-			 * (100000 by default); with an OA exponent of 6 we get
-			 * a period of 10.240 microseconds -just under 100000Hz
 			 */
-			if (value < 6 && !capable(CAP_SYS_ADMIN)) {
-				DRM_ERROR("Sampling period too high without root privileges\n");
+			if (value < i915_oa_min_timer_exponent &&
+			    !capable(CAP_SYS_ADMIN)) {
+				DRM_ERROR("OA timer exponent too low without root privileges\n");
 				return -EACCES;
 			}
 
@@ -1342,6 +1351,15 @@ static struct ctl_table oa_table[] = {
 	 .mode = 0644,
 	 .proc_handler = proc_dointvec,
 	 },
+	{
+	 .procname = "oa_min_timer_exponent",
+	 .data = &i915_oa_min_timer_exponent,
+	 .maxlen = sizeof(i915_oa_min_timer_exponent),
+	 .mode = 0644,
+	 .proc_handler = proc_dointvec_minmax,
+	 .extra1 = &zero,
+	 .extra2 = &oa_exponent_max,
+	 },
 	{}
 };
 
-- 
2.9.2
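The relationship between the OA timer exponent and the sampling period in this patch can be checked with a few lines of standalone C. This is a userspace sketch of the arithmetic only, assuming the Haswell 80ns base period quoted in the patch comment; `oa_exponent_to_period_ns()` and `oa_min_exponent_for_rate()` are hypothetical helpers, not functions from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define OA_EXPONENT_MAX 31

/* period = 80ns * 2^(exponent + 1), as stated in the patch comment */
uint64_t oa_exponent_to_period_ns(unsigned int exponent)
{
	assert(exponent <= OA_EXPONENT_MAX);

	return 80ull << (exponent + 1);
}

/* Smallest exponent whose sampling rate does not exceed max_rate_hz.
 * A rate limit of R samples/s means the period must be at least 1e9 / R
 * nanoseconds, i.e. period_ns * R >= 1e9. */
unsigned int oa_min_exponent_for_rate(uint64_t max_rate_hz)
{
	unsigned int exponent;

	assert(max_rate_hz != 0);

	for (exponent = 0; exponent < OA_EXPONENT_MAX; exponent++) {
		if (oa_exponent_to_period_ns(exponent) * max_rate_hz >=
		    1000000000ull)
			break;
	}

	return exponent;
}
```

An exponent of 0 gives the 160ns theoretical minimum, and an exponent of 6 gives 10240ns (about 97656 samples/s), which is the smallest exponent that stays under perf's default limit of 100000 samples/s. That is why the sysctl defaults to 6.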
[Intel-gfx] [PATCH v3 03/11] drm/i915: return EACCES for check_cmd() failures
check_cmd() is checking whether a command adheres to certain restrictions
that ensure it's safe to execute within a privileged batch buffer.
Returning false implies a privilege problem, not that the command is
invalid.

The distinction makes the difference between allowing the buffer to be
executed as an unprivileged batch buffer or returning an EINVAL error to
userspace without executing anything.

In a case where userspace may want to test whether it can successfully
write to a register that needs privileges the distinction may be important
and an EINVAL error may be considered fatal.

In particular this is currently true for Mesa, which includes a test for
whether OACONTROL can be written to, but Mesa treats any error when
flushing a batch buffer as fatal, calling exit(1).

As things stand, Mesa can gracefully handle a failure to write to
OACONTROL if the command parser is disabled, but if we were to remove
OACONTROL from the parser's whitelist then the returned EINVAL would break
Mesa applications as they attempt an OACONTROL write.

Signed-off-by: Robert Bragg
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index cfe3e7a..71e778b 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -1261,7 +1261,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
 
 		if (!check_cmd(engine, desc, cmd, length, is_master,
 			       &oacontrol_set)) {
-			ret = -EINVAL;
+			ret = -EACCES;
 			break;
 		}
 
-- 
2.9.2
[Intel-gfx] [PATCH v3 07/11] drm/i915: advertise available metrics via sysfs
Each metric set is given a sysfs entry like:

  /sys/class/drm/card0/metrics/<guid>/id

This allows userspace to enumerate the specific sets that are available for
the current system. The 'id' file contains an unsigned integer that can be
used to open the associated metric set via DRM_IOCTL_I915_PERF_OPEN. The
<guid> is a globally unique ID for a specific OA unit register
configuration that can be reliably used by userspace as a key to lookup
corresponding counter meta data and normalization equations.

The guid registry is currently maintained as part of gputop along with the
XML metric set descriptions and code generation scripts, ref:

  https://github.com/rib/gputop
  > gputop-data/guids.xml
  > scripts/update-guids.py
  > gputop-data/oa-*.xml
  > scripts/i915-perf-kernelgen.py

  $ make -C gputop-data -f Makefile.xml SYSFS=1 WHITELIST=RenderBasic

Signed-off-by: Robert Bragg
---
 drivers/gpu/drm/i915/i915_drv.h    |  2 ++
 drivers/gpu/drm/i915/i915_oa_hsw.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_oa_hsw.h |  4 ++++
 drivers/gpu/drm/i915/i915_perf.c   | 19 ++++++++++++++++++-
 4 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 48595c9..d5c7b70 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2114,6 +2114,8 @@ struct drm_i915_private {
 	struct {
 		bool initialized;
 
+		struct kobject *metrics_kobj;
+
 		struct mutex lock;
 		struct list_head streams;
 
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c
index 3e6006ec..c32b5f8 100644
--- a/drivers/gpu/drm/i915/i915_oa_hsw.c
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
@@ -24,6 +24,8 @@
  *
  */
 
+#include <linux/sysfs.h>
+
 #include "i915_drv.h"
 
 enum metric_set_id {
@@ -130,3 +132,46 @@ int i915_oa_select_metric_set_hsw(struct drm_i915_private *dev_priv)
 		return -ENODEV;
 	}
 }
+
+static ssize_t
+show_render_basic_id(struct device *kdev, struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", METRIC_SET_ID_RENDER_BASIC);
+}
+
+static struct device_attribute dev_attr_render_basic_id = {
+	.attr = { .name = "id", .mode = S_IRUGO },
+	.show = show_render_basic_id,
+	.store = NULL,
+};
+
+static struct attribute *attrs_render_basic[] = {
+	&dev_attr_render_basic_id.attr,
+	NULL,
+};
+
+static struct attribute_group group_render_basic = {
+	.name = "403d8832-1a27-4aa6-a64e-f5389ce7b212",
+	.attrs = attrs_render_basic,
+};
+
+int
+i915_perf_init_sysfs_hsw(struct drm_i915_private *dev_priv)
+{
+	int ret;
+
+	ret = sysfs_create_group(dev_priv->perf.metrics_kobj, &group_render_basic);
+	if (ret)
+		goto error_render_basic;
+
+	return 0;
+
+error_render_basic:
+	return ret;
+}
+
+void
+i915_perf_deinit_sysfs_hsw(struct drm_i915_private *dev_priv)
+{
+	sysfs_remove_group(dev_priv->perf.metrics_kobj, &group_render_basic);
+}
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.h b/drivers/gpu/drm/i915/i915_oa_hsw.h
index b618a1f..e4ba89d 100644
--- a/drivers/gpu/drm/i915/i915_oa_hsw.h
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.h
@@ -31,4 +31,8 @@ extern int i915_oa_n_builtin_metric_sets_hsw;
 
 extern int i915_oa_select_metric_set_hsw(struct drm_i915_private *dev_priv);
 
+extern int i915_perf_init_sysfs_hsw(struct drm_i915_private *dev_priv);
+
+extern void i915_perf_deinit_sysfs_hsw(struct drm_i915_private *dev_priv);
+
 #endif
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 7c725b4..a943664 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1330,6 +1330,12 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 	if (!IS_HASWELL(dev_priv))
 		return;
 
+	dev_priv->perf.metrics_kobj =
+		kobject_create_and_add("metrics",
+				       &dev_priv->drm.primary->kdev->kobj);
+	if (!dev_priv->perf.metrics_kobj)
+		return;
+
 	hrtimer_init(&dev_priv->perf.oa.poll_check_timer,
 		     CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 	dev_priv->perf.oa.poll_check_timer.function = oa_poll_check_timer_cb;
@@ -1357,9 +1363,15 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 	dev_priv->perf.oa.n_builtin_sets =
 		i915_oa_n_builtin_metric_sets_hsw;
 
-	dev_priv->perf.oa.oa_formats = hsw_oa_formats;
+	if (i915_perf_init_sysfs_hsw(dev_priv)) {
+		kobject_put(dev_priv->perf.metrics_kobj);
+		dev_priv->perf.metrics_kobj = NULL;
+		return;
+	}
 
 	dev_priv->perf.initialized = true;
+
+	return;
 }
 
 void i915_perf_fini(struct drm_i915_pr
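As a rough illustration of how userspace might consume the sysfs entries described in this patch, the sketch below builds the path to a metric set's 'id' file and parses its contents. The helper names (`metric_id_path()`, `parse_metric_id()`, `read_metric_id()`) are hypothetical and error handling is kept minimal; only the path layout and the "unsigned integer followed by a newline" format come from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Format /sys/class/drm/card<N>/metrics/<guid>/id into buf.
 * Returns 0 on success, -1 if the buffer is too small. */
int metric_id_path(char *buf, size_t len, int card, const char *guid)
{
	int n = snprintf(buf, len, "/sys/class/drm/card%d/metrics/%s/id",
			 card, guid);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Parse the contents of an 'id' file: a decimal unsigned integer,
 * optionally followed by a newline. Returns 0 on success, -1 otherwise. */
int parse_metric_id(const char *text, uint64_t *id)
{
	char *end;
	unsigned long long value;

	assert(text != NULL && id != NULL);

	/* Reject signs and leading whitespace, which strtoull would accept */
	if (*text < '0' || *text > '9')
		return -1;

	errno = 0;
	value = strtoull(text, &end, 10);
	if (errno != 0 || (*end != '\n' && *end != '\0'))
		return -1;

	*id = value;
	return 0;
}

/* Read the metric set id for a given card and guid from sysfs. */
int read_metric_id(int card, const char *guid, uint64_t *id)
{
	char path[256];
	char text[32];
	FILE *file;

	if (metric_id_path(path, sizeof(path), card, guid))
		return -1;

	file = fopen(path, "r");
	if (!file)
		return -1;

	if (!fgets(text, sizeof(text), file)) {
		fclose(file);
		return -1;
	}
	fclose(file);

	return parse_metric_id(text, id);
}
```

The value read this way is what would be passed as the DRM_I915_PERF_PROP_OA_METRICS_SET property when opening a stream, with the guid serving as the stable key for looking up counter descriptions.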
[Intel-gfx] [PATCH v3 02/11] drm/i915: rename OACONTROL GEN7_OACONTROL
OACONTROL changes quite a bit for gen8, with some bits split out into a
per-context OACTXCONTROL register. Rename now before adding more gen7 OA
registers.

Signed-off-by: Robert Bragg
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 4 ++--
 drivers/gpu/drm/i915/i915_reg.h        | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 1db829c..cfe3e7a 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -446,7 +446,7 @@ static const struct drm_i915_reg_descriptor gen7_render_regs[] = {
 	REG64(PS_INVOCATION_COUNT),
 	REG64(PS_DEPTH_COUNT),
 	REG64_IDX(RING_TIMESTAMP, RENDER_RING_BASE),
-	REG32(OACONTROL), /* Only allowed for LRI and SRM. See below. */
+	REG32(GEN7_OACONTROL), /* Only allowed for LRI and SRM. See below. */
 	REG64(MI_PREDICATE_SRC0),
 	REG64(MI_PREDICATE_SRC1),
 	REG32(GEN7_3DPRIM_END_OFFSET),
@@ -1097,7 +1097,7 @@ static bool check_cmd(const struct intel_engine_cs *engine,
 			 * to the register. Hence, limit OACONTROL writes to
 			 * only MI_LOAD_REGISTER_IMM commands.
 			 */
-			if (reg_addr == i915_mmio_reg_offset(OACONTROL)) {
+			if (reg_addr == i915_mmio_reg_offset(GEN7_OACONTROL)) {
 				if (desc->cmd.value == MI_LOAD_REGISTER_MEM) {
 					DRM_DEBUG_DRIVER("CMD: Rejected LRM to OACONTROL\n");
 					return false;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 9397dde..ba91eff 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -616,7 +616,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define HSW_CS_GPR(n) _MMIO(0x2600 + (n) * 8)
 #define HSW_CS_GPR_UDW(n) _MMIO(0x2600 + (n) * 8 + 4)
 
-#define OACONTROL _MMIO(0x2360)
+#define GEN7_OACONTROL _MMIO(0x2360)
 
 #define _GEN7_PIPEA_DE_LOAD_SL 0x70068
 #define _GEN7_PIPEB_DE_LOAD_SL 0x71068
-- 
2.9.2
[Intel-gfx] [PATCH v3 00/11] Enable Gen 7 Observation Architecture
Mostly just a rebase on a more recent nightly, except for an update to how
POLLIN events are reported to reduce the CPU overhead that was otherwise
seen while running gputop.

The problem seen with poll was largely a fault with gputop having multiple
redundant 200ms, but out-of-phase, timers tracked in its mainloop resulting
in excessive poll wake ups due to timeouts. This was fixed, but still it
highlighted that the per-fd code that runs after a poll wait wakes
(regardless of the cause of the wake up) can easily become hot and i915
mmio reads here can quickly jump to the top of a cpu profile. The main
value of the i915-perf poll implementation is for throttling more than for
having a quick notification of new data, so it works nicely to only rely on
the hrtimer callback that's polling for new data to be the thing that flags
POLLIN events and check for that after the wait without any mmio.

Just to plug gputop as a tool for visualising these metrics, I'm now
automatically publishing a demo of the interface via Travis to
http://gputop.github.io which is also usable with local targets if you
point a browser at, e.g., http://gputop.github.io?target=localhost while
you have the gputop server running. Apologies that currently the demo site
on its own (not connected to real hardware) doesn't show much since it
doesn't have any fake metric values to graph, though it is possible to at
least browse the different platform metric sets and selecting individual
counters will show the description + normalization equation. Having the web
UI hosted on github hopefully lowers the bar to trying it out since it
avoids needing to set up Emscripten first as a build dependency.

Regards,
- Robert

Robert Bragg (11):
  drm/i915: Add i915 perf infrastructure
  drm/i915: rename OACONTROL GEN7_OACONTROL
  drm/i915: return EACCES for check_cmd() failures
  drm/i915: don't whitelist oacontrol in cmd parser
  drm/i915: Add 'render basic' Haswell OA unit config
  drm/i915: Enable i915 perf stream for Haswell OA unit
  drm/i915: advertise available metrics via sysfs
  drm/i915: Add dev.i915.perf_event_paranoid sysctl option
  drm/i915: add oa_event_min_timer_exponent sysctl
  drm/i915: Add more Haswell OA metric sets
  drm/i915: Add a kerneldoc summary for i915_perf.c

 drivers/gpu/drm/i915/Makefile           |    4 +
 drivers/gpu/drm/i915/i915_cmd_parser.c  |   40 +-
 drivers/gpu/drm/i915/i915_drv.c         |    6 +
 drivers/gpu/drm/i915/i915_drv.h         |  157 +++
 drivers/gpu/drm/i915/i915_gem_context.c |   22 +-
 drivers/gpu/drm/i915/i915_oa_hsw.c      |  659 +
 drivers/gpu/drm/i915/i915_oa_hsw.h      |   38 +
 drivers/gpu/drm/i915/i915_perf.c        | 1615 +++
 drivers/gpu/drm/i915/i915_reg.h         |  340 ++-
 drivers/gpu/drm/i915/intel_ringbuffer.c |    7 +-
 include/uapi/drm/i915_drm.h             |  133 +++
 11 files changed, 2979 insertions(+), 42 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.h
 create mode 100644 drivers/gpu/drm/i915/i915_perf.c

-- 
2.9.2
[Intel-gfx] [PATCH v3 01/11] drm/i915: Add i915 perf infrastructure
Adds base i915 perf infrastructure for Gen performance metrics.

This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64
properties to configure a stream of metrics and returns a new fd usable
with standard VFS system calls including read() to read typed and sized
records; ioctl() to enable or disable capture and poll() to wait for data.

A stream is opened something like:

  uint64_t properties[] = {
      /* Single context sampling */
      DRM_I915_PERF_PROP_CTX_HANDLE,        ctx_handle,

      /* Include OA reports in samples */
      DRM_I915_PERF_PROP_SAMPLE_OA,         true,

      /* OA unit configuration */
      DRM_I915_PERF_PROP_OA_METRICS_SET,    metrics_set_id,
      DRM_I915_PERF_PROP_OA_FORMAT,         report_format,
      DRM_I915_PERF_PROP_OA_EXPONENT,       period_exponent,
  };
  struct drm_i915_perf_open_param param = {
      .flags = I915_PERF_FLAG_FD_CLOEXEC |
               I915_PERF_FLAG_FD_NONBLOCK |
               I915_PERF_FLAG_DISABLED,
      .properties_ptr = (uint64_t)properties,
      /* each property is a (key, value) pair of uint64s, i.e. 16 bytes */
      .num_properties = sizeof(properties) / 16,
  };
  int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param);

Records read all start with a common { type, size } header with
DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records contain
an extensible number of fields and it's the DRM_I915_PERF_PROP_SAMPLE_xyz
properties given when opening that determine what's included in every
sample.

No specific streams are supported yet so any attempt to open a stream will
return an error.
Signed-off-by: Robert Bragg --- drivers/gpu/drm/i915/Makefile| 3 + drivers/gpu/drm/i915/i915_drv.c | 6 + drivers/gpu/drm/i915/i915_drv.h | 92 + drivers/gpu/drm/i915/i915_perf.c | 430 +++ include/uapi/drm/i915_drm.h | 67 ++ 5 files changed, 598 insertions(+) create mode 100644 drivers/gpu/drm/i915/i915_perf.c diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 6092f0e..9a2f1f9 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -106,6 +106,9 @@ i915-y += dvo_ch7017.o \ # virtual gpu code i915-y += i915_vgpu.o +# perf code +i915-y += i915_perf.o + ifeq ($(CONFIG_DRM_I915_GVT),y) i915-y += intel_gvt.o include $(src)/gvt/Makefile diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 83afdd0..f5209ff 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -1169,6 +1169,9 @@ static void i915_driver_register(struct drm_i915_private *dev_priv) * cannot run before the connectors are registered. 
*/ intel_fbdev_initial_config_async(dev); + + /* Depends on sysfs having been initialized */ + i915_perf_init(dev_priv); } /** @@ -1177,6 +1180,8 @@ static void i915_driver_register(struct drm_i915_private *dev_priv) */ static void i915_driver_unregister(struct drm_i915_private *dev_priv) { + i915_perf_fini(dev_priv); + i915_audio_component_cleanup(dev_priv); intel_gpu_ips_teardown(); @@ -2578,6 +2583,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = { DRM_IOCTL_DEF_DRV(I915_GEM_USERPTR, i915_gem_userptr_ioctl, DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_GETPARAM, i915_gem_context_getparam_ioctl, DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_SETPARAM, i915_gem_context_setparam_ioctl, DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(I915_PERF_OPEN, i915_perf_open_ioctl, DRM_RENDER_ALLOW), }; static struct drm_driver driver = { diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 65ada5d..7b801d9 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1724,6 +1724,85 @@ struct intel_wm_config { bool sprites_scaled; }; +struct i915_perf_read_state { + int count; + ssize_t read; + char __user *buf; +}; + +struct i915_perf_stream; + +struct i915_perf_stream_ops { + /* Enables the collection of HW samples, either in response to +* I915_PERF_IOCTL_ENABLE or implicitly called when stream is +* opened without I915_PERF_FLAG_DISABLED. +*/ + void (*enable)(struct i915_perf_stream *stream); + + /* Disables the collection of HW samples, either in response to +* I915_PERF_IOCTL_DISABLE or implicitly called before +* destroying the stream. +*/ + void (*disable)(struct i915_perf_stream *stream); + + /* Return: true if any i915 perf records are ready to read() +* for this stream. 
+*/ + bool (*can_read)(struct i915_perf_stream *stream); + + /* Call poll_wait, passing a wait queue that will be woken +* once there is something ready to read() for the stream +*/ + void (*poll_wait)(struct i915_perf_stream *stream, + struct file *file, +
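The commit message for this patch says that every record read from a stream starts with a common { type, size } header. A minimal sketch of how userspace could walk a buffer returned by read() is given below. The `struct record_header` layout and the `RECORD_SAMPLE` value are local stand-ins assumed for illustration; real code should use the definitions from include/uapi/drm/i915_drm.h rather than these hypothetical mirrors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed stand-in for the uapi record header; size covers the header
 * plus the payload that follows it. */
struct record_header {
	uint32_t type;
	uint16_t pad;
	uint16_t size;
};

/* Assumed stand-in for DRM_I915_PERF_RECORD_SAMPLE */
enum { RECORD_SAMPLE = 1 };

/* Count the sample records in a buffer filled by read() on a stream fd.
 * Unknown record types are skipped using their size field, which is what
 * lets the record format be extended later. */
size_t count_sample_records(const uint8_t *buf, size_t len)
{
	size_t offset = 0;
	size_t samples = 0;

	assert(buf != NULL);

	while (len - offset >= sizeof(struct record_header)) {
		struct record_header header;

		/* memcpy avoids unaligned access into the byte buffer */
		memcpy(&header, buf + offset, sizeof(header));

		/* Stop on a malformed or truncated record */
		if (header.size < sizeof(header) || header.size > len - offset)
			break;

		if (header.type == RECORD_SAMPLE)
			samples++;

		offset += header.size;
	}

	return samples;
}
```

A real consumer would decode the payload of each sample according to the DRM_I915_PERF_PROP_SAMPLE_xyz properties it passed when opening the stream, instead of only counting records.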
[Intel-gfx] [PATCH v3 10/11] drm/i915: Add more Haswell OA metric sets
This adds 'compute', 'compute extended', 'memory reads', 'memory writes' and 'sampler balance' metric sets for Haswell. The code is auto generated from an XML description of metric sets, currently maintained in gputop, ref: https://github.com/rib/gputop > gputop-data/oa-*.xml > scripts/i915-perf-kernelgen.py $ make -C gputop-data -f Makefile.xml Signed-off-by: Robert Bragg --- drivers/gpu/drm/i915/i915_oa_hsw.c | 484 - 1 file changed, 483 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c index c32b5f8..81e5628 100644 --- a/drivers/gpu/drm/i915/i915_oa_hsw.c +++ b/drivers/gpu/drm/i915/i915_oa_hsw.c @@ -30,9 +30,14 @@ enum metric_set_id { METRIC_SET_ID_RENDER_BASIC = 1, + METRIC_SET_ID_COMPUTE_BASIC, + METRIC_SET_ID_COMPUTE_EXTENDED, + METRIC_SET_ID_MEMORY_READS, + METRIC_SET_ID_MEMORY_WRITES, + METRIC_SET_ID_SAMPLER_BALANCE, }; -int i915_oa_n_builtin_metric_sets_hsw = 1; +int i915_oa_n_builtin_metric_sets_hsw = 6; static const struct i915_oa_reg b_counter_config_render_basic[] = { { _MMIO(0x2724), 0x0080 }, @@ -118,6 +123,333 @@ static int select_render_basic_config(struct drm_i915_private *dev_priv) return 0; } +static const struct i915_oa_reg b_counter_config_compute_basic[] = { + { _MMIO(0x2710), 0x }, + { _MMIO(0x2714), 0x0080 }, + { _MMIO(0x2718), 0x }, + { _MMIO(0x271c), 0x }, + { _MMIO(0x2720), 0x }, + { _MMIO(0x2724), 0x0080 }, + { _MMIO(0x2728), 0x }, + { _MMIO(0x272c), 0x }, + { _MMIO(0x2740), 0x }, + { _MMIO(0x2744), 0x }, + { _MMIO(0x2748), 0x }, + { _MMIO(0x274c), 0x }, + { _MMIO(0x2750), 0x }, + { _MMIO(0x2754), 0x }, + { _MMIO(0x2758), 0x }, + { _MMIO(0x275c), 0x }, + { _MMIO(0x236c), 0x }, +}; + +static const struct i915_oa_reg mux_config_compute_basic[] = { + { _MMIO(0x253a4), 0x }, + { _MMIO(0x2681c), 0x01f00800 }, + { _MMIO(0x26820), 0x1000 }, + { _MMIO(0x2781c), 0x01f00800 }, + { _MMIO(0x26520), 0x0007 }, + { _MMIO(0x265a0), 0x0007 }, + { _MMIO(0x25380), 0x0010 }, + { _MMIO(0x2538c), 
0x0030 }, + { _MMIO(0x25384), 0xaa8a }, + { _MMIO(0x25404), 0x }, + { _MMIO(0x26800), 0x4202 }, + { _MMIO(0x26808), 0x00605817 }, + { _MMIO(0x2680c), 0x10001005 }, + { _MMIO(0x26804), 0x }, + { _MMIO(0x27800), 0x0102 }, + { _MMIO(0x27808), 0x0c0701e0 }, + { _MMIO(0x2780c), 0x000200a0 }, + { _MMIO(0x27804), 0x }, + { _MMIO(0x26484), 0x4400 }, + { _MMIO(0x26704), 0x4400 }, + { _MMIO(0x26500), 0x0006 }, + { _MMIO(0x26510), 0x0001 }, + { _MMIO(0x26504), 0x8800 }, + { _MMIO(0x26580), 0x0006 }, + { _MMIO(0x26590), 0x0020 }, + { _MMIO(0x26584), 0x }, + { _MMIO(0x26104), 0x5582 }, + { _MMIO(0x26184), 0xaa86 }, + { _MMIO(0x25420), 0x08320c83 }, + { _MMIO(0x25424), 0x06820c83 }, + { _MMIO(0x2541c), 0x }, + { _MMIO(0x25428), 0x0c03 }, +}; + +static int select_compute_basic_config(struct drm_i915_private *dev_priv) +{ + dev_priv->perf.oa.mux_regs = + mux_config_compute_basic; + dev_priv->perf.oa.mux_regs_len = + ARRAY_SIZE(mux_config_compute_basic); + + dev_priv->perf.oa.b_counter_regs = + b_counter_config_compute_basic; + dev_priv->perf.oa.b_counter_regs_len = + ARRAY_SIZE(b_counter_config_compute_basic); + + return 0; +} + +static const struct i915_oa_reg b_counter_config_compute_extended[] = { + { _MMIO(0x2724), 0xf080 }, + { _MMIO(0x2720), 0x }, + { _MMIO(0x2714), 0xf080 }, + { _MMIO(0x2710), 0x }, + { _MMIO(0x2770), 0x0007fe2a }, + { _MMIO(0x2774), 0xff00 }, + { _MMIO(0x2778), 0x0007fe6a }, + { _MMIO(0x277c), 0xff00 }, + { _MMIO(0x2780), 0x0007fe92 }, + { _MMIO(0x2784), 0xff00 }, + { _MMIO(0x2788), 0x0007fea2 }, + { _MMIO(0x278c), 0xff00 }, + { _MMIO(0x2790), 0x0007fe32 }, + { _MMIO(0x2794), 0xff00 }, + { _MMIO(0x2798), 0x0007fe9a }, + { _MMIO(0x279c), 0xff00 }, + { _MMIO(0x27a0), 0x0007ff23 }, + { _MMIO(0x27a4), 0xff00 }, + { _MMIO(0x27a8), 0x0007fff3 }, + { _MMIO(0x27ac), 0xfffe }, +}; + +static const struct i915_oa_reg mux_config_compute_extended[] = { + { _MMIO(0x2681c), 0x3eb00800 }, + { _MMIO(0x26820), 0x0090 }, + { _MMIO(0x25384), 0x02a
[Intel-gfx] [PATCH v3 04/11] drm/i915: don't whitelist oacontrol in cmd parser
Being able to program OACONTROL from a non-privileged batch buffer is not sufficient to be able to configure the OA unit. This was originally allowed to help enable Mesa to expose OA counters via the INTEL_performance_query extension, but the current implementation based on programming OACONTROL via a batch buffer isn't able to report useable data without a more complete OA unit configuration. Mesa handles the possibility that writes to OACONTROL may not be allowed and so only advertises the extension after explicitly testing that a write to OACONTROL succeeds. Based on this; removing OACONTROL from the whitelist should be ok for userspace. Removing this simplifies adding a new kernel api for configuring the OA unit without needing to consider the possibility that userspace might trample on OACONTROL state which we'd like to start managing within the kernel instead. In particular running any Mesa based GL application currently results in clearing OACONTROL when initializing which would disable the capturing of metrics. Signed-off-by: Robert Bragg --- drivers/gpu/drm/i915/i915_cmd_parser.c | 38 ++ 1 file changed, 2 insertions(+), 36 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c index 71e778b..ac03c71 100644 --- a/drivers/gpu/drm/i915/i915_cmd_parser.c +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c @@ -446,7 +446,6 @@ static const struct drm_i915_reg_descriptor gen7_render_regs[] = { REG64(PS_INVOCATION_COUNT), REG64(PS_DEPTH_COUNT), REG64_IDX(RING_TIMESTAMP, RENDER_RING_BASE), - REG32(GEN7_OACONTROL), /* Only allowed for LRI and SRM. See below. 
*/ REG64(MI_PREDICATE_SRC0), REG64(MI_PREDICATE_SRC1), REG32(GEN7_3DPRIM_END_OFFSET), @@ -1049,8 +1048,7 @@ bool intel_engine_needs_cmd_parser(struct intel_engine_cs *engine) static bool check_cmd(const struct intel_engine_cs *engine, const struct drm_i915_cmd_descriptor *desc, const u32 *cmd, u32 length, - const bool is_master, - bool *oacontrol_set) + const bool is_master) { if (desc->flags & CMD_DESC_REJECT) { DRM_DEBUG_DRIVER("CMD: Rejected command: 0x%08X\n", *cmd); @@ -1088,31 +1086,6 @@ static bool check_cmd(const struct intel_engine_cs *engine, } /* -* OACONTROL requires some special handling for -* writes. We want to make sure that any batch which -* enables OA also disables it before the end of the -* batch. The goal is to prevent one process from -* snooping on the perf data from another process. To do -* that, we need to check the value that will be written -* to the register. Hence, limit OACONTROL writes to -* only MI_LOAD_REGISTER_IMM commands. -*/ - if (reg_addr == i915_mmio_reg_offset(GEN7_OACONTROL)) { - if (desc->cmd.value == MI_LOAD_REGISTER_MEM) { - DRM_DEBUG_DRIVER("CMD: Rejected LRM to OACONTROL\n"); - return false; - } - - if (desc->cmd.value == MI_LOAD_REGISTER_REG) { - DRM_DEBUG_DRIVER("CMD: Rejected LRR to OACONTROL\n"); - return false; - } - - if (desc->cmd.value == MI_LOAD_REGISTER_IMM(1)) - *oacontrol_set = (cmd[offset + 1] != 0); - } - - /* * Check the value written to the register against the * allowed mask/value pair given in the whitelist entry. */ @@ -1202,7 +1175,6 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine, { u32 *cmd, *batch_base, *batch_end; struct drm_i915_cmd_descriptor default_desc = { 0 }; - bool oacontrol_set = false; /* OACONTROL tracking. 
See check_cmd() */ int ret = 0; batch_base = copy_batch(shadow_batch_obj, batch_obj, @@ -1259,8 +1231,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine, break; } - if (!check_cmd(engine, desc, cmd, length, is_master, - &oacontrol_set)) { + if (!check_cmd(engine, desc, cmd, length, is_master)) { ret = -EACCES; break; } @@ -1268,11 +1239,6 @@ int intel_engine_cmd_parser(st
[Intel-gfx] [PATCH v3 11/11] drm/i915: Add a kerneldoc summary for i915_perf.c
In particular this tries to capture for posterity some of the early
challenges we had with using the core perf infrastructure in case we ever
want to revisit adapting perf for device metrics.

Cc: Chris Wilson
Signed-off-by: Robert Bragg
---
 drivers/gpu/drm/i915/i915_perf.c | 163 +++
 1 file changed, 163 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 18cb651..54ddf74 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -24,6 +24,169 @@
  * Robert Bragg
  */
+
+/**
+ * DOC: i915 Perf, streaming API for GPU metrics
+ *
+ * Gen graphics supports a large number of performance counters that can help
+ * driver and application developers understand and optimize their use of the
+ * GPU.
+ *
+ * This i915 perf interface enables userspace to configure and open a file
+ * descriptor representing a stream of GPU metrics which can then be read() as
+ * a stream of sample records.
+ *
+ * The interface is particularly suited to exposing buffered metrics that are
+ * captured by DMA from the GPU, unsynchronized with and unrelated to the CPU.
+ *
+ * Streams representing a single context are accessible to applications with a
+ * corresponding drm file descriptor, such that OpenGL can use the interface
+ * without special privileges. Access to system-wide metrics requires root
+ * privileges by default, unless changed via the dev.i915.perf_event_paranoid
+ * sysctl option.
+ *
+ *
+ * The interface was initially inspired by the core Perf infrastructure but
+ * some notable differences are:
+ *
+ * i915 perf file descriptors represent a "stream" instead of an "event"; where
+ * a perf event primarily corresponds to a single 64bit value, while a stream
+ * might sample sets of tightly-coupled counters, depending on the
+ * configuration. For example the Gen OA unit isn't designed to support
+ * orthogonal configurations of individual counters; it's configured for a set
+ * of related counters. Samples for an i915 perf stream capturing OA metrics
+ * will include a set of counter values packed in a compact HW specific format.
+ * The OA unit supports a number of different packing formats which can be
+ * selected by the user opening the stream. Perf has support for grouping
+ * events, but each event in the group is configured, validated and
+ * authenticated individually with separate system calls.
+ *
+ * i915 perf stream configurations are provided as an array of u64 (key,value)
+ * pairs, instead of a fixed struct with multiple miscellaneous config members,
+ * interleaved with event-type specific members.
+ *
+ * i915 perf doesn't support exposing metrics via an mmap'd circular buffer.
+ * The supported metrics are being written to memory by the GPU unsynchronized
+ * with the CPU, using HW specific packing formats for counter sets. Sometimes
+ * the constraints on HW configuration require reports to be filtered before it
+ * would be acceptable to expose them to unprivileged applications - to hide
+ * the metrics of other processes/contexts. For these use cases a read() based
+ * interface is a good fit, and provides an opportunity to filter data as it
+ * gets copied from the GPU mapped buffers to userspace buffers.
+ *
+ *
+ * Some notes regarding Linux Perf:
+ *
+ *
+ * The first prototype of this driver was based on the core perf
+ * infrastructure, and while we did make that mostly work, with some changes to
+ * perf, we found we were breaking or working around too many assumptions baked
+ * into perf's currently cpu centric design.
+ *
+ * In the end we didn't see a clear benefit to making perf's implementation and
+ * interface more complex by changing design assumptions while we knew we still
+ * wouldn't be able to use any existing perf based userspace tools.
+ *
+ * Also considering the Gen specific nature of the Observability hardware and
+ * how userspace will sometimes need to combine i915 perf OA metrics with
+ * side-band OA data captured via MI_REPORT_PERF_COUNT commands; we're
+ * expecting the interface to be used by a platform specific userspace such as
+ * OpenGL or tools. This is to say; we aren't inherently missing out on having
+ * a standard vendor/architecture agnostic interface by not using perf.
+ *
+ *
+ * For posterity, in case we might re-visit trying to adapt core perf to be
+ * better suited to exposing i915 metrics these were the main pain points we
+ * hit:
+ *
+ * - The perf based OA PMU driver broke some significant design assumptions:
+ *
+ *   Existing perf pmus are used for profiling work on a cpu and we were
+ *   introducing the idea of _IS_DEVICE pmus with different security
+ *   implications, the need to fake cpu-related data (such as user/kernel
+ *   registers) to fit with perf's current design, and adding _DEVICE records
+ *   as a
[Intel-gfx] [PATCH v3 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit
Gen graphics hardware can be set up to periodically write snapshots of performance counters into a circular buffer via its Observation Architecture and this patch exposes that capability to userspace via the i915 perf interface. Cc: Chris Wilson Signed-off-by: Robert Bragg Signed-off-by: Zhenyu Wang --- drivers/gpu/drm/i915/i915_drv.h | 68 ++- drivers/gpu/drm/i915/i915_gem_context.c | 22 +- drivers/gpu/drm/i915/i915_perf.c| 977 +++- drivers/gpu/drm/i915/i915_reg.h | 338 +++ drivers/gpu/drm/i915/intel_ringbuffer.c | 7 +- include/uapi/drm/i915_drm.h | 70 ++- 6 files changed, 1449 insertions(+), 33 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 95222f0..48595c9 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1724,6 +1724,11 @@ struct intel_wm_config { bool sprites_scaled; }; +struct i915_oa_format { + u32 format; + int size; +}; + struct i915_oa_reg { i915_reg_t addr; u32 value; @@ -1750,11 +1755,6 @@ struct i915_perf_stream_ops { */ void (*disable)(struct i915_perf_stream *stream); - /* Return: true if any i915 perf records are ready to read() -* for this stream. -*/ - bool (*can_read)(struct i915_perf_stream *stream); - /* Call poll_wait, passing a wait queue that will be woken * once there is something ready to read() for the stream */ @@ -1764,9 +1764,7 @@ struct i915_perf_stream_ops { /* For handling a blocking read, wait until there is something * to ready to read() for the stream. E.g. wait on the same -* wait queue that would be passed to poll_wait() until -* ->can_read() returns true (if its safe to call ->can_read() -* without the i915 perf lock held). +* wait queue that would be passed to poll_wait(). 
*/ int (*wait_unlocked)(struct i915_perf_stream *stream); @@ -1801,11 +1799,26 @@ struct i915_perf_stream { struct list_head link; u32 sample_flags; + int sample_size; struct i915_gem_context *ctx; bool enabled; - struct i915_perf_stream_ops *ops; + const struct i915_perf_stream_ops *ops; +}; + +struct i915_oa_ops { + void (*init_oa_buffer)(struct drm_i915_private *dev_priv); + int (*enable_metric_set)(struct drm_i915_private *dev_priv); + void (*disable_metric_set)(struct drm_i915_private *dev_priv); + void (*oa_enable)(struct drm_i915_private *dev_priv); + void (*oa_disable)(struct drm_i915_private *dev_priv); + void (*update_oacontrol)(struct drm_i915_private *dev_priv); + void (*update_hw_ctx_id_locked)(struct drm_i915_private *dev_priv, + u32 ctx_id); + int (*read)(struct i915_perf_stream *stream, + struct i915_perf_read_state *read_state); + bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv); }; struct drm_i915_private { @@ -2100,16 +2113,47 @@ struct drm_i915_private { struct { bool initialized; + struct mutex lock; struct list_head streams; + spinlock_t hook_lock; + struct { - u32 metrics_set; + struct i915_perf_stream *exclusive_stream; + + u32 specific_ctx_id; + + struct hrtimer poll_check_timer; + wait_queue_head_t poll_wq; + atomic_t pollin; + + bool periodic; + int period_exponent; + int timestamp_frequency; + + int tail_margin; + + int metrics_set; const struct i915_oa_reg *mux_regs; int mux_regs_len; const struct i915_oa_reg *b_counter_regs; int b_counter_regs_len; + + struct { + struct drm_i915_gem_object *obj; + u32 gtt_offset; + u8 *addr; + int format; + int format_size; + } oa_buffer; + + u32 gen7_latched_oastatus1; + + struct i915_oa_ops ops; + const struct i915_oa_format *oa_formats; + int n_builtin_sets; } oa; } perf; @@ -3476,6 +3520,8 @@ struct drm_i915_gem_object * i915_gem_alloc_context_obj(struct drm_device *dev, size_t size); struct i915_gem_context * i915_gem_context_create_gvt(struct drm_device *dev); +int 
i915_gem_context_pin_legacy_rcs_state(struct drm_i915_private *de
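
For readers following along: the perf.oa state added in this patch carries a period_exponent and a timestamp_frequency. The sketch below shows how the two could relate to a sampling period. It is only an illustration under two assumptions that are not stated in the patch text above: that reports trigger every 2^(exponent + 1) timestamp ticks, and that the Haswell timestamp runs at 12.5 MHz. The helper names are hypothetical.

```c
#include <assert.h>

/* Assumption: Haswell's OA timestamp ticks at 12.5 MHz (80 ns per tick). */
#define HSW_TIMESTAMP_FREQUENCY_HZ 12500000.0

/*
 * Assumed relationship: the exponent selects one timestamp bit, so a report
 * is triggered every 2^(exponent + 1) ticks. Returns the period in seconds.
 */
static double oa_period_seconds(int exponent, double timestamp_frequency_hz)
{
	double ticks = 2.0;	/* 2^(0 + 1) */
	int i;

	assert(exponent >= 0 && exponent <= 63);
	assert(timestamp_frequency_hz > 0.0);

	/* Double the tick count once per exponent step; a double avoids
	 * overflowing 64-bit integers at exponent 63. */
	for (i = 0; i < exponent; i++)
		ticks *= 2.0;

	return ticks / timestamp_frequency_hz;
}

/* Convert seconds to years of 365.25 days. */
static double seconds_to_years(double seconds)
{
	return seconds / (365.25 * 24.0 * 60.0 * 60.0);
}
```

Under those assumptions exponent 0 gives a 160 ns period, and the maximum exponent of 63 works out to a little under 47 thousand years between reports.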
[Intel-gfx] [PATCH v3 08/11] drm/i915: Add dev.i915.perf_event_paranoid sysctl option
Consistent with the kernel.perf_event_paranoid sysctl option that can allow non-root users to access system wide cpu metrics, this can optionally allow non-root users to access system wide OA counter metrics from Gen graphics hardware. Signed-off-by: Robert Bragg --- drivers/gpu/drm/i915/i915_drv.h | 1 + drivers/gpu/drm/i915/i915_perf.c | 46 +++- 2 files changed, 46 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index d5c7b70..a49801f 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2115,6 +2115,7 @@ struct drm_i915_private { bool initialized; struct kobject *metrics_kobj; + struct ctl_table_header *sysctl_header; struct mutex lock; struct list_head streams; diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index a943664..c9e7104 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -62,6 +62,8 @@ #define POLL_FREQUENCY 200 #define POLL_PERIOD (NSEC_PER_SEC / POLL_FREQUENCY) +static u32 i915_perf_stream_paranoid = true; + /* The maximum exponent the hardware accepts is 63 (essentially it selects one * of the 64bit timestamp bits to trigger reports from) but there's currently * no known use case for sampling as infrequently as once per 47 thousand years. @@ -1121,7 +1123,13 @@ int i915_perf_open_ioctl_locked(struct drm_device *dev, } } - if (!specific_ctx && !capable(CAP_SYS_ADMIN)) { + /* Similar to perf's kernel.perf_paranoid_cpu sysctl option +* we check a dev.i915.perf_stream_paranoid sysctl option +* to determine if it's ok to access system wide OA counters +* without CAP_SYS_ADMIN privileges. 
+*/ + if (!specific_ctx && + i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) { DRM_ERROR("Insufficient privileges to open system-wide i915 perf stream\n"); ret = -EACCES; goto err_ctx; @@ -1325,6 +1333,38 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data, return ret; } + +static struct ctl_table oa_table[] = { + { +.procname = "perf_stream_paranoid", +.data = &i915_perf_stream_paranoid, +.maxlen = sizeof(i915_perf_stream_paranoid), +.mode = 0644, +.proc_handler = proc_dointvec, +}, + {} +}; + +static struct ctl_table i915_root[] = { + { +.procname = "i915", +.maxlen = 0, +.mode = 0555, +.child = oa_table, +}, + {} +}; + +static struct ctl_table dev_root[] = { + { +.procname = "dev", +.maxlen = 0, +.mode = 0555, +.child = i915_root, +}, + {} +}; + void i915_perf_init(struct drm_i915_private *dev_priv) { if (!IS_HASWELL(dev_priv)) @@ -1369,6 +1409,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv) return; } + dev_priv->perf.sysctl_header = register_sysctl_table(dev_root); + dev_priv->perf.initialized = true; return; @@ -1379,6 +1421,8 @@ void i915_perf_fini(struct drm_i915_private *dev_priv) if (!dev_priv->perf.initialized) return; + unregister_sysctl_table(dev_priv->perf.sysctl_header); + i915_perf_deinit_sysfs_hsw(dev_priv); kobject_put(dev_priv->perf.metrics_kobj); -- 2.9.2 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
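
As a reading aid, the privilege check in the hunk above can be modelled in plain userspace C. This is a minimal sketch, not the kernel code: check_stream_privileges() and its boolean parameters are hypothetical stand-ins for specific_ctx, i915_perf_stream_paranoid and capable(CAP_SYS_ADMIN).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the check in i915_perf_open_ioctl_locked().
 * Returns 0 if the open may proceed, or -EACCES if it must be refused.
 */
static int check_stream_privileges(bool specific_ctx, bool paranoid,
				   bool is_admin)
{
	/* Only system-wide streams are gated; single-context streams pass. */
	if (!specific_ctx && paranoid && !is_admin)
		return -EACCES;

	return 0;
}

/* Usage example covering the interesting combinations. */
static void check_stream_privileges_example(void)
{
	/* System-wide, paranoid (the default), unprivileged: refused. */
	assert(check_stream_privileges(false, true, false) == -EACCES);
	/* Same request once the sysctl has been cleared: allowed. */
	assert(check_stream_privileges(false, false, false) == 0);
	/* Single-context streams are allowed regardless of the sysctl. */
	assert(check_stream_privileges(true, true, false) == 0);
}
```

Note that although the patch subject says perf_event_paranoid, the ctl_table entries in the diff register the knob as dev.i915.perf_stream_paranoid, so that is presumably the name an administrator would set to 0.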
[Intel-gfx] [PATCH v3 05/11] drm/i915: Add 'render basic' Haswell OA unit config
Adds a static OA unit, MUX + B Counter configuration for basic render metrics on Haswell. This is auto generated from an XML description of metric sets, currently maintained in gputop, ref: https://github.com/rib/gputop > gputop-data/oa-*.xml > scripts/i915-perf-kernelgen.py $ make -C gputop-data -f Makefile.xml SYSFS=0 WHITELIST=RenderBasic Signed-off-by: Robert Bragg --- drivers/gpu/drm/i915/Makefile | 3 +- drivers/gpu/drm/i915/i915_drv.h| 14 drivers/gpu/drm/i915/i915_oa_hsw.c | 132 + drivers/gpu/drm/i915/i915_oa_hsw.h | 34 ++ 4 files changed, 182 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.c create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 9a2f1f9..0eb4c83 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -107,7 +107,8 @@ i915-y += dvo_ch7017.o \ i915-y += i915_vgpu.o # perf code -i915-y += i915_perf.o +i915-y += i915_perf.o \ + i915_oa_hsw.o ifeq ($(CONFIG_DRM_I915_GVT),y) i915-y += intel_gvt.o diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 7b801d9..95222f0 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1724,6 +1724,11 @@ struct intel_wm_config { bool sprites_scaled; }; +struct i915_oa_reg { + i915_reg_t addr; + u32 value; +}; + struct i915_perf_read_state { int count; ssize_t read; @@ -2097,6 +2102,15 @@ struct drm_i915_private { bool initialized; struct mutex lock; struct list_head streams; + + struct { + u32 metrics_set; + + const struct i915_oa_reg *mux_regs; + int mux_regs_len; + const struct i915_oa_reg *b_counter_regs; + int b_counter_regs_len; + } oa; } perf; /* Abstract the submission mechanism (legacy ringbuffer or execlists) away */ diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c new file mode 100644 index 000..3e6006ec --- /dev/null +++ b/drivers/gpu/drm/i915/i915_oa_hsw.c @@ -0,0 
+1,132 @@ +/* + * Autogenerated file, DO NOT EDIT manually! + * + * Copyright (c) 2015 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. 
+ * + */ + +#include "i915_drv.h" + +enum metric_set_id { + METRIC_SET_ID_RENDER_BASIC = 1, +}; + +int i915_oa_n_builtin_metric_sets_hsw = 1; + +static const struct i915_oa_reg b_counter_config_render_basic[] = { + { _MMIO(0x2724), 0x0080 }, + { _MMIO(0x2720), 0x }, + { _MMIO(0x2714), 0x0080 }, + { _MMIO(0x2710), 0x }, +}; + +static const struct i915_oa_reg mux_config_render_basic[] = { + { _MMIO(0x253a4), 0x0160 }, + { _MMIO(0x25440), 0x0010 }, + { _MMIO(0x25128), 0x }, + { _MMIO(0x2691c), 0x0800 }, + { _MMIO(0x26aa0), 0x0150 }, + { _MMIO(0x26b9c), 0x6000 }, + { _MMIO(0x2791c), 0x0800 }, + { _MMIO(0x27aa0), 0x0150 }, + { _MMIO(0x27b9c), 0x6000 }, + { _MMIO(0x2641c), 0x0400 }, + { _MMIO(0x25380), 0x0010 }, + { _MMIO(0x2538c), 0x }, + { _MMIO(0x25384), 0x0800 }, + { _MMIO(0x25400), 0x0004 }, + { _MMIO(0x2540c), 0x06029000 }, + { _MMIO(0x25410), 0x0002 }, + { _MMIO(0x25404), 0x5c30 }, + { _MMIO(0x25100), 0x0016 }, + { _MMIO(0x25110), 0x0400 }, + { _MMIO(0x25104), 0x }, + { _MMIO(0x26804), 0x1211 }, + { _MMIO(0x26884), 0x0100 }, + { _MMIO(0x26900), 0x0002 }, + { _MMIO(0x2
Re: [Intel-gfx] [PATCH v3 01/11] drm/i915: Add i915 perf infrastructure
On Mon, Aug 15, 2016 at 3:57 PM, Chris Wilson wrote:
> On Mon, Aug 15, 2016 at 03:41:18PM +0100, Robert Bragg wrote:
> > Adds base i915 perf infrastructure for Gen performance metrics.
> >
> > This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64
> > properties to configure a stream of metrics and returns a new fd usable
> > with standard VFS system calls including read() to read typed and sized
> > records; ioctl() to enable or disable capture and poll() to wait for
> > data.
> >
> > A stream is opened something like:
> >
> >   uint64_t properties[] = {
> >       /* Single context sampling */
> >       DRM_I915_PERF_PROP_CTX_HANDLE,        ctx_handle,
> >
> >       /* Include OA reports in samples */
> >       DRM_I915_PERF_PROP_SAMPLE_OA,         true,
> >
> >       /* OA unit configuration */
> >       DRM_I915_PERF_PROP_OA_METRICS_SET,    metrics_set_id,
> >       DRM_I915_PERF_PROP_OA_FORMAT,         report_format,
> >       DRM_I915_PERF_PROP_OA_EXPONENT,       period_exponent,
> >   };
> >   struct drm_i915_perf_open_param param = {
> >       .flags = I915_PERF_FLAG_FD_CLOEXEC |
> >                I915_PERF_FLAG_FD_NONBLOCK |
> >                I915_PERF_FLAG_DISABLED,
> >       .properties_ptr = (uint64_t)properties,
> >       .num_properties = sizeof(properties) / 16,
> >   };
> >   int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param);
> >
> > Records read all start with a common { type, size } header with
> > DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records
> > contain an extensible number of fields and it's the
> > DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that
> > determine what's included in every sample.
> >
> > No specific streams are supported yet so any attempt to open a stream
> > will return an error.
> >
> > Signed-off-by: Robert Bragg
> > ---
> >  drivers/gpu/drm/i915/Makefile    |   3 +
> >  drivers/gpu/drm/i915/i915_drv.c  |   6 +
> >  drivers/gpu/drm/i915/i915_drv.h  |  92 +
> >  drivers/gpu/drm/i915/i915_perf.c | 430 +++
> >  include/uapi/drm/i915_drm.h      |  67 ++
> >  5 files changed, 598 insertions(+)
> >  create mode 100644 drivers/gpu/drm/i915/i915_perf.c
> >
> > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> > index 6092f0e..9a2f1f9 100644
> > --- a/drivers/gpu/drm/i915/Makefile
> > +++ b/drivers/gpu/drm/i915/Makefile
> > @@ -106,6 +106,9 @@ i915-y += dvo_ch7017.o \
> >  # virtual gpu code
> >  i915-y += i915_vgpu.o
> >
> > +# perf code
> > +i915-y += i915_perf.o
> > +
> >  ifeq ($(CONFIG_DRM_I915_GVT),y)
> >  i915-y += intel_gvt.o
> >  include $(src)/gvt/Makefile
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > index 83afdd0..f5209ff 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -1169,6 +1169,9 @@ static void i915_driver_register(struct drm_i915_private *dev_priv)
> >        * cannot run before the connectors are registered.
> >        */
> >       intel_fbdev_initial_config_async(dev);
> > +
> > +     /* Depends on sysfs having been initialized */
> > +     i915_perf_init(dev_priv);
>
> Then please call it i915_perf_register() and i915_perf_unregister()
> respectively.
>
> > +     struct {
> > +             bool initialized;
>
> This is bogus. We simply cannot allow userspace to access internals
> before we set them up. As it stands you have no serialisation here, so
> the protect is moot.
>

Ah, previously I was initializing in i915_driver_load() and I think it
should have been synchronized by requiring a drm fd to interact with the
driver. I'd need to dig in to understand why, but it was previously ok to
init before i915_setup_sysfs(), so this looks like I messed up the rebase,
not properly appreciating I was now initializing after being visible to
userspace.

> i915_perf_init() needs to be split up into the early initialisation
> function to setup internals, and the i915_perf_register() function to
> enable userspace (with however many steps you need in between).
>

I think it's probably just the sysfs bits that may need to be deferred
until after i915_sysfs_init(), after drm_dev_register() where we're visible
to userspace.

>
> > +             struct list_head streams;
> > +     } perf;
> > +
> >       /* Abstract the submission mechanism (legacy ringbuffer or exe
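
The commit message quoted above says every record read from the stream starts with a common { type, size } header. A userspace reader could walk a read() buffer roughly as sketched below. The struct layout, the field widths, the meaning of size as "whole record including header", and the EXAMPLE_RECORD_SAMPLE value are all assumptions made for illustration; the real definitions would come from include/uapi/drm/i915_drm.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: the source only says records start with { type, size }. */
struct record_header {
	uint32_t type;
	uint16_t pad;
	uint16_t size;	/* assumed: total record size in bytes, header included */
};

/* Stand-in value for DRM_I915_PERF_RECORD_SAMPLE (assumption). */
#define EXAMPLE_RECORD_SAMPLE 1

/*
 * Count the sample records in a buffer filled by read().
 * Returns the count, or -1 if a record is truncated or malformed.
 */
static int count_sample_records(const uint8_t *buf, size_t len)
{
	size_t offset = 0;
	int samples = 0;

	while (offset < len) {
		struct record_header hdr;

		if (len - offset < sizeof(hdr))
			return -1;
		/* Copy the header out to avoid unaligned access into buf. */
		memcpy(&hdr, buf + offset, sizeof(hdr));

		if (hdr.size < sizeof(hdr) || hdr.size > len - offset)
			return -1;
		if (hdr.type == EXAMPLE_RECORD_SAMPLE)
			samples++;

		/* Skipping by size lets unknown record types be ignored. */
		offset += hdr.size;
	}

	return samples;
}
```

Advancing by the size field rather than by a per-type table is what would let new record types be added later without breaking older readers.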
Re: [Intel-gfx] [PATCH v3 03/11] drm/i915: return EACCES for check_cmd() failures
On Mon, Aug 15, 2016 at 4:04 PM, Chris Wilson wrote:
> On Mon, Aug 15, 2016 at 03:41:20PM +0100, Robert Bragg wrote:
> > check_cmd() is checking whether a command adheres to certain
> > restrictions that ensure it's safe to execute within a privileged batch
> > buffer. Returning false implies a privilege problem, not that the
> > command is invalid.
> >
> > The distinction makes the difference between allowing the buffer to be
> > executed as an unprivileged batch buffer or returning an EINVAL error to
> > userspace without executing anything.
>
> Ah, but you choose to actually execute it instead. We can't allow that
> either.
>

Okay, I've got myself a bit confused over this, going in circles a few
times now...

Initially I took this to imply the cmd parser is not only there to enable
more than the HW policy allows, and there must be some stuff we're blocking
that the HW policy otherwise allows (and therefore it's bad to return
-EACCES here and fall back to the HW policy).

One of the things that's confusing me is that looking at the command parser
code, it looks like it's possible to bail early with an
MI_BATCH_BUFFER_START command, returning -EACCES and falling back to the
non-privileged HW policy. If the HW policy isn't strict enough in some
cases then presumably we wouldn't ever allow an early fallback to the HW
policy?

oacontrol does look to be an example where it seems a little dubious that
the HW considers it ok to write from a non-privileged buffer for gen8+, but
for gen7 all LRIs are considered privileged afaik and the -EACCES fallback
should result in an oacontrol LRI becoming a NOOP.

The change appears to be having the desired effect with respect to mesa's
check for oacontrol writes failing gracefully with AMD_performance_monitor
not being advertised.

The fallback via -EACCES to the non-privileged HW policy seems to be
NOOPing the LRIs based on running gputop to capture system-wide metrics
(oacontrol enabled via i915 perf interface) and then running Mesa/GL
applications that would otherwise clobber oacontrol with test LRIs when
deciding to advertise AMD_performance_monitor. This would interfere with
gputop by disabling the OA unit if the LRIs were successful.

Unless I've got the wrong end of the stick, I think this is ok for Haswell.

- Robert

> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre
>
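
To make the distinction being debated above concrete, here is a minimal sketch of how a caller might turn the command parser's return code into a dispatch decision. It is a model of the behaviour described in this thread (success, -EACCES fallback to the unprivileged HW policy, any other error rejected), not the actual execbuffer code; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

enum dispatch {
	DISPATCH_PRIVILEGED_SHADOW,	/* parser vetted every command */
	DISPATCH_UNPRIVILEGED_ORIGINAL,	/* fall back to the HW policy */
	DISPATCH_REJECT,		/* report the error to userspace */
};

/* Hypothetical helper: map a command-parser result to a dispatch decision. */
static enum dispatch dispatch_for_parser_result(int parser_ret)
{
	if (parser_ret == 0)
		return DISPATCH_PRIVILEGED_SHADOW;

	/*
	 * A privilege problem: the batch may still run, but without elevated
	 * privileges, so the hardware enforces its own policy (on gen7 an
	 * LRI to OACONTROL is then expected to become a NOOP).
	 */
	if (parser_ret == -EACCES)
		return DISPATCH_UNPRIVILEGED_ORIGINAL;

	/* Anything else, e.g. -EINVAL, means nothing is executed. */
	return DISPATCH_REJECT;
}
```

With this shape, changing check_cmd() failures from -EINVAL to -EACCES moves them from the reject path to the unprivileged path, which is exactly the behaviour change Chris is questioning.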
[Intel-gfx] [PATCH v4 05/11] drm/i915: Add 'render basic' Haswell OA unit config
Adds a static OA unit, MUX + B Counter configuration for basic render metrics on Haswell. This is auto generated from an XML description of metric sets, currently maintained in gputop, ref: https://github.com/rib/gputop > gputop-data/oa-*.xml > scripts/i915-perf-kernelgen.py $ make -C gputop-data -f Makefile.xml SYSFS=0 WHITELIST=RenderBasic Signed-off-by: Robert Bragg --- drivers/gpu/drm/i915/Makefile | 3 +- drivers/gpu/drm/i915/i915_drv.h| 14 drivers/gpu/drm/i915/i915_oa_hsw.c | 132 + drivers/gpu/drm/i915/i915_oa_hsw.h | 34 ++ 4 files changed, 182 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.c create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 9a2f1f9..0eb4c83 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -107,7 +107,8 @@ i915-y += dvo_ch7017.o \ i915-y += i915_vgpu.o # perf code -i915-y += i915_perf.o +i915-y += i915_perf.o \ + i915_oa_hsw.o ifeq ($(CONFIG_DRM_I915_GVT),y) i915-y += intel_gvt.o diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 9f27a74..9070794 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1724,6 +1724,11 @@ struct intel_wm_config { bool sprites_scaled; }; +struct i915_oa_reg { + i915_reg_t addr; + u32 value; +}; + struct i915_perf_stream; struct i915_perf_stream_ops { @@ -2096,6 +2101,15 @@ struct drm_i915_private { bool initialized; struct mutex lock; struct list_head streams; + + struct { + u32 metrics_set; + + const struct i915_oa_reg *mux_regs; + int mux_regs_len; + const struct i915_oa_reg *b_counter_regs; + int b_counter_regs_len; + } oa; } perf; /* Abstract the submission mechanism (legacy ringbuffer or execlists) away */ diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c new file mode 100644 index 000..3e6006ec --- /dev/null +++ b/drivers/gpu/drm/i915/i915_oa_hsw.c @@ -0,0 
+1,132 @@ +/* + * Autogenerated file, DO NOT EDIT manually! + * + * Copyright (c) 2015 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. 
+ * + */ + +#include "i915_drv.h" + +enum metric_set_id { + METRIC_SET_ID_RENDER_BASIC = 1, +}; + +int i915_oa_n_builtin_metric_sets_hsw = 1; + +static const struct i915_oa_reg b_counter_config_render_basic[] = { + { _MMIO(0x2724), 0x0080 }, + { _MMIO(0x2720), 0x }, + { _MMIO(0x2714), 0x0080 }, + { _MMIO(0x2710), 0x }, +}; + +static const struct i915_oa_reg mux_config_render_basic[] = { + { _MMIO(0x253a4), 0x0160 }, + { _MMIO(0x25440), 0x0010 }, + { _MMIO(0x25128), 0x }, + { _MMIO(0x2691c), 0x0800 }, + { _MMIO(0x26aa0), 0x0150 }, + { _MMIO(0x26b9c), 0x6000 }, + { _MMIO(0x2791c), 0x0800 }, + { _MMIO(0x27aa0), 0x0150 }, + { _MMIO(0x27b9c), 0x6000 }, + { _MMIO(0x2641c), 0x0400 }, + { _MMIO(0x25380), 0x0010 }, + { _MMIO(0x2538c), 0x }, + { _MMIO(0x25384), 0x0800 }, + { _MMIO(0x25400), 0x0004 }, + { _MMIO(0x2540c), 0x06029000 }, + { _MMIO(0x25410), 0x0002 }, + { _MMIO(0x25404), 0x5c30 }, + { _MMIO(0x25100), 0x0016 }, + { _MMIO(0x25110), 0x0400 }, + { _MMIO(0x25104), 0x }, + { _MMIO(0x26804), 0x1211 }, + { _MMIO(0x26884), 0x0100 }, + { _MMIO(0x26900), 0x0002 }, + { _MMIO(0x26908), 0
[Intel-gfx] [PATCH v4 04/11] drm/i915: don't whitelist oacontrol in cmd parser
Being able to program OACONTROL from a non-privileged batch buffer is not
sufficient to be able to configure the OA unit. This was originally allowed
to help enable Mesa to expose OA counters via the INTEL_performance_query
extension, but the current implementation based on programming OACONTROL via
a batch buffer isn't able to report usable data without a more complete OA
unit configuration. Mesa handles the possibility that writes to OACONTROL may
not be allowed and so only advertises the extension after explicitly testing
that a write to OACONTROL succeeds. Based on this, removing OACONTROL from
the whitelist should be ok for userspace.

Removing this simplifies adding a new kernel API for configuring the OA unit
without needing to consider the possibility that userspace might trample on
OACONTROL state which we'd like to start managing within the kernel instead.
In particular, running any Mesa based GL application currently results in
clearing OACONTROL when initializing, which would disable the capturing of
metrics.

Signed-off-by: Robert Bragg
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 38 ++
 1 file changed, 2 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 71e778b..ac03c71 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -446,7 +446,6 @@ static const struct drm_i915_reg_descriptor gen7_render_regs[] = {
 	REG64(PS_INVOCATION_COUNT),
 	REG64(PS_DEPTH_COUNT),
 	REG64_IDX(RING_TIMESTAMP, RENDER_RING_BASE),
-	REG32(GEN7_OACONTROL), /* Only allowed for LRI and SRM. See below. */
 	REG64(MI_PREDICATE_SRC0),
 	REG64(MI_PREDICATE_SRC1),
 	REG32(GEN7_3DPRIM_END_OFFSET),
@@ -1049,8 +1048,7 @@ bool intel_engine_needs_cmd_parser(struct intel_engine_cs *engine)
 static bool check_cmd(const struct intel_engine_cs *engine,
 		      const struct drm_i915_cmd_descriptor *desc,
 		      const u32 *cmd, u32 length,
-		      const bool is_master,
-		      bool *oacontrol_set)
+		      const bool is_master)
 {
 	if (desc->flags & CMD_DESC_REJECT) {
 		DRM_DEBUG_DRIVER("CMD: Rejected command: 0x%08X\n", *cmd);
@@ -1088,31 +1086,6 @@ static bool check_cmd(const struct intel_engine_cs *engine,
 			}
 
 			/*
-			 * OACONTROL requires some special handling for
-			 * writes. We want to make sure that any batch which
-			 * enables OA also disables it before the end of the
-			 * batch. The goal is to prevent one process from
-			 * snooping on the perf data from another process. To do
-			 * that, we need to check the value that will be written
-			 * to the register. Hence, limit OACONTROL writes to
-			 * only MI_LOAD_REGISTER_IMM commands.
-			 */
-			if (reg_addr == i915_mmio_reg_offset(GEN7_OACONTROL)) {
-				if (desc->cmd.value == MI_LOAD_REGISTER_MEM) {
-					DRM_DEBUG_DRIVER("CMD: Rejected LRM to OACONTROL\n");
-					return false;
-				}
-
-				if (desc->cmd.value == MI_LOAD_REGISTER_REG) {
-					DRM_DEBUG_DRIVER("CMD: Rejected LRR to OACONTROL\n");
-					return false;
-				}
-
-				if (desc->cmd.value == MI_LOAD_REGISTER_IMM(1))
-					*oacontrol_set = (cmd[offset + 1] != 0);
-			}
-
-			/*
 			 * Check the value written to the register against the
 			 * allowed mask/value pair given in the whitelist entry.
 			 */
@@ -1202,7 +1175,6 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
 {
 	u32 *cmd, *batch_base, *batch_end;
 	struct drm_i915_cmd_descriptor default_desc = { 0 };
-	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
 	int ret = 0;
 
 	batch_base = copy_batch(shadow_batch_obj, batch_obj,
@@ -1259,8 +1231,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
 			break;
 		}
 
-		if (!check_cmd(engine, desc, cmd, length, is_master,
-			       &oacontrol_set)) {
+		if (!check_cmd(engine, desc, cmd, length, is_master)) {
 			ret = -EACCES;
 			break;
 		}
@@ -1268,11 +1239,6 @@ int intel_engine_cmd_parser(st
[Intel-gfx] [PATCH v4 00/11] Enable i915 perf stream for Haswell OA unit
I've updated the stream->ops->read() interface to avoid the struct
i915_perf_read_state so it's hopefully a bit clearer to see the state being
passed around:

  int (*read)(struct i915_perf_stream *stream,
              char __user *buf,
              size_t count,
              size_t *offset);

The inout offset is the buf offset (aka bytes read), which is still tracked
separately from any error state until we get back to the common code that
handles the final check for whether the error should be squashed if any data
was successfully copied. This avoids some duplication and more fiddly error
paths in the ->read implementations.

The initialization code is now split into an i915_perf_init() called in
i915_driver_init_early() and an i915_perf_register() called in
i915_driver_register() once we're visible to userspace, after sysfs has been
initialized.

- Robert

Robert Bragg (11):
  drm/i915: Add i915 perf infrastructure
  drm/i915: rename OACONTROL GEN7_OACONTROL
  drm/i915: return EACCES for check_cmd() failures
  drm/i915: don't whitelist oacontrol in cmd parser
  drm/i915: Add 'render basic' Haswell OA unit config
  drm/i915: Enable i915 perf stream for Haswell OA unit
  drm/i915: advertise available metrics via sysfs
  drm/i915: Add dev.i915.perf_event_paranoid sysctl option
  drm/i915: add oa_event_min_timer_exponent sysctl
  drm/i915: Add more Haswell OA metric sets
  drm/i915: Add a kerneldoc summary for i915_perf.c

 drivers/gpu/drm/i915/Makefile           |    4 +
 drivers/gpu/drm/i915/i915_cmd_parser.c  |   40 +-
 drivers/gpu/drm/i915/i915_drv.c         |    9 +
 drivers/gpu/drm/i915/i915_drv.h         |  160 +++
 drivers/gpu/drm/i915/i915_gem_context.c |   22 +-
 drivers/gpu/drm/i915/i915_oa_hsw.c      |  659
 drivers/gpu/drm/i915/i915_oa_hsw.h      |   38 +
 drivers/gpu/drm/i915/i915_perf.c        | 1668 +++
 drivers/gpu/drm/i915/i915_reg.h         |  340 ++-
 drivers/gpu/drm/i915/intel_ringbuffer.c |    7 +-
 include/uapi/drm/i915_drm.h             |  133 +++
 11 files changed, 3038 insertions(+), 42 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.h
 create mode 100644 drivers/gpu/drm/i915/i915_perf.c

--
2.9.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
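To make the read() contract described in the cover letter concrete, here is a minimal userspace-style sketch in C. It is not the driver's code: stream_read_common(), demo_read() and struct demo_stream are hypothetical names, and returning the hook's status unchanged when nothing was copied is an assumption. It only illustrates the rule that bytes already copied take precedence over an error from the ->read hook.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for a stream's ->read() hook: copies records into
 * buf, advances *offset by the bytes written, and returns 0 on success or
 * a negative errno once it can make no further progress.
 */
typedef int (*read_hook_t)(void *stream, char *buf, size_t count,
			   size_t *offset);

/* Common-code wrapper: an error from the hook is only reported to the
 * caller if nothing at all was copied; otherwise the byte count wins.
 */
static ssize_t stream_read_common(read_hook_t hook, void *stream,
				  char *buf, size_t count)
{
	size_t offset = 0;
	int ret = hook(stream, buf, count, &offset);

	return offset ? (ssize_t)offset : (ssize_t)ret;
}

/* Example hook state: a number of fixed 8-byte records waiting to be read. */
struct demo_stream {
	size_t pending;
};

/* Example hook: copies as many whole 8-byte records as fit, then fails
 * with -ENOSPC if records are still pending.
 */
static int demo_read(void *stream, char *buf, size_t count, size_t *offset)
{
	struct demo_stream *s = stream;

	while (s->pending) {
		if (count - *offset < 8)
			return -ENOSPC;
		memset(buf + *offset, 0xab, 8);
		*offset += 8;
		s->pending--;
	}
	return 0;
}
```

With three pending records and a 20-byte buffer, the hook copies two records and then hits -ENOSPC, but the wrapper reports the 16 bytes that were copied; the error only surfaces on a call where nothing could be copied.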
[Intel-gfx] [PATCH v4 01/11] drm/i915: Add i915 perf infrastructure
Adds base i915 perf infrastructure for Gen performance metrics.

This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64
properties to configure a stream of metrics and returns a new fd usable
with standard VFS system calls including read() to read typed and sized
records; ioctl() to enable or disable capture and poll() to wait for data.

A stream is opened something like:

  uint64_t properties[] = {
      /* Single context sampling */
      DRM_I915_PERF_PROP_CTX_HANDLE,        ctx_handle,

      /* Include OA reports in samples */
      DRM_I915_PERF_PROP_SAMPLE_OA,         true,

      /* OA unit configuration */
      DRM_I915_PERF_PROP_OA_METRICS_SET,    metrics_set_id,
      DRM_I915_PERF_PROP_OA_FORMAT,         report_format,
      DRM_I915_PERF_PROP_OA_EXPONENT,       period_exponent,
  };
  struct drm_i915_perf_open_param param = {
      .flags = I915_PERF_FLAG_FD_CLOEXEC |
               I915_PERF_FLAG_FD_NONBLOCK |
               I915_PERF_FLAG_DISABLED,
      .properties_ptr = (uint64_t)properties,
      .num_properties = sizeof(properties) / 16,
  };
  int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param);

Records read all start with a common { type, size } header with
DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records
contain an extensible number of fields and it's the
DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that
determine what's included in every sample.

No specific streams are supported yet so any attempt to open a stream
will return an error.
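As an illustration of the { type, size } record framing described above, the following C sketch walks a buffer that read() might have filled and counts the sample records. struct record_header, RECORD_TYPE_SAMPLE and count_sample_records() are hypothetical names local to this example; the field widths and the numeric sample type are assumptions and should be checked against the uapi header rather than taken from here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative local mirror of the common record header. The layout in
 * the real uapi header may differ; the field widths here are assumed.
 */
struct record_header {
	uint32_t type;
	uint16_t pad;
	uint16_t size; /* total record size in bytes, header included */
};

/* Assumed value, standing in for DRM_I915_PERF_RECORD_SAMPLE. */
#define RECORD_TYPE_SAMPLE 1

/* Walk a buffer of back-to-back records and count the samples in it.
 * Records of other types are skipped using their size field; a
 * malformed or truncated record stops the walk.
 */
static size_t count_sample_records(const uint8_t *buf, size_t len)
{
	size_t offset = 0, n_samples = 0;

	while (len - offset >= sizeof(struct record_header)) {
		struct record_header hdr;

		memcpy(&hdr, buf + offset, sizeof(hdr));
		if (hdr.size < sizeof(hdr) || hdr.size > len - offset)
			break;
		if (hdr.type == RECORD_TYPE_SAMPLE)
			n_samples++;
		offset += hdr.size;
	}
	return n_samples;
}
```

Skipping by size rather than by known type is what would let a reader like this tolerate record types it does not understand.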
Signed-off-by: Robert Bragg --- drivers/gpu/drm/i915/Makefile| 3 + drivers/gpu/drm/i915/i915_drv.c | 4 + drivers/gpu/drm/i915/i915_drv.h | 91 drivers/gpu/drm/i915/i915_perf.c | 448 +++ include/uapi/drm/i915_drm.h | 67 ++ 5 files changed, 613 insertions(+) create mode 100644 drivers/gpu/drm/i915/i915_perf.c diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 6092f0e..9a2f1f9 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -106,6 +106,9 @@ i915-y += dvo_ch7017.o \ # virtual gpu code i915-y += i915_vgpu.o +# perf code +i915-y += i915_perf.o + ifeq ($(CONFIG_DRM_I915_GVT),y) i915-y += intel_gvt.o include $(src)/gvt/Makefile diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 83afdd0..92f668e 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -851,6 +851,8 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv, intel_device_info_dump(dev_priv); + i915_perf_init(dev_priv); + /* Not all pre-production machines fall into this category, only the * very first ones. Almost everything should work, except for maybe * suspend/resume. 
And we don't implement workarounds that affect only @@ -872,6 +874,7 @@ err_workqueues: */ static void i915_driver_cleanup_early(struct drm_i915_private *dev_priv) { + i915_perf_fini(dev_priv); i915_gem_load_cleanup(&dev_priv->drm); i915_workqueues_cleanup(dev_priv); } @@ -2578,6 +2581,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = { DRM_IOCTL_DEF_DRV(I915_GEM_USERPTR, i915_gem_userptr_ioctl, DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_GETPARAM, i915_gem_context_getparam_ioctl, DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_SETPARAM, i915_gem_context_setparam_ioctl, DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(I915_PERF_OPEN, i915_perf_open_ioctl, DRM_RENDER_ALLOW), }; static struct drm_driver driver = { diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 65ada5d..9f27a74 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1724,6 +1724,84 @@ struct intel_wm_config { bool sprites_scaled; }; +struct i915_perf_stream; + +struct i915_perf_stream_ops { + /* Enables the collection of HW samples, either in response to +* I915_PERF_IOCTL_ENABLE or implicitly called when stream is +* opened without I915_PERF_FLAG_DISABLED. +*/ + void (*enable)(struct i915_perf_stream *stream); + + /* Disables the collection of HW samples, either in response to +* I915_PERF_IOCTL_DISABLE or implicitly called before +* destroying the stream. +*/ + void (*disable)(struct i915_perf_stream *stream); + + /* Return: true if any i915 perf records are ready to read() +* for this stream. +*/ + bool (*can_read)(struct i915_perf_stream *stream); + + /* Call poll_wait, passing a wait queue that will be woken +* once there is something ready to read() for the stream +*/ + void (*poll_wait)(struct i915_perf_stream *stream, + struct file *file, + poll_table *wait); + + /* For handling
[Intel-gfx] [PATCH v4 09/11] drm/i915: add oa_event_min_timer_exponent sysctl
The minimal sampling period is now configurable via a
dev.i915.oa_min_timer_exponent sysctl parameter.

Following the precedent set by perf, the default is the minimum that
won't (on its own) exceed the kernel.perf_event_max_sample_rate default
of 100000 samples/s.

Signed-off-by: Robert Bragg
---
 drivers/gpu/drm/i915/i915_perf.c | 42
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index ac1f600..3d0ba09 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -74,6 +74,23 @@ static u32 i915_perf_stream_paranoid = true;
  */
 #define OA_EXPONENT_MAX 31
 
+/* for sysctl proc_dointvec_minmax of i915_oa_min_timer_exponent */
+static int zero;
+static int oa_exponent_max = OA_EXPONENT_MAX;
+
+/* Theoretically we can program the OA unit to sample every 160ns but don't
+ * allow that by default unless root...
+ *
+ * The period is derived from the exponent as:
+ *
+ *   period = 80ns * 2^(exponent + 1)
+ *
+ * Referring to perf's kernel.perf_event_max_sample_rate for a precedent
+ * (100000 by default); with an OA exponent of 6 we get a period of 10.240
+ * microseconds - just under 100000Hz
+ */
+static u32 i915_oa_min_timer_exponent = 6;
+
 /* XXX: beware if future OA HW adds new report formats that the current
  * code assumes all reports have a power-of-two size and ~(size - 1) can
  * be used as a mask to align the OA tail pointer.
@@ -1303,21 +1320,13 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 				return -EINVAL;
 			}
 
-			/* NB: The exponent represents a period as follows:
-			 *
-			 *   80ns * 2^(period_exponent + 1)
-			 *
-			 * Theoretically we can program the OA unit to sample
+			/* Theoretically we can program the OA unit to sample
 			 * every 160ns but don't allow that by default unless
 			 * root.
-			 *
-			 * Referring to perf's
-			 * kernel.perf_event_max_sample_rate for a precedent
-			 * (100000 by default); with an OA exponent of 6 we get
-			 * a period of 10.240 microseconds -just under 100000Hz
 			 */
-			if (value < 6 && !capable(CAP_SYS_ADMIN)) {
-				DRM_ERROR("Sampling period too high without root privileges\n");
+			if (value < i915_oa_min_timer_exponent &&
+			    !capable(CAP_SYS_ADMIN)) {
+				DRM_ERROR("OA timer exponent too low without root privileges\n");
 				return -EACCES;
 			}
 
@@ -1415,6 +1424,15 @@ static struct ctl_table oa_table[] = {
 	 .mode = 0644,
 	 .proc_handler = proc_dointvec,
 	 },
+	{
+	 .procname = "oa_min_timer_exponent",
+	 .data = &i915_oa_min_timer_exponent,
+	 .maxlen = sizeof(i915_oa_min_timer_exponent),
+	 .mode = 0644,
+	 .proc_handler = proc_dointvec_minmax,
+	 .extra1 = &zero,
+	 .extra2 = &oa_exponent_max,
+	 },
 	{}
 };
 
--
2.9.2
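The exponent-to-period relationship quoted in the patch comment, period = 80ns * 2^(exponent + 1), can be checked with a few lines of C. The helper names below are hypothetical and exist only for this worked example; the 80ns base and the formula come from the patch, and the integer division deliberately rounds the frequency down.

```c
#include <stdint.h>

/* period = 80ns * 2^(exponent + 1), as given in the patch comment.
 * Valid for exponents up to OA_EXPONENT_MAX (31) without overflow.
 */
static uint64_t oa_exponent_to_period_ns(unsigned int exponent)
{
	return 80ULL << (exponent + 1);
}

/* Sampling frequency in Hz for a given exponent, rounded down. */
static uint64_t oa_exponent_to_freq_hz(unsigned int exponent)
{
	return 1000000000ULL / oa_exponent_to_period_ns(exponent);
}
```

For exponent 6 this gives a period of 10240ns, roughly 97.6 kHz, which is the smallest exponent that stays under perf's default kernel.perf_event_max_sample_rate of 100000; exponent 5 would already be about 195 kHz.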
[Intel-gfx] [PATCH v4 02/11] drm/i915: rename OACONTROL GEN7_OACONTROL
OACONTROL changes quite a bit for gen8, with some bits split out into a
per-context OACTXCONTROL register. Rename now before adding more gen7 OA
registers.

Signed-off-by: Robert Bragg
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 4 ++--
 drivers/gpu/drm/i915/i915_reg.h        | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 1db829c..cfe3e7a 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -446,7 +446,7 @@ static const struct drm_i915_reg_descriptor gen7_render_regs[] = {
 	REG64(PS_INVOCATION_COUNT),
 	REG64(PS_DEPTH_COUNT),
 	REG64_IDX(RING_TIMESTAMP, RENDER_RING_BASE),
-	REG32(OACONTROL), /* Only allowed for LRI and SRM. See below. */
+	REG32(GEN7_OACONTROL), /* Only allowed for LRI and SRM. See below. */
 	REG64(MI_PREDICATE_SRC0),
 	REG64(MI_PREDICATE_SRC1),
 	REG32(GEN7_3DPRIM_END_OFFSET),
@@ -1097,7 +1097,7 @@ static bool check_cmd(const struct intel_engine_cs *engine,
 			 * to the register. Hence, limit OACONTROL writes to
 			 * only MI_LOAD_REGISTER_IMM commands.
 			 */
-			if (reg_addr == i915_mmio_reg_offset(OACONTROL)) {
+			if (reg_addr == i915_mmio_reg_offset(GEN7_OACONTROL)) {
 				if (desc->cmd.value == MI_LOAD_REGISTER_MEM) {
 					DRM_DEBUG_DRIVER("CMD: Rejected LRM to OACONTROL\n");
 					return false;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 9397dde..ba91eff 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -616,7 +616,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define HSW_CS_GPR(n) _MMIO(0x2600 + (n) * 8)
 #define HSW_CS_GPR_UDW(n) _MMIO(0x2600 + (n) * 8 + 4)
 
-#define OACONTROL _MMIO(0x2360)
+#define GEN7_OACONTROL _MMIO(0x2360)
 
 #define _GEN7_PIPEA_DE_LOAD_SL 0x70068
 #define _GEN7_PIPEB_DE_LOAD_SL 0x71068
--
2.9.2
[Intel-gfx] [PATCH v4 10/11] drm/i915: Add more Haswell OA metric sets
This adds 'compute', 'compute extended', 'memory reads', 'memory writes' and 'sampler balance' metric sets for Haswell. The code is auto generated from an XML description of metric sets, currently maintained in gputop, ref: https://github.com/rib/gputop > gputop-data/oa-*.xml > scripts/i915-perf-kernelgen.py $ make -C gputop-data -f Makefile.xml Signed-off-by: Robert Bragg --- drivers/gpu/drm/i915/i915_oa_hsw.c | 484 - 1 file changed, 483 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c index c32b5f8..81e5628 100644 --- a/drivers/gpu/drm/i915/i915_oa_hsw.c +++ b/drivers/gpu/drm/i915/i915_oa_hsw.c @@ -30,9 +30,14 @@ enum metric_set_id { METRIC_SET_ID_RENDER_BASIC = 1, + METRIC_SET_ID_COMPUTE_BASIC, + METRIC_SET_ID_COMPUTE_EXTENDED, + METRIC_SET_ID_MEMORY_READS, + METRIC_SET_ID_MEMORY_WRITES, + METRIC_SET_ID_SAMPLER_BALANCE, }; -int i915_oa_n_builtin_metric_sets_hsw = 1; +int i915_oa_n_builtin_metric_sets_hsw = 6; static const struct i915_oa_reg b_counter_config_render_basic[] = { { _MMIO(0x2724), 0x0080 }, @@ -118,6 +123,333 @@ static int select_render_basic_config(struct drm_i915_private *dev_priv) return 0; } +static const struct i915_oa_reg b_counter_config_compute_basic[] = { + { _MMIO(0x2710), 0x }, + { _MMIO(0x2714), 0x0080 }, + { _MMIO(0x2718), 0x }, + { _MMIO(0x271c), 0x }, + { _MMIO(0x2720), 0x }, + { _MMIO(0x2724), 0x0080 }, + { _MMIO(0x2728), 0x }, + { _MMIO(0x272c), 0x }, + { _MMIO(0x2740), 0x }, + { _MMIO(0x2744), 0x }, + { _MMIO(0x2748), 0x }, + { _MMIO(0x274c), 0x }, + { _MMIO(0x2750), 0x }, + { _MMIO(0x2754), 0x }, + { _MMIO(0x2758), 0x }, + { _MMIO(0x275c), 0x }, + { _MMIO(0x236c), 0x }, +}; + +static const struct i915_oa_reg mux_config_compute_basic[] = { + { _MMIO(0x253a4), 0x }, + { _MMIO(0x2681c), 0x01f00800 }, + { _MMIO(0x26820), 0x1000 }, + { _MMIO(0x2781c), 0x01f00800 }, + { _MMIO(0x26520), 0x0007 }, + { _MMIO(0x265a0), 0x0007 }, + { _MMIO(0x25380), 0x0010 }, + { _MMIO(0x2538c), 
0x0030 }, + { _MMIO(0x25384), 0xaa8a }, + { _MMIO(0x25404), 0x }, + { _MMIO(0x26800), 0x4202 }, + { _MMIO(0x26808), 0x00605817 }, + { _MMIO(0x2680c), 0x10001005 }, + { _MMIO(0x26804), 0x }, + { _MMIO(0x27800), 0x0102 }, + { _MMIO(0x27808), 0x0c0701e0 }, + { _MMIO(0x2780c), 0x000200a0 }, + { _MMIO(0x27804), 0x }, + { _MMIO(0x26484), 0x4400 }, + { _MMIO(0x26704), 0x4400 }, + { _MMIO(0x26500), 0x0006 }, + { _MMIO(0x26510), 0x0001 }, + { _MMIO(0x26504), 0x8800 }, + { _MMIO(0x26580), 0x0006 }, + { _MMIO(0x26590), 0x0020 }, + { _MMIO(0x26584), 0x }, + { _MMIO(0x26104), 0x5582 }, + { _MMIO(0x26184), 0xaa86 }, + { _MMIO(0x25420), 0x08320c83 }, + { _MMIO(0x25424), 0x06820c83 }, + { _MMIO(0x2541c), 0x }, + { _MMIO(0x25428), 0x0c03 }, +}; + +static int select_compute_basic_config(struct drm_i915_private *dev_priv) +{ + dev_priv->perf.oa.mux_regs = + mux_config_compute_basic; + dev_priv->perf.oa.mux_regs_len = + ARRAY_SIZE(mux_config_compute_basic); + + dev_priv->perf.oa.b_counter_regs = + b_counter_config_compute_basic; + dev_priv->perf.oa.b_counter_regs_len = + ARRAY_SIZE(b_counter_config_compute_basic); + + return 0; +} + +static const struct i915_oa_reg b_counter_config_compute_extended[] = { + { _MMIO(0x2724), 0xf080 }, + { _MMIO(0x2720), 0x }, + { _MMIO(0x2714), 0xf080 }, + { _MMIO(0x2710), 0x }, + { _MMIO(0x2770), 0x0007fe2a }, + { _MMIO(0x2774), 0xff00 }, + { _MMIO(0x2778), 0x0007fe6a }, + { _MMIO(0x277c), 0xff00 }, + { _MMIO(0x2780), 0x0007fe92 }, + { _MMIO(0x2784), 0xff00 }, + { _MMIO(0x2788), 0x0007fea2 }, + { _MMIO(0x278c), 0xff00 }, + { _MMIO(0x2790), 0x0007fe32 }, + { _MMIO(0x2794), 0xff00 }, + { _MMIO(0x2798), 0x0007fe9a }, + { _MMIO(0x279c), 0xff00 }, + { _MMIO(0x27a0), 0x0007ff23 }, + { _MMIO(0x27a4), 0xff00 }, + { _MMIO(0x27a8), 0x0007fff3 }, + { _MMIO(0x27ac), 0xfffe }, +}; + +static const struct i915_oa_reg mux_config_compute_extended[] = { + { _MMIO(0x2681c), 0x3eb00800 }, + { _MMIO(0x26820), 0x0090 }, + { _MMIO(0x25384), 0x02a
[Intel-gfx] [PATCH v4 07/11] drm/i915: advertise available metrics via sysfs
Each metric set is given a sysfs entry like:

  /sys/class/drm/card0/metrics/<guid>/id

This allows userspace to enumerate the specific sets that are available
for the current system. The 'id' file contains an unsigned integer that
can be used to open the associated metric set via
DRM_IOCTL_I915_PERF_OPEN. The <guid> is a globally unique ID for a
specific OA unit register configuration that can be reliably used by
userspace as a key to look up corresponding counter metadata and
normalization equations.

The guid registry is currently maintained as part of gputop along with
the XML metric set descriptions and code generation scripts, ref:

  https://github.com/rib/gputop
  > gputop-data/guids.xml
  > scripts/update-guids.py
  > gputop-data/oa-*.xml
  > scripts/i915-perf-kernelgen.py

  $ make -C gputop-data -f Makefile.xml SYSFS=1 WHITELIST=RenderBasic

Signed-off-by: Robert Bragg
---
 drivers/gpu/drm/i915/i915_drv.c    |  5 +
 drivers/gpu/drm/i915/i915_drv.h    |  4
 drivers/gpu/drm/i915/i915_oa_hsw.c | 45 +
 drivers/gpu/drm/i915/i915_oa_hsw.h |  4
 drivers/gpu/drm/i915/i915_perf.c   | 46 ++
 5 files changed, 104 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 92f668e..0f5f51b 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1172,6 +1172,9 @@ static void i915_driver_register(struct drm_i915_private *dev_priv)
 	 * cannot run before the connectors are registered.
*/ intel_fbdev_initial_config_async(dev); + + /* Depends on sysfs having been initialized */ + i915_perf_register(dev_priv); } /** @@ -1180,6 +1183,8 @@ static void i915_driver_register(struct drm_i915_private *dev_priv) */ static void i915_driver_unregister(struct drm_i915_private *dev_priv) { + i915_perf_unregister(dev_priv); + i915_audio_component_cleanup(dev_priv); intel_gpu_ips_teardown(); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 4c302cd..dd88eb1 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2115,6 +2115,8 @@ struct drm_i915_private { struct { bool initialized; + struct kobject *metrics_kobj; + struct mutex lock; struct list_head streams; @@ -3694,6 +3696,8 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine, /* i915_perf.c */ extern void i915_perf_init(struct drm_i915_private *dev_priv); extern void i915_perf_fini(struct drm_i915_private *dev_priv); +extern void i915_perf_register(struct drm_i915_private *dev_priv); +extern void i915_perf_unregister(struct drm_i915_private *dev_priv); /* i915_suspend.c */ extern int i915_save_state(struct drm_device *dev); diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c index 3e6006ec..c32b5f8 100644 --- a/drivers/gpu/drm/i915/i915_oa_hsw.c +++ b/drivers/gpu/drm/i915/i915_oa_hsw.c @@ -24,6 +24,8 @@ * */ +#include + #include "i915_drv.h" enum metric_set_id { @@ -130,3 +132,46 @@ int i915_oa_select_metric_set_hsw(struct drm_i915_private *dev_priv) return -ENODEV; } } + +static ssize_t +show_render_basic_id(struct device *kdev, struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "%d\n", METRIC_SET_ID_RENDER_BASIC); +} + +static struct device_attribute dev_attr_render_basic_id = { + .attr = { .name = "id", .mode = S_IRUGO }, + .show = show_render_basic_id, + .store = NULL, +}; + +static struct attribute *attrs_render_basic[] = { + &dev_attr_render_basic_id.attr, + NULL, +}; + +static 
struct attribute_group group_render_basic = { + .name = "403d8832-1a27-4aa6-a64e-f5389ce7b212", + .attrs = attrs_render_basic, +}; + +int +i915_perf_init_sysfs_hsw(struct drm_i915_private *dev_priv) +{ + int ret; + + ret = sysfs_create_group(dev_priv->perf.metrics_kobj, &group_render_basic); + if (ret) + goto error_render_basic; + + return 0; + +error_render_basic: + return ret; +} + +void +i915_perf_deinit_sysfs_hsw(struct drm_i915_private *dev_priv) +{ + sysfs_remove_group(dev_priv->perf.metrics_kobj, &group_render_basic); +} diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.h b/drivers/gpu/drm/i915/i915_oa_hsw.h index b618a1f..e4ba89d 100644 --- a/drivers/gpu/drm/i915/i915_oa_hsw.h +++ b/drivers/gpu/drm/i915/i915_oa_hsw.h @@ -31,4 +31,8 @@ extern int i915_oa_n_builtin_metric_sets_hsw; extern int i915_oa_select_metric_set_hsw(struct drm_i915_private *dev_priv); +extern int i915_perf_init_sysfs_hsw(struct drm_i915_private *dev_priv); + +extern void i915_perf_deinit_sysfs_hsw(struct drm_i915_private *dev_priv); + #endif diff --git a/drivers/gpu/drm/i915/i915_perf.c b/
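For readers wondering how userspace might consume the sysfs layout from the commit message, here is a small hedged sketch in C. metric_id_path() and read_metric_set_id() are hypothetical helpers written for this example; the only things taken from the patch are the /sys/class/drm/card0/metrics/<guid>/id layout and the fact that the file holds an integer followed by a newline. The card0 component is copied from the example path and would differ on multi-GPU systems.

```c
#include <stddef.h>
#include <stdio.h>

/* Build the path of a metric set's "id" file from its guid string.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
static int metric_id_path(char *out, size_t out_len, const char *guid)
{
	int n = snprintf(out, out_len,
			 "/sys/class/drm/card0/metrics/%s/id", guid);

	return (n > 0 && (size_t)n < out_len) ? 0 : -1;
}

/* Read the unsigned integer stored in an "id" file.
 * Returns 0 on success and -1 if the file is missing or malformed.
 */
static int read_metric_set_id(const char *id_path, unsigned long long *id)
{
	FILE *f = fopen(id_path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%llu", id) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

The value read this way is what would be passed as the metrics set property when opening a stream; a missing file simply means that set is not available on the running system.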
[Intel-gfx] [PATCH v4 03/11] drm/i915: return EACCES for check_cmd() failures
check_cmd() is checking whether a command adheres to certain restrictions
that ensure it's safe to execute within a privileged batch buffer.
Returning false implies a privilege problem, not that the command is
invalid.

The distinction makes the difference between allowing the buffer to be
executed as an unprivileged batch buffer or returning an EINVAL error to
userspace without executing anything.

In a case where userspace may want to test whether it can successfully
write to a register that needs privileges, the distinction may be
important and an EINVAL error may be considered fatal.

In particular this is currently true for Mesa, which includes a test for
whether OACONTROL can be written to, but Mesa treats any error when
flushing a batch buffer as fatal, calling exit(1).

As it is currently, Mesa can gracefully handle a failure to write to
OACONTROL if the command parser is disabled, but if we were to remove
OACONTROL from the parser's whitelist then the returned EINVAL would
break Mesa applications as they attempt an OACONTROL write.

Signed-off-by: Robert Bragg
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index cfe3e7a..71e778b 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -1261,7 +1261,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
 
 		if (!check_cmd(engine, desc, cmd, length, is_master,
 			       &oacontrol_set)) {
-			ret = -EINVAL;
+			ret = -EACCES;
 			break;
 		}
 
--
2.9.2
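The difference the commit message draws between the two error codes can be summarised as a three-way decision made by whoever calls the command parser. The sketch below is only a model of that description: enum batch_dispatch and dispatch_for_parser_result() are hypothetical names, not functions from the driver, and the exact fallback behaviour in the execbuffer path is assumed from the message rather than quoted from code.

```c
#include <errno.h>

/* What to do with a batch once the command parser has returned. */
enum batch_dispatch {
	DISPATCH_PRIVILEGED,   /* parser accepted every command */
	DISPATCH_UNPRIVILEGED, /* privilege problem: run without privileges */
	DISPATCH_REJECT,       /* invalid batch: return the error, run nothing */
};

/* Map a parser return value (0 or a negative errno) to a dispatch
 * decision, following the EACCES vs EINVAL distinction described in
 * the commit message.
 */
static enum batch_dispatch dispatch_for_parser_result(int parser_ret)
{
	if (parser_ret == 0)
		return DISPATCH_PRIVILEGED;
	if (parser_ret == -EACCES)
		return DISPATCH_UNPRIVILEGED;
	return DISPATCH_REJECT;
}
```

Under this model a non-whitelisted register write ends up in the unprivileged branch, so the submission itself does not fail, which is the property the message says Mesa depends on.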
[Intel-gfx] [PATCH v4 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit
Gen graphics hardware can be set up to periodically write snapshots of performance counters into a circular buffer via its Observation Architecture and this patch exposes that capability to userspace via the i915 perf interface. Cc: Chris Wilson Signed-off-by: Robert Bragg Signed-off-by: Zhenyu Wang --- drivers/gpu/drm/i915/i915_drv.h | 70 ++- drivers/gpu/drm/i915/i915_gem_context.c | 22 +- drivers/gpu/drm/i915/i915_perf.c| 986 +++- drivers/gpu/drm/i915/i915_reg.h | 338 +++ drivers/gpu/drm/i915/intel_ringbuffer.c | 7 +- include/uapi/drm/i915_drm.h | 70 ++- 6 files changed, 1459 insertions(+), 34 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 9070794..4c302cd 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1724,6 +1724,11 @@ struct intel_wm_config { bool sprites_scaled; }; +struct i915_oa_format { + u32 format; + int size; +}; + struct i915_oa_reg { i915_reg_t addr; u32 value; @@ -1744,11 +1749,6 @@ struct i915_perf_stream_ops { */ void (*disable)(struct i915_perf_stream *stream); - /* Return: true if any i915 perf records are ready to read() -* for this stream. -*/ - bool (*can_read)(struct i915_perf_stream *stream); - /* Call poll_wait, passing a wait queue that will be woken * once there is something ready to read() for the stream */ @@ -1758,9 +1758,7 @@ struct i915_perf_stream_ops { /* For handling a blocking read, wait until there is something * to ready to read() for the stream. E.g. wait on the same -* wait queue that would be passed to poll_wait() until -* ->can_read() returns true (if its safe to call ->can_read() -* without the i915 perf lock held). +* wait queue that would be passed to poll_wait(). 
*/ int (*wait_unlocked)(struct i915_perf_stream *stream); @@ -1800,11 +1798,28 @@ struct i915_perf_stream { struct list_head link; u32 sample_flags; + int sample_size; struct i915_gem_context *ctx; bool enabled; - struct i915_perf_stream_ops *ops; + const struct i915_perf_stream_ops *ops; +}; + +struct i915_oa_ops { + void (*init_oa_buffer)(struct drm_i915_private *dev_priv); + int (*enable_metric_set)(struct drm_i915_private *dev_priv); + void (*disable_metric_set)(struct drm_i915_private *dev_priv); + void (*oa_enable)(struct drm_i915_private *dev_priv); + void (*oa_disable)(struct drm_i915_private *dev_priv); + void (*update_oacontrol)(struct drm_i915_private *dev_priv); + void (*update_hw_ctx_id_locked)(struct drm_i915_private *dev_priv, + u32 ctx_id); + int (*read)(struct i915_perf_stream *stream, + char __user *buf, + size_t count, + size_t *offset); + bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv); }; struct drm_i915_private { @@ -2099,16 +2114,47 @@ struct drm_i915_private { struct { bool initialized; + struct mutex lock; struct list_head streams; + spinlock_t hook_lock; + struct { - u32 metrics_set; + struct i915_perf_stream *exclusive_stream; + + u32 specific_ctx_id; + + struct hrtimer poll_check_timer; + wait_queue_head_t poll_wq; + atomic_t pollin; + + bool periodic; + int period_exponent; + int timestamp_frequency; + + int tail_margin; + + int metrics_set; const struct i915_oa_reg *mux_regs; int mux_regs_len; const struct i915_oa_reg *b_counter_regs; int b_counter_regs_len; + + struct { + struct drm_i915_gem_object *obj; + u32 gtt_offset; + u8 *addr; + int format; + int format_size; + } oa_buffer; + + u32 gen7_latched_oastatus1; + + struct i915_oa_ops ops; + const struct i915_oa_format *oa_formats; + int n_builtin_sets; } oa; } perf; @@ -3475,6 +3521,8 @@ struct drm_i915_gem_object * i915_gem_alloc_context_obj(struct drm_device *dev, size_t size); struct i915_gem_context * i915_gem_context_create_gvt(struct drm_device *dev); +int 
i915_gem_context_pin_legacy_rcs_state(
[Intel-gfx] [PATCH v4 11/11] drm/i915: Add a kerneldoc summary for i915_perf.c
In particular this tries to capture for posterity some of the early challenges we had with using the core perf infrastructure in case we ever want to revisit adapting perf for device metrics. Cc: Chris Wilson Signed-off-by: Robert Bragg --- drivers/gpu/drm/i915/i915_perf.c | 163 +++ 1 file changed, 163 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 3d0ba09..2798d70 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -24,6 +24,169 @@ * Robert Bragg */ + +/** + * DOC: i915 Perf, streaming API for GPU metrics + * + * Gen graphics supports a large number of performance counters that can help + * driver and application developers understand and optimize their use of the + * GPU. + * + * This i915 perf interface enables userspace to configure and open a file + * descriptor representing a stream of GPU metrics which can then be read() as + * a stream of sample records. + * + * The interface is particularly suited to exposing buffered metrics that are + * captured by DMA from the GPU, unsynchronized with and unrelated to the CPU. + * + * Streams representing a single context are accessible to applications with a + * corresponding drm file descriptor, such that OpenGL can use the interface + * without special privileges. Access to system-wide metrics requires root + * privileges by default, unless changed via the dev.i915.perf_event_paranoid + * sysctl option. + * + * + * The interface was initially inspired by the core Perf infrastructure but + * some notable differences are: + * + * i915 perf file descriptors represent a "stream" instead of an "event"; where + * a perf event primarily corresponds to a single 64bit value, while a stream + * might sample sets of tightly-coupled counters, depending on the + * configuration. For example the Gen OA unit isn't designed to support + * orthogonal configurations of individual counters; it's configured for a set + * of related counters. 
Samples for an i915 perf stream capturing OA metrics + * will include a set of counter values packed in a compact HW specific format. + * The OA unit supports a number of different packing formats which can be + * selected by the user opening the stream. Perf has support for grouping + * events, but each event in the group is configured, validated and + * authenticated individually with separate system calls. + * + * i915 perf stream configurations are provided as an array of u64 (key,value) + * pairs, instead of a fixed struct with multiple miscellaneous config members, + * interleaved with event-type specific members. + * + * i915 perf doesn't support exposing metrics via an mmap'd circular buffer. + * The supported metrics are being written to memory by the GPU unsynchronized + * with the CPU, using HW specific packing formats for counter sets. Sometimes + * the constraints on HW configuration require reports to be filtered before it + * would be acceptable to expose them to unprivileged applications - to hide + * the metrics of other processes/contexts. For these use cases a read() based + * interface is a good fit, and provides an opportunity to filter data as it + * gets copied from the GPU mapped buffers to userspace buffers. + * + * + * Some notes regarding Linux Perf: + * + * + * The first prototype of this driver was based on the core perf + * infrastructure, and while we did make that mostly work, with some changes to + * perf, we found we were breaking or working around too many assumptions baked + * into perf's currently cpu centric design. + * + * In the end we didn't see a clear benefit to making perf's implementation and + * interface more complex by changing design assumptions while we knew we still + * wouldn't be able to use any existing perf based userspace tools. 
+ * + * Also considering the Gen specific nature of the Observability hardware and + * how userspace will sometimes need to combine i915 perf OA metrics with + * side-band OA data captured via MI_REPORT_PERF_COUNT commands; we're + * expecting the interface to be used by a platform specific userspace such as + * OpenGL or tools. This is to say; we aren't inherently missing out on having + * a standard vendor/architecture agnostic interface by not using perf. + * + * + * For posterity, in case we might re-visit trying to adapt core perf to be + * better suited to exposing i915 metrics these were the main pain points we + * hit: + * + * - The perf based OA PMU driver broke some significant design assumptions: + * + * Existing perf pmus are used for profiling work on a cpu and we were + * introducing the idea of _IS_DEVICE pmus with different security + * implications, the need to fake cpu-related data (such as user/kernel + * registers) to fit with perf's current design, and adding _DEVICE records + * as a
[Intel-gfx] [PATCH v4 08/11] drm/i915: Add dev.i915.perf_event_paranoid sysctl option
Consistent with the kernel.perf_event_paranoid sysctl option that can allow non-root users to access system wide cpu metrics, this can optionally allow non-root users to access system wide OA counter metrics from Gen graphics hardware. Signed-off-by: Robert Bragg --- drivers/gpu/drm/i915/i915_drv.h | 1 + drivers/gpu/drm/i915/i915_perf.c | 45 +++- 2 files changed, 45 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index dd88eb1..558cc0b 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2116,6 +2116,7 @@ struct drm_i915_private { bool initialized; struct kobject *metrics_kobj; + struct ctl_table_header *sysctl_header; struct mutex lock; struct list_head streams; diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 7e1fc6b..ac1f600 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -62,6 +62,8 @@ #define POLL_FREQUENCY 200 #define POLL_PERIOD (NSEC_PER_SEC / POLL_FREQUENCY) +static u32 i915_perf_stream_paranoid = true; + /* The maximum exponent the hardware accepts is 63 (essentially it selects one * of the 64bit timestamp bits to trigger reports from) but there's currently * no known use case for sampling as infrequently as once per 47 thousand years. @@ -1158,7 +1160,13 @@ int i915_perf_open_ioctl_locked(struct drm_device *dev, } } - if (!specific_ctx && !capable(CAP_SYS_ADMIN)) { + /* Similar to perf's kernel.perf_paranoid_cpu sysctl option +* we check a dev.i915.perf_stream_paranoid sysctl option +* to determine if it's ok to access system wide OA counters +* without CAP_SYS_ADMIN privileges. 
+*/ + if (!specific_ctx && + i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) { DRM_ERROR("Insufficient privileges to open system-wide i915 perf stream\n"); ret = -EACCES; goto err_ctx; @@ -1399,6 +1407,37 @@ void i915_perf_unregister(struct drm_i915_private *dev_priv) dev_priv->perf.metrics_kobj = NULL; } +static struct ctl_table oa_table[] = { + { +.procname = "perf_stream_paranoid", +.data = &i915_perf_stream_paranoid, +.maxlen = sizeof(i915_perf_stream_paranoid), +.mode = 0644, +.proc_handler = proc_dointvec, +}, + {} +}; + +static struct ctl_table i915_root[] = { + { +.procname = "i915", +.maxlen = 0, +.mode = 0555, +.child = oa_table, +}, + {} +}; + +static struct ctl_table dev_root[] = { + { +.procname = "dev", +.maxlen = 0, +.mode = 0555, +.child = i915_root, +}, + {} +}; + void i915_perf_init(struct drm_i915_private *dev_priv) { if (!IS_HASWELL(dev_priv)) @@ -1431,6 +1470,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv) dev_priv->perf.oa.n_builtin_sets = i915_oa_n_builtin_metric_sets_hsw; + dev_priv->perf.sysctl_header = register_sysctl_table(dev_root); + dev_priv->perf.initialized = true; } @@ -1439,6 +1480,8 @@ void i915_perf_fini(struct drm_i915_private *dev_priv) if (!dev_priv->perf.initialized) return; + unregister_sysctl_table(dev_priv->perf.sysctl_header); + memset(&dev_priv->perf.oa.ops, 0, sizeof(dev_priv->perf.oa.ops)); dev_priv->perf.initialized = false; } -- 2.9.2 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
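The privilege check this patch adds to i915_perf_open_ioctl_locked() can be sketched in isolation as below. This is a standalone userspace illustration rather than driver code: may_open_stream() and its boolean parameters are hypothetical stand-ins for the specific_ctx, i915_perf_stream_paranoid and capable(CAP_SYS_ADMIN) values used in the patch.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the condition in the patch: a system-wide
 * stream (no specific context) is refused only when the paranoid flag is
 * set and the caller lacks CAP_SYS_ADMIN.
 */
static int may_open_stream(bool specific_ctx, bool paranoid, bool has_sys_admin)
{
	if (!specific_ctx && paranoid && !has_sys_admin)
		return -EACCES;
	return 0;
}
```

Given the dev/i915 sysctl tables registered above, the flag should appear as /proc/sys/dev/i915/perf_stream_paranoid; writing 0 there corresponds to paranoid == false in this sketch.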
[Intel-gfx] [PATCH v5 07/11] drm/i915: advertise available metrics via sysfs
Each metric set is given a sysfs entry like:

  /sys/class/drm/card0/metrics/<guid>/id

This allows userspace to enumerate the specific sets that are available for the current system. The 'id' file contains an unsigned integer that can be used to open the associated metric set via DRM_IOCTL_I915_PERF_OPEN. The <guid> is a globally unique ID for a specific OA unit register configuration that can be reliably used by userspace as a key to lookup corresponding counter meta data and normalization equations.

The guid registry is currently maintained as part of gputop along with the XML metric set descriptions and code generation scripts, ref:

  https://github.com/rib/gputop
  > gputop-data/guids.xml
  > scripts/update-guids.py
  > gputop-data/oa-*.xml
  > scripts/i915-perf-kernelgen.py

  $ make -C gputop-data -f Makefile.xml SYSFS=1 WHITELIST=RenderBasic

Signed-off-by: Robert Bragg
---
 drivers/gpu/drm/i915/i915_drv.c    |  5
 drivers/gpu/drm/i915/i915_drv.h    |  4 +++
 drivers/gpu/drm/i915/i915_oa_hsw.c | 45 +
 drivers/gpu/drm/i915/i915_oa_hsw.h |  4 +++
 drivers/gpu/drm/i915/i915_perf.c   | 52 ++
 5 files changed, 110 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 92f668e..53df16f 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1150,6 +1150,9 @@ static void i915_driver_register(struct drm_i915_private *dev_priv)
 	if (drm_dev_register(dev, 0) == 0) {
 		i915_debugfs_register(dev_priv);
 		i915_setup_sysfs(dev);
+
+		/* Depends on sysfs having been initialized */
+		i915_perf_register(dev_priv);
 	} else
 		DRM_ERROR("Failed to register driver for userspace access!\n");
 
@@ -1186,6 +1189,8 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
 	acpi_video_unregister();
 	intel_opregion_unregister(dev_priv);
 
+	i915_perf_unregister(dev_priv);
+
 	i915_teardown_sysfs(&dev_priv->drm);
 	i915_debugfs_unregister(dev_priv);
 	drm_dev_unregister(&dev_priv->drm);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4c302cd..dd88eb1 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2115,6 +2115,8 @@ struct drm_i915_private { struct { bool initialized; + struct kobject *metrics_kobj; + struct mutex lock; struct list_head streams; @@ -3694,6 +3696,8 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine, /* i915_perf.c */ extern void i915_perf_init(struct drm_i915_private *dev_priv); extern void i915_perf_fini(struct drm_i915_private *dev_priv); +extern void i915_perf_register(struct drm_i915_private *dev_priv); +extern void i915_perf_unregister(struct drm_i915_private *dev_priv); /* i915_suspend.c */ extern int i915_save_state(struct drm_device *dev); diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c index 3e6006ec..3f9dd80 100644 --- a/drivers/gpu/drm/i915/i915_oa_hsw.c +++ b/drivers/gpu/drm/i915/i915_oa_hsw.c @@ -24,6 +24,8 @@ * */ +#include + #include "i915_drv.h" enum metric_set_id { @@ -130,3 +132,46 @@ int i915_oa_select_metric_set_hsw(struct drm_i915_private *dev_priv) return -ENODEV; } } + +static ssize_t +show_render_basic_id(struct device *kdev, struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "%d\n", METRIC_SET_ID_RENDER_BASIC); +} + +static struct device_attribute dev_attr_render_basic_id = { + .attr = { .name = "id", .mode = S_IRUGO }, + .show = show_render_basic_id, + .store = NULL, +}; + +static struct attribute *attrs_render_basic[] = { + &dev_attr_render_basic_id.attr, + NULL, +}; + +static struct attribute_group group_render_basic = { + .name = "403d8832-1a27-4aa6-a64e-f5389ce7b212", + .attrs = attrs_render_basic, +}; + +int +i915_perf_register_sysfs_hsw(struct drm_i915_private *dev_priv) +{ + int ret; + + ret = sysfs_create_group(dev_priv->perf.metrics_kobj, &group_render_basic); + if (ret) + goto error_render_basic; + + return 0; + +error_render_basic: + return ret; +} + +void +i915_perf_unregister_sysfs_hsw(struct drm_i915_private *dev_priv) +{ + 
sysfs_remove_group(dev_priv->perf.metrics_kobj, &group_render_basic); +} diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.h b/drivers/gpu/drm/i915/i915_oa_hsw.h index b618a1f..429a229 100644 --- a/drivers/gpu/drm/i915/i915_oa_hsw.h +++ b/drivers/gpu/drm/i915/i915_oa_hsw.h @@ -31,4 +31,8 @@ extern int i915_oa_n_builtin_metric_sets_hsw; extern int i915_oa_select_metric_set_hsw(struct drm_i915_private *
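As a rough illustration of how userspace might consume the sysfs layout described in this patch, the sketch below builds the path of an 'id' file and parses its contents. Both helper names are hypothetical, the card index is an assumption (the commit message uses card0), and the directory name is taken to be the metric set GUID such as the 403d8832-1a27-4aa6-a64e-f5389ce7b212 group name in the diff. Actually opening the stream via DRM_IOCTL_I915_PERF_OPEN is left out.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: build "/sys/class/drm/card<N>/metrics/<guid>/id".
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int metric_id_path(char *buf, size_t len, int card, const char *guid)
{
	int n = snprintf(buf, len, "/sys/class/drm/card%d/metrics/%s/id",
			 card, guid);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: parse the text read from an 'id' file, which the
 * patch emits as a decimal integer followed by a newline.
 * Returns 0 on success, -1 on malformed input.
 */
static int parse_metric_id(const char *text, unsigned long *id)
{
	char *end;
	unsigned long v = strtoul(text, &end, 10);

	if (end == text || (*end != '\n' && *end != '\0'))
		return -1;
	*id = v;
	return 0;
}
```

Keying on the GUID rather than the integer keeps userspace metadata lookups stable, since the commit message only promises that the GUID identifies a specific register configuration.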
Re: [Intel-gfx] [PATCH v2 5/5] drm/i915: Add more OA configs for BDW, CHV, SKL + BXT
On Mon, Mar 27, 2017 at 7:16 PM, Matthew Auld <matthew.william.a...@gmail.com> wrote:

> On 03/23, Robert Bragg wrote:
> > These are auto generated from an XML description of metric sets,
> > currently maintained in gputop, ref:
> >
> > https://github.com/rib/gputop
> > > gputop-data/oa-*.xml
> > > scripts/i915-perf-kernelgen.py
> >
> > $ make -C gputop-data -f Makefile.xml
> >
> > Signed-off-by: Robert Bragg
> > ---
>
> > int i915_oa_select_metric_set_bdw(struct drm_i915_private *dev_priv)
> > {
> > -	dev_priv->perf.oa.mux_regs = NULL;
> > -	dev_priv->perf.oa.mux_regs_len = 0;
> > -	dev_priv->perf.oa.b_counter_regs = NULL;
> > -	dev_priv->perf.oa.b_counter_regs_len = 0;
> > -	dev_priv->perf.oa.flex_regs = NULL;
> > -	dev_priv->perf.oa.flex_regs_len = 0;
> > +	dev_priv->perf.oa.mux_regs = NULL;
> > +	dev_priv->perf.oa.mux_regs_len = 0;
> > +	dev_priv->perf.oa.b_counter_regs = NULL;
> > +	dev_priv->perf.oa.b_counter_regs_len = 0;
> > +	dev_priv->perf.oa.flex_regs = NULL;
> > +	dev_priv->perf.oa.flex_regs_len = 0;
> What changed? I can't tell from the diff...

I don't think anything changed in those lines, it's just that the diff uses the start of the function for context and then has to delete these to add the full replacement for the function body which included substantial changes to add the cases for the additional configs.

> Otherwise assuming you re-spin with the DRM_DEBUG changes:
> Reviewed-by: Matthew Auld

thanks
[Intel-gfx] [PATCH v3] drm/i915/perf: rate limit spurious oa report notice
Instead of initializing and summarising the number of throttled messages in the driver _init / _fini we now do this when opening / closing an OA stream. --- >8 --- This change is pre-emptively aiming to avoid a potential cause of kernel logging noise in case some condition were to result in us seeing invalid OA reports. The workaround for the OA unit's tail pointer race condition is what avoids the primary known cause of invalid reports being seen and with that in place we aren't expecting to see this notice but it can't be entirely ruled out. Just in case some condition does lead to the notice then it's likely that it will be triggered repeatedly while attempting to append a sequence of reports and depending on the configured OA sampling frequency that might be a large number of repeat notices. v2: (Chris) avoid inconsistent warning on throttle with printk_ratelimit() v3: (Matt) init and summarise with stream init/close not driver init/fini Signed-off-by: Robert Bragg --- drivers/gpu/drm/i915/i915_drv.h | 6 ++ drivers/gpu/drm/i915/i915_perf.c | 28 +++- 2 files changed, 33 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 51a410911d81..51bd6c6034bb 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2454,6 +2454,12 @@ struct drm_i915_private { wait_queue_head_t poll_wq; bool pollin; + /** +* For rate limiting any notifications of spurious +* invalid OA reports +*/ + struct ratelimit_state spurious_report_rs; + bool periodic; int period_exponent; diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 5738b99caa5b..3277a52ce98e 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -632,7 +632,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream, * copying it to userspace... 
*/ if (report32[0] == 0) { - DRM_NOTE("Skipping spurious, invalid OA report\n"); + if (__ratelimit(&dev_priv->perf.oa.spurious_report_rs)) + DRM_NOTE("Skipping spurious, invalid OA report\n"); continue; } @@ -913,6 +914,11 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream) oa_put_render_ctx_id(stream); dev_priv->perf.oa.exclusive_stream = NULL; + + if (dev_priv->perf.oa.spurious_report_rs.missed) { + DRM_NOTE("%d spurious OA report notices suppressed due to ratelimiting\n", +dev_priv->perf.oa.spurious_report_rs.missed); + } } static void gen7_init_oa_buffer(struct drm_i915_private *dev_priv) @@ -1268,6 +1274,26 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, return -EINVAL; } + /* We set up some ratelimit state to potentially throttle any _NOTES +* about spurious, invalid OA reports which we don't forward to +* userspace. +* +* The initialization is associated with opening the stream (not driver +* init) considering we print a _NOTE about any throttling when closing +* the stream instead of waiting until driver _fini which no one would +* ever see. +* +* Using the same limiting factors as printk_ratelimit() +*/ + ratelimit_state_init(&dev_priv->perf.oa.spurious_report_rs, +5 * HZ, 10); + /* Since we use a DRM_NOTE for spurious reports it would be +* inconsistent to let __ratelimit() automatically print a warning for +* throttling. +*/ + ratelimit_set_flags(&dev_priv->perf.oa.spurious_report_rs, + RATELIMIT_MSG_ON_RELEASE); + stream->sample_size = sizeof(struct drm_i915_perf_record_header); format_size = dev_priv->perf.oa.oa_formats[props->oa_format].size; -- 2.12.0 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
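The throttling behaviour relied on here (at most a burst of messages per interval, with a count of suppressed messages reported later) can be modelled in a few lines. This is a simplified userspace sketch and not the kernel's __ratelimit() implementation: struct rl_state, rl_init() and rl_allow() are hypothetical names, and time is passed in as a plain tick count instead of jiffies.

```c
#include <stdbool.h>

/* Hypothetical userspace model of an interval/burst rate limiter. */
struct rl_state {
	long interval;	/* window length, in caller-defined ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* tick at which the current window started */
	int printed;	/* messages allowed so far in this window */
	int missed;	/* messages suppressed since initialisation */
};

static void rl_init(struct rl_state *rs, long interval, int burst)
{
	rs->interval = interval;
	rs->burst = burst;
	rs->begin = 0;
	rs->printed = 0;
	rs->missed = 0;
}

/* Returns true if a message may be emitted at tick 'now'. */
static bool rl_allow(struct rl_state *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		/* A new window starts: reset the per-window budget. */
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

In this model the missed counter is never reset, which matches how the patch only reads spurious_report_rs.missed once, when the stream is destroyed, to print a single summary.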
[Intel-gfx] [PATCH v3 0/7] Enable OA unit for Gen 8 and 9 in i915 perf
Adds some R/Bs from Matthew and some updates based on Matthew's feedback.

Notably the 'Add OA unit support for Gen 8+' patch now avoids duplicating lots of fiddly tail race workaround code by adding a vfunc for reading the OA tail pointer register.

Robert Bragg (7):
  drm/i915: expose _SLICE_MASK GETPARM
  drm/i915: expose _SUBSLICE_MASK GETPARM
  drm/i915/perf: Add 'render basic' Gen8+ OA unit configs
  drm/i915/perf: Add OA unit support for Gen 8+
  drm/i915/perf: Add more OA configs for BDW, CHV, SKL + BXT
  drm/i915/perf: per-gen timebase for checking sample freq
  drm/i915/perf: remove perf.hook_lock

 drivers/gpu/drm/i915/Makefile           |    8 +-
 drivers/gpu/drm/i915/i915_drv.c         |   10 +
 drivers/gpu/drm/i915/i915_drv.h         |   50 +-
 drivers/gpu/drm/i915/i915_gem_context.h |    1 +
 drivers/gpu/drm/i915/i915_oa_bdw.c      | 5154 +++
 drivers/gpu/drm/i915/i915_oa_bdw.h      |   38 +
 drivers/gpu/drm/i915/i915_oa_bxt.c      | 2541 +++
 drivers/gpu/drm/i915/i915_oa_bxt.h      |   38 +
 drivers/gpu/drm/i915/i915_oa_chv.c      | 2730
 drivers/gpu/drm/i915/i915_oa_chv.h      |   38 +
 drivers/gpu/drm/i915/i915_oa_hsw.c      |   58 +-
 drivers/gpu/drm/i915/i915_oa_sklgt2.c   | 3303
 drivers/gpu/drm/i915/i915_oa_sklgt2.h   |   38 +
 drivers/gpu/drm/i915/i915_oa_sklgt3.c   | 2856 +
 drivers/gpu/drm/i915/i915_oa_sklgt3.h   |   38 +
 drivers/gpu/drm/i915/i915_oa_sklgt4.c   | 2910 +
 drivers/gpu/drm/i915/i915_oa_sklgt4.h   |   38 +
 drivers/gpu/drm/i915/i915_perf.c        |  963 +-
 drivers/gpu/drm/i915/i915_reg.h         |   22 +
 drivers/gpu/drm/i915/intel_lrc.c        |    5 +
 include/uapi/drm/i915_drm.h             |   21 +-
 21 files changed, 20745 insertions(+), 115 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.h

-- 
2.12.0
[Intel-gfx] [PATCH v3 3/7] drm/i915/perf: Add 'render basic' Gen8+ OA unit configs
Adds a static OA unit, MUX, B Counter + Flex EU configurations for basic render metrics on Broadwell, Cherryview, Skylake and Broxton. These are auto generated from an XML description of metric sets, currently maintained in gputop, ref: https://github.com/rib/gputop > gputop-data/oa-*.xml > scripts/i915-perf-kernelgen.py $ make -C gputop-data -f Makefile.xml WHITELIST=RenderBasic v2: add newlines to debug messages + fix comment (Matthew Auld) Signed-off-by: Robert Bragg Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/Makefile | 8 +- drivers/gpu/drm/i915/i915_drv.h | 2 + drivers/gpu/drm/i915/i915_oa_bdw.c| 380 ++ drivers/gpu/drm/i915/i915_oa_bdw.h| 38 drivers/gpu/drm/i915/i915_oa_bxt.c| 238 + drivers/gpu/drm/i915/i915_oa_bxt.h| 38 drivers/gpu/drm/i915/i915_oa_chv.c| 225 drivers/gpu/drm/i915/i915_oa_chv.h| 38 drivers/gpu/drm/i915/i915_oa_sklgt2.c | 228 drivers/gpu/drm/i915/i915_oa_sklgt2.h | 38 drivers/gpu/drm/i915/i915_oa_sklgt3.c | 236 + drivers/gpu/drm/i915/i915_oa_sklgt3.h | 38 drivers/gpu/drm/i915/i915_oa_sklgt4.c | 247 ++ drivers/gpu/drm/i915/i915_oa_sklgt4.h | 38 14 files changed, 1791 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.c create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.h create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.c create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.h create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.c create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.h create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.c create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.h create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.c create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.h create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.c create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 2cf04504e494..41400a138a1e 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ 
-127,7 +127,13 @@ i915-y += i915_vgpu.o # perf code i915-y += i915_perf.o \ - i915_oa_hsw.o + i915_oa_hsw.o \ + i915_oa_bdw.o \ + i915_oa_chv.o \ + i915_oa_sklgt2.o \ + i915_oa_sklgt3.o \ + i915_oa_sklgt4.o \ + i915_oa_bxt.o ifeq ($(CONFIG_DRM_I915_GVT),y) i915-y += intel_gvt.o diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 51bd6c6034bb..9c37b73ac7ac 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2469,6 +2469,8 @@ struct drm_i915_private { int mux_regs_len; const struct i915_oa_reg *b_counter_regs; int b_counter_regs_len; + const struct i915_oa_reg *flex_regs; + int flex_regs_len; struct { struct i915_vma *vma; diff --git a/drivers/gpu/drm/i915/i915_oa_bdw.c b/drivers/gpu/drm/i915/i915_oa_bdw.c new file mode 100644 index ..b0b1b75fb431 --- /dev/null +++ b/drivers/gpu/drm/i915/i915_oa_bdw.c @@ -0,0 +1,380 @@ +/* + * Autogenerated file, DO NOT EDIT manually! + * + * Copyright (c) 2015 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + */ + +#include + +#include "i915_drv.h" +#include "i915_oa_bdw.h" + +enum metric_set_id { + METRIC_SET_ID_RENDER_BASIC = 1, +}; + +int i915_oa_n_builtin_metric_sets_bdw = 1; + +static const struct i915_oa_reg b_counter_config_render_basic[] = { + { _MMIO(0x2710), 0x }, + { _MMIO(0x2714), 0x0
[Intel-gfx] [PATCH v3 4/7] drm/i915/perf: Add OA unit support for Gen 8+
Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all share (more-or-less) the same OA unit design.

Of particular note in comparison to Haswell: some OA unit HW config state has become per-context state and as a consequence it is somewhat more complicated to manage synchronous state changes from the cpu while there's no guarantee of what context (if any) is currently actively running on the gpu.

The periodic sampling frequency which can be particularly useful for system-wide analysis (as opposed to command stream synchronised MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to have become per-context save and restored (while the OABUFFER destination is still a shared, system-wide resource).

This support for gen8+ takes care to consider a number of timing challenges involved in synchronously updating per-context state primarily by programming all config state from the cpu and updating all current and saved contexts synchronously while the OA unit is still disabled.

The driver intentionally avoids depending on command streamer programming to update OA state considering the lack of synchronization between the automatic loading of OACTXCONTROL state (that includes the periodic sampling state and enable state) on context restore and the parsing of any general purpose BB the driver can control. I.e. this implementation is careful to avoid the possibility of a context restore temporarily enabling any out-of-date periodic sampling state.

In addition to the risk of transiently-out-of-date state being loaded automatically, there are also internal HW latencies involved in the loading of MUX configurations which would be difficult to account for from the command streamer (and we only want to enable the unit once the MUX configuration is complete).

Since the Gen8+ OA unit design no longer supports clock gating the unit off for a single given context (which effectively stopped any progress of counters while any other context was running) and instead supports tagging OA reports with a context ID for filtering on the CPU, it means we can no longer hide the system-wide progress of counters from a non-privileged application only interested in metrics for its own context. Although we could theoretically try and subtract the progress of other contexts before forwarding reports via read() we aren't in a position to filter reports captured via MI_REPORT_PERF_COUNT commands. As a result, for Gen8+, we always require the dev.i915.perf_stream_paranoid to be unset for any access to OA metrics if not root. Signed-off-by: Robert Bragg --- drivers/gpu/drm/i915/i915_drv.h | 45 +- drivers/gpu/drm/i915/i915_gem_context.h | 1 + drivers/gpu/drm/i915/i915_perf.c| 938 +--- drivers/gpu/drm/i915/i915_reg.h | 22 + drivers/gpu/drm/i915/intel_lrc.c| 5 + include/uapi/drm/i915_drm.h | 19 +- 6 files changed, 937 insertions(+), 93 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 9c37b73ac7ac..3a22b6fd0ee6 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2067,9 +2067,17 @@ struct i915_oa_ops { void (*init_oa_buffer)(struct drm_i915_private *dev_priv); /** -* @enable_metric_set: Applies any MUX configuration to set up the -* Boolean and Custom (B/C) counters that are part of the counter -* reports being sampled. May apply system constraints such as +* @select_metric_set: The auto generated code that checks whether a +* requested OA config is applicable to the system and if so sets up +* the mux, oa and flex eu register config pointers according to the +* current dev_priv->perf.oa.metrics_set. 
+*/ + int (*select_metric_set)(struct drm_i915_private *dev_priv); + + /** +* @enable_metric_set: Selects and applies any MUX configuration to set +* up the Boolean and Custom (B/C) counters that are part of the +* counter reports being sampled. May apply system constraints such as * disabling EU clock gating as required. */ int (*enable_metric_set)(struct drm_i915_private *dev_priv); @@ -2100,20 +2108,13 @@ struct i915_oa_ops { size_t *offset); /** -* @oa_buffer_check: Check for OA buffer data + update tail -* -* This is either called via fops or the poll check hrtimer (atomic -* ctx) without any locks taken. +* @oa_hw_tail_read: read the OA tail pointer register * -* It's safe to read OA config state here unlocked, assuming that this -* is only called while the stream is enabled, while the global OA -* configuration can't be modified. -* -* Efficiency is more important than avoiding some false positives -* here, which will be handled gracefully -
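The commit message above mentions that Gen8+ tags OA reports with a context ID so they can be filtered on the CPU. A minimal sketch of that kind of filtering follows. The struct oa_report type is purely hypothetical and is not the hardware report layout; filter_reports() is likewise an illustrative name, not a driver function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified record: NOT the hardware OA report layout. */
struct oa_report {
	uint32_t ctx_id;	/* context ID tag attached to the report */
	uint32_t counter;	/* stand-in for the counter payload */
};

/*
 * Copy only the reports tagged with 'want_ctx' from 'in' to 'out'.
 * 'out' must have room for 'n' entries. Returns the number kept.
 */
static size_t filter_reports(const struct oa_report *in, size_t n,
			     uint32_t want_ctx, struct oa_report *out)
{
	size_t kept = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (in[i].ctx_id == want_ctx)
			out[kept++] = in[i];
	}
	return kept;
}
```

As the message notes, filtering like this is only possible for reports forwarded via read(); reports captured via MI_REPORT_PERF_COUNT bypass it, which is why non-root access on Gen8+ depends on dev.i915.perf_stream_paranoid being unset.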
[Intel-gfx] [PATCH v3 1/7] drm/i915: expose _SLICE_MASK GETPARM
Enables userspace to determine the number of slices enabled and also know what specific slices are enabled. This information is required, for example, to be able to analyse some OA counter reports where the counter configuration depends on the HW slice configuration.

Signed-off-by: Robert Bragg
Reviewed-by: Matthew Auld
---
 drivers/gpu/drm/i915/i915_drv.c | 5 +
 include/uapi/drm/i915_drm.h     | 1 +
 2 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 5852eed2a867..337acf034d36 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -357,6 +357,11 @@ static int i915_getparam(struct drm_device *dev, void *data,
 		 */
 		value = 1;
 		break;
+	case I915_PARAM_SLICE_MASK:
+		value = INTEL_INFO(dev_priv)->sseu.slice_mask;
+		if (!value)
+			return -ENODEV;
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 3554495bef13..f47fb7f26f36 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -392,6 +392,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_POOLED_EU	 38
 #define I915_PARAM_MIN_EU_IN_POOL	 39
 #define I915_PARAM_MMAP_GTT_VERSION	 40
+#define I915_PARAM_SLICE_MASK		 45 /* XXX: rebase before landing */
 
 /* Query whether DRM_I915_GEM_EXECBUFFER2 supports user defined execution
  * priorities and the driver will attempt to execute batches in priority order.
-- 
2.12.0
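Once userspace has the mask value, deriving "how many slices" and "which slices" is plain bit manipulation. The sketch below assumes bit N of the mask corresponds to slice N, and that the value was obtained through the GETPARAM ioctl with I915_PARAM_SLICE_MASK (the ioctl call itself is omitted). The helper names are hypothetical.

```c
#include <stdint.h>

/* Number of enabled slices: the number of set bits in the mask. */
static int slice_count(uint32_t slice_mask)
{
	int n = 0;

	/* Clear the lowest set bit on each iteration. */
	for (; slice_mask; slice_mask &= slice_mask - 1)
		n++;
	return n;
}

/* Non-zero if the given slice index is enabled in the mask. */
static int slice_enabled(uint32_t slice_mask, unsigned int slice)
{
	return slice < 32 && ((slice_mask >> slice) & 1u);
}
```

For example, a mask of 0x5 would mean two enabled slices, with indices 0 and 2.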
[Intel-gfx] [PATCH v3 7/7] drm/i915/perf: remove perf.hook_lock
In earlier iterations of the i915-perf driver we had a number of callbacks/hooks from other parts of the i915 driver to e.g. notify us when a legacy context was pinned and these could run asynchronously with respect to the stream file operations and might also run in atomic context. dev_priv->perf.hook_lock had been for serialising access to state needed within these callbacks, but as the code has evolved some of the hooks have gone away or are implemented to avoid needing to lock any state. The remaining use of this lock was actually redundant considering how the gen7 oacontrol state used to be updated as part of a context pin hook. Signed-off-by: Robert Bragg --- drivers/gpu/drm/i915/i915_drv.h | 2 -- drivers/gpu/drm/i915/i915_perf.c | 32 ++-- 2 files changed, 10 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 48b07d706f06..67ac4e6dbccb 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2444,8 +2444,6 @@ struct drm_i915_private { struct mutex lock; struct list_head streams; - spinlock_t hook_lock; - struct { struct i915_perf_stream *exclusive_stream; diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 87c0d1ce1b9f..63a1152766f8 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1677,9 +1677,17 @@ static void gen8_disable_metric_set(struct drm_i915_private *dev_priv) /* NOP */ } -static void gen7_update_oacontrol_locked(struct drm_i915_private *dev_priv) +static void gen7_oa_enable(struct drm_i915_private *dev_priv) { - lockdep_assert_held(&dev_priv->perf.hook_lock); + /* Reset buf pointers so we don't forward reports from before now. 
+* +* Think carefully if considering trying to avoid this, since it +* also ensures status flags and the buffer itself are cleared +* in error paths, and we have checks for invalid reports based +* on the assumption that certain fields are written to zeroed +* memory which this helps maintains. +*/ + gen7_init_oa_buffer(dev_priv); if (dev_priv->perf.oa.exclusive_stream->enabled) { struct i915_gem_context *ctx = @@ -1702,25 +1710,6 @@ static void gen7_update_oacontrol_locked(struct drm_i915_private *dev_priv) I915_WRITE(GEN7_OACONTROL, 0); } -static void gen7_oa_enable(struct drm_i915_private *dev_priv) -{ - unsigned long flags; - - /* Reset buf pointers so we don't forward reports from before now. -* -* Think carefully if considering trying to avoid this, since it -* also ensures status flags and the buffer itself are cleared -* in error paths, and we have checks for invalid reports based -* on the assumption that certain fields are written to zeroed -* memory which this helps maintains. -*/ - gen7_init_oa_buffer(dev_priv); - - spin_lock_irqsave(&dev_priv->perf.hook_lock, flags); - gen7_update_oacontrol_locked(dev_priv); - spin_unlock_irqrestore(&dev_priv->perf.hook_lock, flags); -} - static void gen8_oa_enable(struct drm_i915_private *dev_priv) { u32 report_format = dev_priv->perf.oa.oa_buffer.format; @@ -2999,7 +2988,6 @@ void i915_perf_init(struct drm_i915_private *dev_priv) INIT_LIST_HEAD(&dev_priv->perf.streams); mutex_init(&dev_priv->perf.lock); - spin_lock_init(&dev_priv->perf.hook_lock); spin_lock_init(&dev_priv->perf.oa.oa_buffer.ptr_lock); dev_priv->perf.sysctl_header = register_sysctl_table(dev_root); -- 2.12.0 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH v3 2/7] drm/i915: expose _SUBSLICE_MASK GETPARM
Assuming a uniform mask across all slices, this enables userspace to determine the specific sub slices enabled. This information is required, for example, to be able to analyse some OA counter reports where the counter configuration depends on the HW sub slice configuration.

Signed-off-by: Robert Bragg
Reviewed-by: Matthew Auld
---
 drivers/gpu/drm/i915/i915_drv.c | 5 +
 include/uapi/drm/i915_drm.h     | 1 +
 2 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 337acf034d36..e4ed70d21e91 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -362,6 +362,11 @@ static int i915_getparam(struct drm_device *dev, void *data,
 		if (!value)
 			return -ENODEV;
 		break;
+	case I915_PARAM_SUBSLICE_MASK:
+		value = INTEL_INFO(dev_priv)->sseu.subslice_mask;
+		if (!value)
+			return -ENODEV;
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index f47fb7f26f36..e0599e729e68 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -393,6 +393,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_MIN_EU_IN_POOL	 39
 #define I915_PARAM_MMAP_GTT_VERSION	 40
 #define I915_PARAM_SLICE_MASK		 45 /* XXX: rebase before landing */
+#define I915_PARAM_SUBSLICE_MASK	 46
 
 /* Query whether DRM_I915_GEM_EXECBUFFER2 supports user defined execution
  * priorities and the driver will attempt to execute batches in priority order.
-- 
2.12.0
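Under the uniform-mask assumption stated in this patch, a tool could estimate the total number of enabled sub slices by combining the two GETPARAM values. The sketch below is illustrative only: popcount32() and total_subslices() are hypothetical helpers, and the result is only meaningful if every enabled slice really does share the same sub slice mask.

```c
#include <stdint.h>

/* Count the set bits in a 32-bit value. */
static unsigned int popcount32(uint32_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/*
 * Total enabled sub slices, assuming 'subslice_mask' applies identically
 * to every slice enabled in 'slice_mask'.
 */
static unsigned int total_subslices(uint32_t slice_mask, uint32_t subslice_mask)
{
	return popcount32(slice_mask) * popcount32(subslice_mask);
}
```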
[Intel-gfx] [PATCH v3 6/7] drm/i915/perf: per-gen timebase for checking sample freq
An oa_exponent_to_ns() utility and per-gen timebase constants were recently removed when updating the tail pointer race condition WA, and this restores those so we can update the _PROP_OA_EXPONENT validation done in read_properties_unlocked() to not assume we have a 12.5MHz timebase as we did for Haswell.

Signed-off-by: Robert Bragg
Cc: Lionel Landwerlin
---
 drivers/gpu/drm/i915/i915_drv.h  |  1 +
 drivers/gpu/drm/i915/i915_perf.c | 21 +++--
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3a22b6fd0ee6..48b07d706f06 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2463,6 +2463,7 @@ struct drm_i915_private {
 
 			bool periodic;
 			int period_exponent;
+			int timestamp_frequency;
 
 			int metrics_set;
 
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 98eb6415b63a..87c0d1ce1b9f 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -2549,6 +2549,12 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
 	return ret;
 }
 
+static u64 oa_exponent_to_ns(struct drm_i915_private *dev_priv, int exponent)
+{
+	return div_u64(1000000000ULL * (2ULL << exponent),
+		       dev_priv->perf.oa.timestamp_frequency);
+}
+
 /**
  * read_properties_unlocked - validate + copy userspace stream open properties
  * @dev_priv: i915 device instance
@@ -2647,14 +2653,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 			/* Theoretically we can program the OA unit to sample
 			 * every 160ns but don't allow that by default unless
 			 * root.
-			 *
-			 * On Haswell the period is derived from the exponent
-			 * as:
-			 *
-			 *   period = 80ns * 2^(exponent + 1)
 			 */
 			BUILD_BUG_ON(sizeof(oa_period) != 8);
-			oa_period = 80ull * (2ull << value);
+			oa_period = oa_exponent_to_ns(dev_priv, value);
 
 			/* This check is primarily to ensure that oa_period <=
 			 * UINT32_MAX (before passing to do_div which only
@@ -2910,6 +2911,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 		dev_priv->perf.oa.ops.oa_hw_tail_read =
 			gen7_oa_hw_tail_read;
 
+		dev_priv->perf.oa.timestamp_frequency = 12500000;
+
 		dev_priv->perf.oa.oa_formats = hsw_oa_formats;
 
 		dev_priv->perf.oa.n_builtin_sets =
@@ -2923,6 +2926,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 		 */
 
 		if (IS_GEN8(dev_priv)) {
+			dev_priv->perf.oa.timestamp_frequency = 12500000;
+
 			dev_priv->perf.oa.ctx_oactxctrl_offset = 0x120;
 			dev_priv->perf.oa.ctx_flexeu0_offset = 0x2ce;
 			dev_priv->perf.oa.gen8_valid_ctx_bit = (1<<25);
@@ -2939,6 +2944,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 					i915_oa_select_metric_set_chv;
 			}
 		} else if (IS_GEN9(dev_priv)) {
+			dev_priv->perf.oa.timestamp_frequency = 12000000;
+
 			dev_priv->perf.oa.ctx_oactxctrl_offset = 0x128;
 			dev_priv->perf.oa.ctx_flexeu0_offset = 0x3de;
 			dev_priv->perf.oa.gen8_valid_ctx_bit = (1<<16);
@@ -2959,6 +2966,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 					dev_priv->perf.oa.ops.select_metric_set =
 						i915_oa_select_metric_set_sklgt4;
 				} else if (IS_BROXTON(dev_priv)) {
+					dev_priv->perf.oa.timestamp_frequency = 19200123;
+
 					dev_priv->perf.oa.n_builtin_sets =
 						i915_oa_n_builtin_metric_sets_bxt;
 					dev_priv->perf.oa.ops.select_metric_set =
-- 
2.12.0
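To make the exponent-to-period arithmetic concrete, here is a standalone userspace version of the conversion. It assumes the timestamp frequency is expressed in Hz and that the numerator is one billion (nanoseconds per second); with a 12.5 MHz timebase this reproduces the "80ns * 2^(exponent + 1)" formula from the removed Haswell comment and the 160ns minimum period mentioned in the code. The function below is a sketch using standard C types, not the kernel helper, and it is only safe for exponents up to 31, beyond which the intermediate product overflows 64 bits.

```c
#include <stdint.h>

/*
 * Sampling period in nanoseconds for a given OA exponent, where one
 * sample is taken every 2^(exponent + 1) timestamp ticks and
 * 'timestamp_hz' is the timestamp frequency in Hz.
 * Valid for exponent <= 31.
 */
static uint64_t oa_exponent_to_ns(uint64_t timestamp_hz, unsigned int exponent)
{
	return (UINT64_C(1000000000) * (UINT64_C(2) << exponent)) / timestamp_hz;
}
```

With the Broxton value from the patch (19200123 Hz) the same exponent yields a shorter period than on a 12.5 MHz part, which is why a single hard-coded 80ns constant no longer works for validation.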
Re: [Intel-gfx] [PATCH v3 6/7] drm/i915/perf: per-gen timebase for checking sample freq
On Wed, Apr 5, 2017 at 6:26 PM, Ville Syrjälä wrote: > On Wed, Apr 05, 2017 at 06:17:36PM +0100, Lionel Landwerlin wrote: > > On 05/04/17 18:06, Ville Syrjälä wrote: > > > On Wed, Apr 05, 2017 at 05:23:19PM +0100, Robert Bragg wrote: > > >> An oa_exponent_to_ns() utility and per-gen timebase constants where > > >> recently removed when updating the tail pointer race condition WA, and > > >> this restores those so we can update the _PROP_OA_EXPONENT validation > > >> done in read_properties_unlocked() to not assume we have a 12.5KHz > > >> timebase as we did for Haswell. > > >> > > >> Signed-off-by: Robert Bragg > > >> Cc: Lionel Landwerlin > > >> --- > > >> drivers/gpu/drm/i915/i915_drv.h | 1 + > > >> drivers/gpu/drm/i915/i915_perf.c | 21 +++-- > > >> 2 files changed, 16 insertions(+), 6 deletions(-) > > >> > > >> diff --git a/drivers/gpu/drm/i915/i915_drv.h > b/drivers/gpu/drm/i915/i915_drv.h > > >> index 3a22b6fd0ee6..48b07d706f06 100644 > > >> --- a/drivers/gpu/drm/i915/i915_drv.h > > >> +++ b/drivers/gpu/drm/i915/i915_drv.h > > >> @@ -2463,6 +2463,7 @@ struct drm_i915_private { > > >> > > >>bool periodic; > > >>int period_exponent; > > >> + int timestamp_frequency; > > >> > > >>int metrics_set; > > >> > > >> diff --git a/drivers/gpu/drm/i915/i915_perf.c > b/drivers/gpu/drm/i915/i915_perf.c > > >> index 98eb6415b63a..87c0d1ce1b9f 100644 > > >> --- a/drivers/gpu/drm/i915/i915_perf.c > > >> +++ b/drivers/gpu/drm/i915/i915_perf.c > > >> @@ -2549,6 +2549,12 @@ i915_perf_open_ioctl_locked(struct > drm_i915_private *dev_priv, > > >>return ret; > > >> } > > >> > > >> +static u64 oa_exponent_to_ns(struct drm_i915_private *dev_priv, int > exponent) > > >> +{ > > >> + return div_u64(10ULL * (2ULL << exponent), > > >> + dev_priv->perf.oa.timestamp_frequency); > > >> +} > > >> + > > >> /** > > >>* read_properties_unlocked - validate + copy userspace stream open > properties > > >>* @dev_priv: i915 device instance > > >> @@ -2647,14 +2653,9 @@ static int 
read_properties_unlocked(struct > drm_i915_private *dev_priv, > > >>/* Theoretically we can program the OA unit to > sample > > >> * every 160ns but don't allow that by default > unless > > >> * root. > > >> - * > > >> - * On Haswell the period is derived from the > exponent > > >> - * as: > > >> - * > > >> - * period = 80ns * 2^(exponent + 1) > > >> */ > > >>BUILD_BUG_ON(sizeof(oa_period) != 8); > > >> - oa_period = 80ull * (2ull << value); > > >> + oa_period = oa_exponent_to_ns(dev_priv, value); > > >> > > >>/* This check is primarily to ensure that > oa_period <= > > >> * UINT32_MAX (before passing to do_div which only > > >> @@ -2910,6 +2911,8 @@ void i915_perf_init(struct drm_i915_private > *dev_priv) > > >>dev_priv->perf.oa.ops.oa_hw_tail_read = > > >>gen7_oa_hw_tail_read; > > >> > > >> + dev_priv->perf.oa.timestamp_frequency = 1250; > > >> + > > >>dev_priv->perf.oa.oa_formats = hsw_oa_formats; > > >> > > >>dev_priv->perf.oa.n_builtin_sets = > > >> @@ -2923,6 +2926,8 @@ void i915_perf_init(struct drm_i915_private > *dev_priv) > > >> */ > > >> > > >>if (IS_GEN8(dev_priv)) { > > >> + dev_priv->perf.oa.timestamp_frequency = 1250; > > >> + > > >>dev_priv->perf.oa.ctx_oactxctrl_offset = 0x120; > > >>dev_priv->perf.oa.ctx_flexeu0_offset = 0x2ce; > > >>dev_priv->perf.oa.gen8_valid_ctx_bit = (1<<25); > > >> @@ -2939,6 +2944,8 @@ void i915_perf_init(struct drm_i
[Intel-gfx] [PATCH v2] drm/i915/perf: per-gen timebase for checking sample freq
An oa_exponent_to_ns() utility and per-gen timebase constants were recently removed when updating the tail pointer race condition WA, and this restores those so we can update the _PROP_OA_EXPONENT validation done in read_properties_unlocked() to not assume we have a 12.5MHz timebase as we did for Haswell.
Accordingly the oa_sample_rate_hard_limit value that's referenced by proc_dointvec_minmax defining the absolute limit for the OA sampling frequency is now initialized to (timestamp_frequency / 2) instead of the 6.25MHz constant for Haswell.
v2:
Specify frequency of 19.2MHz for BXT (Ville)
Initialize oa_sample_rate_hard_limit per-gen too (Lionel)
Signed-off-by: Robert Bragg
Cc: Lionel Landwerlin
Cc: Ville Syrjälä
--- drivers/gpu/drm/i915/i915_drv.h | 1 + drivers/gpu/drm/i915/i915_perf.c | 31 ++- 2 files changed, 23 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 3a22b6fd0ee6..48b07d706f06 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2463,6 +2463,7 @@ struct drm_i915_private { bool periodic; int period_exponent; + int timestamp_frequency; int metrics_set; diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 98eb6415b63a..980b4a1fd7cc 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -288,10 +288,12 @@ static u32 i915_perf_stream_paranoid = true; /* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate * - * 160ns is the smallest sampling period we can theoretically program the OA - * unit with on Haswell, corresponding to 6.25MHz. + * The highest sampling frequency we can theoretically program the OA unit + * with is always half the timestamp frequency: E.g. 6.25Mhz for Haswell. + * + * Initialized just before we register the sysctl parameter.
*/ -static int oa_sample_rate_hard_limit = 625; +static int oa_sample_rate_hard_limit; /* Theoretically we can program the OA unit to sample every 160ns but don't * allow that by default unless root... @@ -2549,6 +2551,12 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv, return ret; } +static u64 oa_exponent_to_ns(struct drm_i915_private *dev_priv, int exponent) +{ + return div_u64(10ULL * (2ULL << exponent), + dev_priv->perf.oa.timestamp_frequency); +} + /** * read_properties_unlocked - validate + copy userspace stream open properties * @dev_priv: i915 device instance @@ -2647,14 +2655,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv, /* Theoretically we can program the OA unit to sample * every 160ns but don't allow that by default unless * root. -* -* On Haswell the period is derived from the exponent -* as: -* -* period = 80ns * 2^(exponent + 1) */ BUILD_BUG_ON(sizeof(oa_period) != 8); - oa_period = 80ull * (2ull << value); + oa_period = oa_exponent_to_ns(dev_priv, value); /* This check is primarily to ensure that oa_period <= * UINT32_MAX (before passing to do_div which only @@ -2910,6 +2913,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv) dev_priv->perf.oa.ops.oa_hw_tail_read = gen7_oa_hw_tail_read; + dev_priv->perf.oa.timestamp_frequency = 1250; + dev_priv->perf.oa.oa_formats = hsw_oa_formats; dev_priv->perf.oa.n_builtin_sets = @@ -2923,6 +2928,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv) */ if (IS_GEN8(dev_priv)) { + dev_priv->perf.oa.timestamp_frequency = 1250; + dev_priv->perf.oa.ctx_oactxctrl_offset = 0x120; dev_priv->perf.oa.ctx_flexeu0_offset = 0x2ce; dev_priv->perf.oa.gen8_valid_ctx_bit = (1<<25); @@ -2939,6 +2946,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv) i915_oa_select_metric_set_chv; } } else if (IS_GEN9(dev_priv)) { + dev_priv->perf.oa.timestamp_frequency = 1200; + dev_priv->perf.oa.ctx_oactxctrl_offset = 0x128; dev_priv->perf.oa.ctx_flexeu0_offset = 0x3de; 
dev_priv->perf.oa.gen8_valid_ctx_bit = (1<<16); @@ -2959,6 +2968,8 @@ void i915_perf
Re: [Intel-gfx] [PATCH v3 4/7] drm/i915/perf: Add OA unit support for Gen 8+
On Wed, Apr 12, 2017 at 12:33 PM, Matthew Auld wrote: > On 04/05, Robert Bragg wrote: > > Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all > > share (more-or-less) the same OA unit design. > > > > Of particular note in comparison to Haswell: some OA unit HW config > > state has become per-context state and as a consequence it is somewhat > > more complicated to manage synchronous state changes from the cpu while > > there's no guarantee of what context (if any) is currently actively > > running on the gpu. > > > > The periodic sampling frequency which can be particularly useful for > > system-wide analysis (as opposed to command stream synchronised > > MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to > > have become per-context save and restored (while the OABUFFER > > destination is still a shared, system-wide resource). > > > > This support for gen8+ takes care to consider a number of timing > > challenges involved in synchronously updating per-context state > > primarily by programming all config state from the cpu and updating all > > current and saved contexts synchronously while the OA unit is still > > disabled. > > > > The driver intentionally avoids depending on command streamer > > programming to update OA state considering the lack of synchronization > > between the automatic loading of OACTXCONTROL state (that includes the > > periodic sampling state and enable state) on context restore and the > > parsing of any general purpose BB the driver can control. I.e. this > > implementation is careful to avoid the possibility of a context restore > > temporarily enabling any out-of-date periodic sampling state. 
In > > addition to the risk of transiently-out-of-date state being loaded > > automatically; there are also internal HW latencies involved in the > > loading of MUX configurations which would be difficult to account for > > from the command streamer (and we only want to enable the unit when once > > the MUX configuration is complete). > > > > Since the Gen8+ OA unit design no longer supports clock gating the unit > > off for a single given context (which effectively stopped any progress > > of counters while any other context was running) and instead supports > > tagging OA reports with a context ID for filtering on the CPU, it means > > we can no longer hide the system-wide progress of counters from a > > non-privileged application only interested in metrics for its own > > context. Although we could theoretically try and subtract the progress > > of other contexts before forwarding reports via read() we aren't in a > > position to filter reports captured via MI_REPORT_PERF_COUNT commands. > > As a result, for Gen8+, we always require the > > dev.i915.perf_stream_paranoid to be unset for any access to OA metrics > > if not root. 
> > > > Signed-off-by: Robert Bragg > > --- > > drivers/gpu/drm/i915/i915_drv.h | 45 +- > > drivers/gpu/drm/i915/i915_gem_context.h | 1 + > > drivers/gpu/drm/i915/i915_perf.c| 938 > +--- > > drivers/gpu/drm/i915/i915_reg.h | 22 + > > drivers/gpu/drm/i915/intel_lrc.c| 5 + > > include/uapi/drm/i915_drm.h | 19 +- > > 6 files changed, 937 insertions(+), 93 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h > b/drivers/gpu/drm/i915/i915_drv.h > > index 9c37b73ac7ac..3a22b6fd0ee6 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.h > > +++ b/drivers/gpu/drm/i915/i915_drv.h > > @@ -2067,9 +2067,17 @@ struct i915_oa_ops { > > void (*init_oa_buffer)(struct drm_i915_private *dev_priv); > > > > /** > > - * @enable_metric_set: Applies any MUX configuration to set up the > > - * Boolean and Custom (B/C) counters that are part of the counter > > - * reports being sampled. May apply system constraints such as > > + * @select_metric_set: The auto generated code that checks whether > a > > + * requested OA config is applicable to the system and if so sets > up > > + * the mux, oa and flex eu register config pointers according to > the > > + * current dev_priv->perf.oa.metrics_set. > > + */ > > + int (*select_metric_set)(struct drm_i915_private *dev_priv); > > + > > + /** > > + * @enable_metric_set: Selects and applies any MUX configuration > to set > > + * up the Boolean and Custom (B/C) counters that are part of the > > + * counter reports being sampled. May apply system
Re: [Intel-gfx] [PATCH v2] drm/i915/perf: per-gen timebase for checking sample freq
On Wed, Apr 12, 2017 at 1:34 PM, Matthew Auld < matthew.william.a...@gmail.com> wrote: > On 5 April 2017 at 20:05, Robert Bragg wrote: > > An oa_exponent_to_ns() utility and per-gen timebase constants where > were > > > recently removed when updating the tail pointer race condition WA, and > > this restores those so we can update the _PROP_OA_EXPONENT validation > > done in read_properties_unlocked() to not assume we have a 12.5MHz > > timebase as we did for Haswell. > > > > Accordingly the oa_sample_rate_hard_limit value that's referenced by > > proc_dointvec_minmax defining the absolute limit for the OA sampling > > frequency is now initialized to (timestamp_frequency / 2) instead of the > > 6.25MHz constant for Haswell. > > > > v2: > > Specify frequency of 19.2MHz for BXT (Ville) > > Initialize oa_sample_rate_hard_limit per-gen too (Lionel) > > > > Signed-off-by: Robert Bragg > > Cc: Lionel Landwerlin > > Cc: Ville Syrjälä > > --- > > drivers/gpu/drm/i915/i915_drv.h | 1 + > > drivers/gpu/drm/i915/i915_perf.c | 31 ++- > > 2 files changed, 23 insertions(+), 9 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h > b/drivers/gpu/drm/i915/i915_drv.h > > index 3a22b6fd0ee6..48b07d706f06 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.h > > +++ b/drivers/gpu/drm/i915/i915_drv.h > > @@ -2463,6 +2463,7 @@ struct drm_i915_private { > > > > bool periodic; > > int period_exponent; > > + int timestamp_frequency; > > > > int metrics_set; > > > > diff --git a/drivers/gpu/drm/i915/i915_perf.c > b/drivers/gpu/drm/i915/i915_perf.c > > index 98eb6415b63a..980b4a1fd7cc 100644 > > --- a/drivers/gpu/drm/i915/i915_perf.c > > +++ b/drivers/gpu/drm/i915/i915_perf.c > > @@ -288,10 +288,12 @@ static u32 i915_perf_stream_paranoid = true; > > > > /* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate > > * > > - * 160ns is the smallest sampling period we can theoretically program > the OA > > - * unit with on Haswell, corresponding to 6.25MHz. 
> > + * The highest sampling frequency we can theoretically program the OA > unit > > + * with is always half the timestamp frequency: E.g. 6.25Mhz for > Haswell. > > + * > > + * Initialized just before we register the sysctl parameter. > > */ > > -static int oa_sample_rate_hard_limit = 625; > > +static int oa_sample_rate_hard_limit; > > > > /* Theoretically we can program the OA unit to sample every 160ns but > don't > > * allow that by default unless root... > > @@ -2549,6 +2551,12 @@ i915_perf_open_ioctl_locked(struct > drm_i915_private *dev_priv, > > return ret; > > } > > > > +static u64 oa_exponent_to_ns(struct drm_i915_private *dev_priv, int > exponent) > > +{ > > + return div_u64(10ULL * (2ULL << exponent), > > + dev_priv->perf.oa.timestamp_frequency); > > +} > > + > > /** > > * read_properties_unlocked - validate + copy userspace stream open > properties > > * @dev_priv: i915 device instance > > @@ -2647,14 +2655,9 @@ static int read_properties_unlocked(struct > drm_i915_private *dev_priv, > > /* Theoretically we can program the OA unit to > sample > > * every 160ns but don't allow that by default > unless > hmm, that's not actually true if we consider BXT, right? > right, I've updated this comment now. > > Reviewed-by: Matthew Auld > thanks ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
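The review point about BXT can be checked numerically. The sketch below assumes the timebase is expressed in Hz and uses hypothetical helper names; it only models the "half the timestamp frequency" rule from the commit message, not the driver's sysctl plumbing.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only.
 *
 * The commit message states that the hard limit for the OA sampling
 * frequency is half the timestamp frequency (the exponent 0 case).
 */
static uint64_t oa_max_sample_rate_hz(uint64_t timebase_hz)
{
	return timebase_hz / 2;
}

/* Smallest programmable period in nanoseconds, rounded down. */
static uint64_t oa_min_period_ns(uint64_t timebase_hz)
{
	assert(timebase_hz > 0);

	/* Two timebase ticks, expressed in nanoseconds. */
	return 2000000000ULL / timebase_hz;
}
```

For a 12.5MHz timebase this gives 6.25MHz and 160ns; for a 19.2MHz timebase it gives 9.6MHz and roughly 104ns, so the "every 160ns" wording only holds for the 12.5MHz case.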
[Intel-gfx] [PATCH v4 00/15] Enable OA unit for Gen 8 and 9 in i915 perf
Updates based on the latest review feedback from Matthew and Lionel; this also includes an update to the TestOa register config for SKL GT2 compared to the last series (based on the latest XML files from VPG).

Although the _[SUB]SLICE_MASK GETPARM patches were reviewed, it's worth mentioning there was a TODO comment in the last patches about rebasing the parameter ID before upstreaming which is removed now, and there's a minimal comment for what the new parameters are, consistent with other more recently added parameters. Conveniently the actual IDs didn't need rebasing so there's no last moment uapi change for gputop, mesa and igt.

The series is longer just because I've included the gen7 prep patches (already reviewed) that I haven't landed yet but the gen8+ bits depend on.

Regards,
- Robert

Robert Bragg (15):
  drm/i915/perf: fix gen7_append_oa_reports comment
  drm/i915/perf: avoid poll, read, EAGAIN busy loops
  drm/i915/perf: avoid read back of head register
  drm/i915/perf: no head/tail ref in gen7_oa_read
  drm/i915/perf: improve tail race workaround
  drm/i915/perf: improve invalid OA format debug message
  drm/i915/perf: better pipeline aged/aging tail updates
  drm/i915/perf: rate limit spurious oa report notice
  drm/i915: expose _SLICE_MASK GETPARM
  drm/i915: expose _SUBSLICE_MASK GETPARM
  drm/i915/perf: Add 'render basic' Gen8+ OA unit configs
  drm/i915/perf: Add OA unit support for Gen 8+
  drm/i915/perf: Add more OA configs for BDW, CHV, SKL + BXT
  drm/i915/perf: per-gen timebase for checking sample freq
  drm/i915/perf: remove perf.hook_lock

 drivers/gpu/drm/i915/Makefile           |    8 +-
 drivers/gpu/drm/i915/i915_drv.c         |   10 +
 drivers/gpu/drm/i915/i915_drv.h         |  121 +-
 drivers/gpu/drm/i915/i915_gem_context.h |    1 +
 drivers/gpu/drm/i915/i915_oa_bdw.c      | 5154 +++
 drivers/gpu/drm/i915/i915_oa_bdw.h      |   38 +
 drivers/gpu/drm/i915/i915_oa_bxt.c      | 2541 +++
 drivers/gpu/drm/i915/i915_oa_bxt.h      |   38 +
 drivers/gpu/drm/i915/i915_oa_chv.c      | 2730
 drivers/gpu/drm/i915/i915_oa_chv.h      |   38 +
 drivers/gpu/drm/i915/i915_oa_hsw.c      |   58 +-
 drivers/gpu/drm/i915/i915_oa_sklgt2.c   | 3302
 drivers/gpu/drm/i915/i915_oa_sklgt2.h   |   38 +
 drivers/gpu/drm/i915/i915_oa_sklgt3.c   | 2856 +
 drivers/gpu/drm/i915/i915_oa_sklgt3.h   |   38 +
 drivers/gpu/drm/i915/i915_oa_sklgt4.c   | 2910 +
 drivers/gpu/drm/i915/i915_oa_sklgt4.h   |   38 +
 drivers/gpu/drm/i915/i915_perf.c        | 1341 ++--
 drivers/gpu/drm/i915/i915_reg.h         |   22 +
 drivers/gpu/drm/i915/intel_lrc.c        |    5 +
 include/uapi/drm/i915_drm.h             |   27 +-
 21 files changed, 21076 insertions(+), 238 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.h
-- 
2.12.0
[Intel-gfx] [PATCH v4 03/15] drm/i915/perf: avoid read back of head register
There's no need for the driver to keep reading back the head pointer from hardware since the hardware doesn't update it automatically. This way we can treat any invalid head pointer value as a software/driver bug instead of spurious hardware behaviour. This change is also a small stepping stone towards re-working how the head and tail state is managed as part of an improved workaround for the tail register race condition. Signed-off-by: Robert Bragg Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/i915_drv.h | 11 ++ drivers/gpu/drm/i915/i915_perf.c | 46 ++-- 2 files changed, 32 insertions(+), 25 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 1af4e6f5410c..2f8a7a4f29df 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2466,6 +2466,17 @@ struct drm_i915_private { u8 *vaddr; int format; int format_size; + + /** +* Although we can always read back the head +* pointer register, we prefer to avoid +* trusting the HW state, just to avoid any +* risk that some hardware condition could +* somehow bump the head pointer unpredictably +* and cause us to forward the wrong OA buffer +* data to userspace. 
+*/ + u32 head; } oa_buffer; u32 gen7_latched_oastatus1; diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index f59f6dd20922..f47d1cc2144b 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -322,9 +322,8 @@ struct perf_open_properties { static bool gen7_oa_buffer_is_empty_fop_unlocked(struct drm_i915_private *dev_priv) { int report_size = dev_priv->perf.oa.oa_buffer.format_size; - u32 oastatus2 = I915_READ(GEN7_OASTATUS2); u32 oastatus1 = I915_READ(GEN7_OASTATUS1); - u32 head = oastatus2 & GEN7_OASTATUS2_HEAD_MASK; + u32 head = dev_priv->perf.oa.oa_buffer.head; u32 tail = oastatus1 & GEN7_OASTATUS1_TAIL_MASK; return OA_TAKEN(tail, head) < @@ -458,16 +457,24 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream, return -EIO; head = *head_ptr - gtt_offset; + + /* An out of bounds or misaligned head pointer implies a driver bug +* since we are in full control of head pointer which should only +* be incremented by multiples of the report size (notably also +* all a power of two). +*/ + if (WARN_ONCE(head > OA_BUFFER_SIZE || head % report_size, + "Inconsistent OA buffer head pointer = %u\n", head)) + return -EIO; + tail -= gtt_offset; /* The OA unit is expected to wrap the tail pointer according to the OA -* buffer size and since we should never write a misaligned head -* pointer we don't expect to read one back either... 
+* buffer size */ - if (tail > OA_BUFFER_SIZE || head > OA_BUFFER_SIZE || - head % report_size) { - DRM_ERROR("Inconsistent OA buffer pointer (head = %u, tail = %u): force restart\n", - head, tail); + if (tail > OA_BUFFER_SIZE) { + DRM_ERROR("Inconsistent OA buffer tail pointer = %u: force restart\n", + tail); dev_priv->perf.oa.ops.oa_disable(dev_priv); dev_priv->perf.oa.ops.oa_enable(dev_priv); *head_ptr = I915_READ(GEN7_OASTATUS2) & @@ -562,8 +569,6 @@ static int gen7_oa_read(struct i915_perf_stream *stream, size_t *offset) { struct drm_i915_private *dev_priv = stream->dev_priv; - int report_size = dev_priv->perf.oa.oa_buffer.format_size; - u32 oastatus2; u32 oastatus1; u32 head; u32 tail; @@ -572,10 +577,9 @@ static int gen7_oa_read(struct i915_perf_stream *stream, if (WARN_ON(!dev_priv->perf.oa.oa_buffer.vaddr)) return -EIO; - oastatus2 = I915_READ(GEN7_OASTATUS2); oastatus1 = I915_READ(GEN7_OASTATUS1); - head = oastatus2 & GEN7_OASTATUS2_HEAD_MASK; + head = dev_priv->perf.oa.oa_buffer.head; tail = oastatus1 & GEN7_OASTATUS1_TAIL_MASK; /* XXX: On Haswell we don't have a safe way to clear oastatus1 @@ -616,10 +620,9 @@ static int gen7_oa_read(struct i915_perf_stream *stream, dev_priv->perf.oa.ops.oa_disab
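The head pointer sanity check added by this patch can be read as a simple predicate. The sketch below is a userspace illustration with a hypothetical name; the 16MB buffer size is an assumption made for the example and is not taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed buffer size for illustration; not taken from the patch. */
#define MODEL_OA_BUFFER_SIZE (16u * 1024u * 1024u)

/*
 * Hypothetical predicate mirroring the WARN_ONCE() condition in the
 * patch: the head offset must not exceed the buffer size and must be
 * a multiple of the report size, since the driver only ever advances
 * it by whole reports.
 */
static bool oa_head_is_consistent(uint32_t head, uint32_t report_size)
{
	assert(report_size > 0);

	return head <= MODEL_OA_BUFFER_SIZE && head % report_size == 0;
}
```

Because the driver is the only writer of the head pointer, a failure of this predicate points at a driver bug rather than at hardware behaviour, which is why the patch warns and returns -EIO instead of restarting the OA unit.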
[Intel-gfx] [PATCH v4 02/15] drm/i915/perf: avoid poll, read, EAGAIN busy loops
If the function for checking whether there is OA buffer data available (during a poll or blocking read) has false positives then we want to avoid a situation where the subsequent read() returns EAGAIN (after a more accurate check) followed by a poll() immediately reporting the same false positive POLLIN event and effectively maintaining a busy loop until there really is data.

This makes sure that we clear the .pollin event status whenever we return EAGAIN to userspace which will throttle subsequent POLLIN events and repeated attempts to read to the 5ms intervals of the hrtimer callback we have.

Signed-off-by: Robert Bragg
Reviewed-by: Matthew Auld
---
 drivers/gpu/drm/i915/i915_perf.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 78fef53b45c9..f59f6dd20922 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1352,7 +1352,15 @@ static ssize_t i915_perf_read(struct file *file,
 		mutex_unlock(&dev_priv->perf.lock);
 	}
 
-	if (ret >= 0) {
+	/* We allow the poll checking to sometimes report false positive POLLIN
+	 * events where we might actually report EAGAIN on read() if there's
+	 * not really any data available. In this situation though we don't
+	 * want to enter a busy loop between poll() reporting a POLLIN event
+	 * and read() returning -EAGAIN. Clearing the oa.pollin state here
+	 * effectively ensures we back off until the next hrtimer callback
+	 * before reporting another POLLIN event.
+	 */
+	if (ret >= 0 || ret == -EAGAIN) {
 		/* Maybe make ->pollin per-stream state if we support multiple
 		 * concurrent streams in the future.
 		 */
-- 
2.12.0
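The effect of clearing the pollin state on EAGAIN can be modelled in a few lines of userspace C. Everything here is hypothetical scaffolding (the struct and function names are not from the driver); it only shows the poll/read/timer interaction the commit message describes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the state involved; not the driver's structs. */
struct oa_stream_model {
	bool pollin;	/* latched "data may be available" flag */
	int reports;	/* reports that read() can really return */
};

/* Periodic timer callback: latch pollin if the cheap check fires. */
static void model_timer_tick(struct oa_stream_model *s, bool check_result)
{
	if (check_result)
		s->pollin = true;
}

/* poll(): report POLLIN only while the flag is latched. */
static bool model_poll(const struct oa_stream_model *s)
{
	return s->pollin;
}

/* read(): returns the number of reports consumed, or -EAGAIN. */
static int model_read(struct oa_stream_model *s)
{
	int ret = s->reports > 0 ? s->reports : -EAGAIN;

	if (ret > 0)
		s->reports = 0;

	/* As in the patch: also clear pollin when returning -EAGAIN. */
	if (ret >= 0 || ret == -EAGAIN)
		s->pollin = false;

	return ret;
}
```

In this model a false positive from the timer leads to one read() returning -EAGAIN, after which poll() stays quiet until the next timer tick, rather than immediately reporting POLLIN again.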
[Intel-gfx] [PATCH v4 06/15] drm/i915/perf: improve invalid OA format debug message
A minor improvement to debugging output

Signed-off-by: Robert Bragg
Reviewed-by: Matthew Auld
---
 drivers/gpu/drm/i915/i915_perf.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 18734e1926b9..08cc2b0dd734 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1904,11 +1904,13 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 			break;
 		case DRM_I915_PERF_PROP_OA_FORMAT:
 			if (value == 0 || value >= I915_OA_FORMAT_MAX) {
-				DRM_DEBUG("Invalid OA report format\n");
+				DRM_DEBUG("Out-of-range OA report format %llu\n",
+					  value);
 				return -EINVAL;
 			}
 			if (!dev_priv->perf.oa.oa_formats[value].size) {
-				DRM_DEBUG("Invalid OA report format\n");
+				DRM_DEBUG("Unsupported OA report format %llu\n",
+					  value);
 				return -EINVAL;
 			}
 			props->oa_format = value;
-- 
2.12.0
[Intel-gfx] [PATCH v4 05/15] drm/i915/perf: improve tail race workaround
There's a HW race condition between OA unit tail pointer register updates and writes to memory whereby the tail pointer can sometimes get ahead of what's been written out to the OA buffer so far (in terms of what's visible to the CPU). Although this can be observed explicitly while copying reports to userspace by checking for a zeroed report-id field in tail reports, we want to account for this earlier, as part of the _oa_buffer_check to avoid lots of redundant read() attempts. Previously the driver used to define an effective tail pointer that lagged the real pointer by a 'tail margin' measured in bytes derived from OA_TAIL_MARGIN_NSEC and the configured sampling frequency. Unfortunately this was flawed considering that the OA unit may also automatically generate non-periodic reports (such as on context switch) or the OA unit may be enabled without any periodic sampling. This improves how we define a tail pointer for reading that lags the real tail pointer by at least %OA_TAIL_MARGIN_NSEC nanoseconds, which gives enough time for the corresponding reports to become visible to the CPU. The driver now maintains two tail pointers: 1) An 'aging' tail with an associated timestamp that is tracked until we can trust the corresponding data is visible to the CPU; at which point it is considered 'aged'. 2) An 'aged' tail that can be used for read()ing. The two separate pointers let us decouple read()s from tail pointer aging. The tail pointers are checked and updated at a limited rate within a hrtimer callback (the same callback that is used for delivering POLLIN events) and since we're now measuring the wall clock time elapsed since a given tail pointer was read the mechanism no longer cares about the OA unit's periodic sampling frequency. 
The natural place to handle the tail pointer updates was in gen7_oa_buffer_is_empty() which is called as part of blocking reads and the hrtimer callback used for polling, and so this was renamed to oa_buffer_check() considering the added side effect while checking whether the buffer contains data. Signed-off-by: Robert Bragg Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/i915_drv.h | 60 - drivers/gpu/drm/i915/i915_perf.c | 277 ++- 2 files changed, 241 insertions(+), 96 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 2f8a7a4f29df..088c4c60bd38 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2094,7 +2094,7 @@ struct i915_oa_ops { size_t *offset); /** -* @oa_buffer_is_empty: Check if OA buffer empty (false positives OK) +* @oa_buffer_check: Check for OA buffer data + update tail * * This is either called via fops or the poll check hrtimer (atomic * ctx) without any locks taken. @@ -2107,7 +2107,7 @@ struct i915_oa_ops { * here, which will be handled gracefully - likely resulting in an * %EAGAIN error for userspace. */ - bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv); + bool (*oa_buffer_check)(struct drm_i915_private *dev_priv); }; struct intel_cdclk_state { @@ -2450,9 +2450,6 @@ struct drm_i915_private { bool periodic; int period_exponent; - int timestamp_frequency; - - int tail_margin; int metrics_set; @@ -2468,6 +2465,59 @@ struct drm_i915_private { int format_size; /** +* Locks reads and writes to all head/tail state +* +* Consider: the head and tail pointer state +* needs to be read consistently from a hrtimer +* callback (atomic context) and read() fop +* (user context) with tail pointer updates +* happening in atomic context and head updates +* in user context and the (unlikely) +* possibility of read() errors needing to +* reset all head/tail state. 
+* +* Note: Contention or performance aren't +* currently a significant concern here +* considering the relatively low frequency of +* hrtimer callbacks (5ms period) and that +* reads typically only happen in response to a +* hrtimer event and likely complete before the +* next callback. +*
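The aging/aged tail scheme described in the commit message can be sketched as a small state machine. This is a deliberately simplified userspace model under several assumptions: the names are hypothetical, the margin value is chosen for illustration, and buffer wraparound and locking are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_TAIL_MARGIN_NSEC	100000ULL	/* assumed margin */
#define MODEL_INVALID_TAIL	UINT32_MAX	/* "no tail is aging" */

/* Hypothetical model of the two tail pointers; not the driver state. */
struct oa_tails_model {
	uint32_t aging_tail;		/* HW tail sampled, not yet trusted */
	uint64_t aging_timestamp;	/* when aging_tail was sampled (ns) */
	uint32_t aged_tail;		/* tail that read() may consume up to */
};

/*
 * Called at a limited rate (e.g. from a timer) with the current HW tail
 * and the current time. Returns true if trusted data is available.
 */
static bool model_oa_buffer_check(struct oa_tails_model *t, uint32_t hw_tail,
				  uint64_t now_ns, uint32_t head)
{
	/* Promote the aging tail once it is older than the margin. */
	if (t->aging_tail != MODEL_INVALID_TAIL &&
	    now_ns - t->aging_timestamp > MODEL_TAIL_MARGIN_NSEC) {
		t->aged_tail = t->aging_tail;
		t->aging_tail = MODEL_INVALID_TAIL;
	}

	/* Start aging a new tail if none is tracked and HW has advanced. */
	if (t->aging_tail == MODEL_INVALID_TAIL && hw_tail != t->aged_tail) {
		t->aging_tail = hw_tail;
		t->aging_timestamp = now_ns;
	}

	return t->aged_tail != head;
}
```

Since the decision depends only on elapsed wall clock time since a tail value was sampled, the model (like the mechanism described above) is independent of the OA unit's periodic sampling frequency.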
[Intel-gfx] [PATCH v4 08/15] drm/i915/perf: rate limit spurious oa report notice
This change is pre-emptively aiming to avoid a potential cause of kernel logging noise in case some condition were to result in us seeing invalid OA reports. The workaround for the OA unit's tail pointer race condition is what avoids the primary known cause of invalid reports being seen and with that in place we aren't expecting to see this notice but it can't be entirely ruled out. Just in case some condition does lead to the notice then it's likely that it will be triggered repeatedly while attempting to append a sequence of reports and depending on the configured OA sampling frequency that might be a large number of repeat notices. v2: (Chris) avoid inconsistent warning on throttle with printk_ratelimit() v3: (Matt) init and summarise with stream init/close not driver init/fini Signed-off-by: Robert Bragg Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/i915_drv.h | 6 ++ drivers/gpu/drm/i915/i915_perf.c | 28 +++- 2 files changed, 33 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 088c4c60bd38..a0e34934a11f 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2448,6 +2448,12 @@ struct drm_i915_private { wait_queue_head_t poll_wq; bool pollin; + /** +* For rate limiting any notifications of spurious +* invalid OA reports +*/ + struct ratelimit_state spurious_report_rs; + bool periodic; int period_exponent; diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 5738b99caa5b..3277a52ce98e 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -632,7 +632,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream, * copying it to userspace... 
*/ if (report32[0] == 0) { - DRM_NOTE("Skipping spurious, invalid OA report\n"); + if (__ratelimit(&dev_priv->perf.oa.spurious_report_rs)) + DRM_NOTE("Skipping spurious, invalid OA report\n"); continue; } @@ -913,6 +914,11 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream) oa_put_render_ctx_id(stream); dev_priv->perf.oa.exclusive_stream = NULL; + + if (dev_priv->perf.oa.spurious_report_rs.missed) { + DRM_NOTE("%d spurious OA report notices suppressed due to ratelimiting\n", +dev_priv->perf.oa.spurious_report_rs.missed); + } } static void gen7_init_oa_buffer(struct drm_i915_private *dev_priv) @@ -1268,6 +1274,26 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, return -EINVAL; } + /* We set up some ratelimit state to potentially throttle any _NOTES +* about spurious, invalid OA reports which we don't forward to +* userspace. +* +* The initialization is associated with opening the stream (not driver +* init) considering we print a _NOTE about any throttling when closing +* the stream instead of waiting until driver _fini which no one would +* ever see. +* +* Using the same limiting factors as printk_ratelimit() +*/ + ratelimit_state_init(&dev_priv->perf.oa.spurious_report_rs, +5 * HZ, 10); + /* Since we use a DRM_NOTE for spurious reports it would be +* inconsistent to let __ratelimit() automatically print a warning for +* throttling. +*/ + ratelimit_set_flags(&dev_priv->perf.oa.spurious_report_rs, + RATELIMIT_MSG_ON_RELEASE); + stream->sample_size = sizeof(struct drm_i915_perf_record_header); format_size = dev_priv->perf.oa.oa_formats[props->oa_format].size; -- 2.12.0 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
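The interval/burst throttling that the patch relies on can be illustrated with a minimal userspace model. This is not the kernel's ratelimit implementation; the struct and function names are hypothetical, time is in arbitrary units, and the model simply keeps counting suppressed messages so a summary can be printed when the stream is closed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an interval/burst rate limiter. */
struct ratelimit_model {
	uint64_t interval;	/* window length, arbitrary time units */
	int burst;		/* messages allowed per window */
	uint64_t begin;		/* start of the current window */
	int printed;		/* messages allowed in this window */
	int missed;		/* messages suppressed so far */
};

static void ratelimit_model_init(struct ratelimit_model *rs,
				 uint64_t interval, int burst)
{
	rs->interval = interval;
	rs->burst = burst;
	rs->begin = 0;
	rs->printed = 0;
	rs->missed = 0;
}

/* Returns true if the caller may print its message at time 'now'. */
static bool ratelimit_model_allow(struct ratelimit_model *rs, uint64_t now)
{
	/* Open a fresh window once the current one has expired. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	/* Suppressed: remember it for a later summary. */
	rs->missed++;
	return false;
}
```

With an interval of 5 and a burst of 10 (the same shape as the 5 * HZ, 10 arguments in the patch), 25 back-to-back calls let 10 through and record 15 as missed, which is the count a close-time summary would report.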
[Intel-gfx] [PATCH v4 04/15] drm/i915/perf: no head/tail ref in gen7_oa_read
This avoids redundantly passing an (inout) head and tail pointer to gen7_append_oa_reports() from gen7_oa_read which doesn't need to reference either itself. Moving the head/tail reads and writes into gen7_append_oa_reports should have no functional effect except to avoid some redundant head pointer writes in cases where nothing was copied to userspace. This is a stepping stone towards updating how the head and tail pointer state is managed to improve the workaround for the OA unit's tail pointer race. It reduces the number of places we need to read/write the head and tail pointers. Signed-off-by: Robert Bragg Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/i915_perf.c | 51 +++- 1 file changed, 19 insertions(+), 32 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index f47d1cc2144b..83dc67a635fb 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -420,8 +420,6 @@ static int append_oa_sample(struct i915_perf_stream *stream, * @buf: destination buffer given by userspace * @count: the number of bytes userspace wants to read * @offset: (inout): the current position for writing into @buf - * @head_ptr: (inout): the current oa buffer cpu read position - * @tail: the current oa buffer gpu write position * * Notably any error condition resulting in a short read (-%ENOSPC or * -%EFAULT) will be returned even though one or more records may @@ -439,9 +437,7 @@ static int append_oa_sample(struct i915_perf_stream *stream, static int gen7_append_oa_reports(struct i915_perf_stream *stream, char __user *buf, size_t count, - size_t *offset, - u32 *head_ptr, - u32 tail) + size_t *offset) { struct drm_i915_private *dev_priv = stream->dev_priv; int report_size = dev_priv->perf.oa.oa_buffer.format_size; @@ -449,14 +445,15 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream, int tail_margin = dev_priv->perf.oa.tail_margin; u32 gtt_offset = 
i915_ggtt_offset(dev_priv->perf.oa.oa_buffer.vma); u32 mask = (OA_BUFFER_SIZE - 1); - u32 head; + size_t start_offset = *offset; + u32 head, oastatus1, tail; u32 taken; int ret = 0; if (WARN_ON(!stream->enabled)) return -EIO; - head = *head_ptr - gtt_offset; + head = dev_priv->perf.oa.oa_buffer.head - gtt_offset; /* An out of bounds or misaligned head pointer implies a driver bug * since we are in full control of head pointer which should only @@ -467,7 +464,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream, "Inconsistent OA buffer head pointer = %u\n", head)) return -EIO; - tail -= gtt_offset; + oastatus1 = I915_READ(GEN7_OASTATUS1); + tail = (oastatus1 & GEN7_OASTATUS1_TAIL_MASK) - gtt_offset; /* The OA unit is expected to wrap the tail pointer according to the OA * buffer size @@ -477,8 +475,6 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream, tail); dev_priv->perf.oa.ops.oa_disable(dev_priv); dev_priv->perf.oa.ops.oa_enable(dev_priv); - *head_ptr = I915_READ(GEN7_OASTATUS2) & - GEN7_OASTATUS2_HEAD_MASK; return -EIO; } @@ -542,7 +538,18 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream, report32[0] = 0; } - *head_ptr = gtt_offset + head; + + if (start_offset != *offset) { + /* We removed the gtt_offset for the copy loop above, indexing +* relative to oa_buf_base so put back here... 
+*/ + head += gtt_offset; + + I915_WRITE(GEN7_OASTATUS2, + ((head & GEN7_OASTATUS2_HEAD_MASK) | + OA_MEM_SELECT_GGTT)); + dev_priv->perf.oa.oa_buffer.head = head; + } return ret; } @@ -570,8 +577,6 @@ static int gen7_oa_read(struct i915_perf_stream *stream, { struct drm_i915_private *dev_priv = stream->dev_priv; u32 oastatus1; - u32 head; - u32 tail; int ret; if (WARN_ON(!dev_priv->perf.oa.oa_buffer.vaddr)) @@ -579,9 +584,6 @@ static int gen7_oa_read(struct i915_perf_stream *stream, oastatus1 = I915_READ(GEN7_OASTATUS1); - head = dev_priv->perf.oa.oa_buffer.head; - tail = oastatus1 & GEN7_OASTATUS1_TAIL_MASK; - /* XXX: On Haswell we don't have a safe way to clear oastatus1 * bits while the OA unit is enabled (while the ta
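As a rough illustration of the head/tail bookkeeping this patch moves into gen7_append_oa_reports(), here is a userspace sketch of a power-of-two circular buffer. The names `SKETCH_BUF_SIZE`, `sketch_taken()` and `sketch_consume()` are hypothetical, and offsets are already relative to the buffer base (i.e. with the GGTT offset removed). It is a simplified model, not the driver code.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BUF_SIZE 4096u /* hypothetical size, must be a power of two */
#define SKETCH_MASK (SKETCH_BUF_SIZE - 1u)

/* Bytes written by the producer (tail) and not yet consumed (head). */
static uint32_t sketch_taken(uint32_t tail, uint32_t head)
{
	return (tail - head) & SKETCH_MASK;
}

/*
 * Consume as many whole reports as are available and return the new head.
 * The number of reports consumed is stored in *n_reports so the caller can
 * skip writing the head back when nothing was copied.
 */
static uint32_t sketch_consume(uint32_t head, uint32_t tail,
			       uint32_t report_size, unsigned int *n_reports)
{
	unsigned int n = 0;

	assert(n_reports);
	assert(report_size != 0 && SKETCH_BUF_SIZE % report_size == 0);

	while (sketch_taken(tail, head) >= report_size) {
		head = (head + report_size) & SKETCH_MASK;
		n++;
	}

	*n_reports = n;
	return head;
}
```

A caller modelled on the patch would only write the head register when `*n_reports` is non-zero, which corresponds to the `start_offset != *offset` check above.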
[Intel-gfx] [PATCH v4 01/15] drm/i915/perf: fix gen7_append_oa_reports comment
If I'm going to complain about a back-to-front convention then the least I can do is not muddle the comment up too.

Signed-off-by: Robert Bragg
Reviewed-by: Matthew Auld
---
 drivers/gpu/drm/i915/i915_perf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 060b171480d5..78fef53b45c9 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -431,7 +431,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
  * userspace.
  *
  * Note: reports are consumed from the head, and appended to the
- * tail, so the head chases the tail?... If you think that's mad
+ * tail, so the tail chases the head?... If you think that's mad
  * and back-to-front you're not alone, but this follows the
  * Gen PRM naming convention.
  *
--
2.12.0
[Intel-gfx] [PATCH v4 07/15] drm/i915/perf: better pipeline aged/aging tail updates
This updates the tail pointer race workaround handling to updating the 'aged' pointer before looking to start aging a new one. There's the possibility that there is already new data available and so we can immediately start aging a new pointer without having to first wait for a later hrtimer callback (and then another to age). Signed-off-by: Robert Bragg Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/i915_perf.c | 41 ++-- 1 file changed, 23 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 08cc2b0dd734..5738b99caa5b 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -391,6 +391,29 @@ static bool gen7_oa_buffer_check_unlocked(struct drm_i915_private *dev_priv) now = ktime_get_mono_fast_ns(); + /* Update the aged tail +* +* Flip the tail pointer available for read()s once the aging tail is +* old enough to trust that the corresponding data will be visible to +* the CPU... +* +* Do this before updating the aging pointer in case we may be able to +* immediately start aging a new pointer too (if new data has become +* available) without needing to wait for a later hrtimer callback. +*/ + if (aging_tail != INVALID_TAIL_PTR && + ((now - dev_priv->perf.oa.oa_buffer.aging_timestamp) > +OA_TAIL_MARGIN_NSEC)) { + aged_idx ^= 1; + dev_priv->perf.oa.oa_buffer.aged_tail_idx = aged_idx; + + aged_tail = aging_tail; + + /* Mark that we need a new pointer to start aging... */ + dev_priv->perf.oa.oa_buffer.tails[!aged_idx].offset = INVALID_TAIL_PTR; + aging_tail = INVALID_TAIL_PTR; + } + /* Update the aging tail * * We throttle aging tail updates until we have a new tail that @@ -420,24 +443,6 @@ static bool gen7_oa_buffer_check_unlocked(struct drm_i915_private *dev_priv) } } - /* Update the aged tail -* -* Flip the tail pointer available for read()s once the aging tail is -* old enough to trust that the corresponding data will be visible to -* the CPU... 
-*/ - if (aging_tail != INVALID_TAIL_PTR && - ((now - dev_priv->perf.oa.oa_buffer.aging_timestamp) > -OA_TAIL_MARGIN_NSEC)) { - aged_idx ^= 1; - dev_priv->perf.oa.oa_buffer.aged_tail_idx = aged_idx; - - aged_tail = aging_tail; - - /* Mark that we need a new pointer to start aging... */ - dev_priv->perf.oa.oa_buffer.tails[!aged_idx].offset = INVALID_TAIL_PTR; - } - spin_unlock_irqrestore(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); return aged_tail == INVALID_TAIL_PTR ? -- 2.12.0 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
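The reordering in this patch is easier to follow in a reduced model. The sketch below is an assumption-laden simplification: `struct sketch_tails`, `SKETCH_MARGIN_NS` and `sketch_check()` are hypothetical names, locking is omitted, and the real driver additionally requires at least one report's worth of new data before it starts aging a pointer. It shows why promoting the aged tail first lets a new pointer start aging in the same call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_INVALID_TAIL UINT32_MAX
#define SKETCH_MARGIN_NS 100000ull /* hypothetical aging margin */

struct sketch_tails {
	uint32_t tail[2];         /* one aged slot, one aging slot */
	unsigned int aged_idx;    /* index of the slot readers may use */
	uint64_t aging_timestamp; /* when the aging slot was last set */
};

/* Returns true if an aged tail is available for read()s. */
static bool sketch_check(struct sketch_tails *t, uint32_t hw_tail, uint64_t now)
{
	unsigned int aged_idx;
	uint32_t aged_tail, aging_tail;

	assert(t);
	aged_idx = t->aged_idx;
	aged_tail = t->tail[aged_idx];
	aging_tail = t->tail[!aged_idx];

	/* Step 1: promote the aging tail once it is old enough to trust. */
	if (aging_tail != SKETCH_INVALID_TAIL &&
	    now - t->aging_timestamp > SKETCH_MARGIN_NS) {
		aged_idx ^= 1;
		t->aged_idx = aged_idx;
		aged_tail = aging_tail;
		t->tail[!aged_idx] = SKETCH_INVALID_TAIL;
		aging_tail = SKETCH_INVALID_TAIL;
	}

	/* Step 2: with the slot free, start aging any newer hardware tail. */
	if (aging_tail == SKETCH_INVALID_TAIL && hw_tail != aged_tail) {
		t->tail[!aged_idx] = hw_tail;
		t->aging_timestamp = now;
	}

	return aged_tail != SKETCH_INVALID_TAIL;
}
```

With the steps in the opposite order, the second call in a sequence like "tail seen, margin elapses, newer tail seen" would promote the old pointer but leave the newer one waiting for a later timer callback.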
[Intel-gfx] [PATCH v4 10/15] drm/i915: expose _SUBSLICE_MASK GETPARM
Assuming a uniform mask across all slices, this enables userspace to determine the specific sub slices enabled. This information is required, for example, to be able to analyse some OA counter reports where the counter configuration depends on the HW sub slice configuration. Signed-off-by: Robert Bragg Reviewed-by: Matthew Auld Acked-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_drv.c | 5 + include/uapi/drm/i915_drm.h | 5 + 2 files changed, 10 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 76a724b7cc22..133ab46bf2f2 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -362,6 +362,11 @@ static int i915_getparam(struct drm_device *dev, void *data, if (!value) return -ENODEV; break; + case I915_PARAM_SUBSLICE_MASK: + value = INTEL_INFO(dev_priv)->sseu.subslice_mask; + if (!value) + return -ENODEV; + break; default: DRM_DEBUG("Unknown parameter %d\n", param->param); return -EINVAL; diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 99bfc3648454..689fd7a418a7 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -415,6 +415,11 @@ typedef struct drm_i915_irq_wait { /* Query the mask of slices available for this system */ #define I915_PARAM_SLICE_MASK 45 +/* Assuming it's uniform for each slice, this queries the mask of subslices + * per-slice for this system. + */ +#define I915_PARAM_SUBSLICE_MASK46 + typedef struct drm_i915_getparam { __s32 param; /* -- 2.12.0 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH v4 09/15] drm/i915: expose _SLICE_MASK GETPARM
Enables userspace to determine the number of slices enabled and also know what specific slices are enabled. This information is required, for example, to be able to analyse some OA counter reports where the counter configuration depends on the HW slice configuration. Signed-off-by: Robert Bragg Reviewed-by: Matthew Auld Acked-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_drv.c | 5 + include/uapi/drm/i915_drm.h | 3 +++ 2 files changed, 8 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index bd85e3826b72..76a724b7cc22 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -357,6 +357,11 @@ static int i915_getparam(struct drm_device *dev, void *data, */ value = 1; break; + case I915_PARAM_SLICE_MASK: + value = INTEL_INFO(dev_priv)->sseu.slice_mask; + if (!value) + return -ENODEV; + break; default: DRM_DEBUG("Unknown parameter %d\n", param->param); return -EINVAL; diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 9ee06ec8a2d6..99bfc3648454 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -412,6 +412,9 @@ typedef struct drm_i915_irq_wait { */ #define I915_PARAM_HAS_EXEC_FENCE 44 +/* Query the mask of slices available for this system */ +#define I915_PARAM_SLICE_MASK 45 + typedef struct drm_i915_getparam { __s32 param; /* -- 2.12.0 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
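Once userspace has obtained the two masks (for example via the GETPARAM ioctl, which is not shown here), interpreting them is plain bit counting. The helpers below are a hypothetical sketch, `sketch_popcount()` and `sketch_total_subslices()` are not part of any i915 or libdrm API, and the total relies on the same assumption the patches state: the subslice mask is uniform across all enabled slices.

```c
#include <assert.h>
#include <stdint.h>

/* Count the set bits in a mask (Kernighan's method). */
static unsigned int sketch_popcount(uint32_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/*
 * Total enabled subslices, assuming the per-slice subslice mask is the
 * same for every enabled slice. Zero masks are rejected because the
 * kernel side reports -ENODEV rather than returning an empty mask.
 */
static unsigned int sketch_total_subslices(uint32_t slice_mask,
					   uint32_t subslice_mask)
{
	assert(slice_mask != 0 && subslice_mask != 0);
	return sketch_popcount(slice_mask) * sketch_popcount(subslice_mask);
}
```

For instance, a slice mask of 0x3 with a subslice mask of 0x7 would describe two slices with three subslices each.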
[Intel-gfx] [PATCH v4 11/15] drm/i915/perf: Add 'render basic' Gen8+ OA unit configs
Adds a static OA unit, MUX, B Counter + Flex EU configurations for basic render metrics on Broadwell, Cherryview, Skylake and Broxton. These are auto generated from an XML description of metric sets, currently maintained in gputop, ref: https://github.com/rib/gputop > gputop-data/oa-*.xml > scripts/i915-perf-kernelgen.py $ make -C gputop-data -f Makefile.xml WHITELIST=RenderBasic v2: add newlines to debug messages + fix comment (Matthew Auld) Signed-off-by: Robert Bragg Reviewed-by: Matthew Auld Acked-by: Lionel Landwerlin --- drivers/gpu/drm/i915/Makefile | 8 +- drivers/gpu/drm/i915/i915_drv.h | 2 + drivers/gpu/drm/i915/i915_oa_bdw.c| 380 ++ drivers/gpu/drm/i915/i915_oa_bdw.h| 38 drivers/gpu/drm/i915/i915_oa_bxt.c| 238 + drivers/gpu/drm/i915/i915_oa_bxt.h| 38 drivers/gpu/drm/i915/i915_oa_chv.c| 225 drivers/gpu/drm/i915/i915_oa_chv.h| 38 drivers/gpu/drm/i915/i915_oa_sklgt2.c | 228 drivers/gpu/drm/i915/i915_oa_sklgt2.h | 38 drivers/gpu/drm/i915/i915_oa_sklgt3.c | 236 + drivers/gpu/drm/i915/i915_oa_sklgt3.h | 38 drivers/gpu/drm/i915/i915_oa_sklgt4.c | 247 ++ drivers/gpu/drm/i915/i915_oa_sklgt4.h | 38 14 files changed, 1791 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.c create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.h create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.c create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.h create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.c create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.h create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.c create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.h create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.c create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.h create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.c create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 2cf04504e494..41400a138a1e 100644 --- a/drivers/gpu/drm/i915/Makefile +++ 
b/drivers/gpu/drm/i915/Makefile @@ -127,7 +127,13 @@ i915-y += i915_vgpu.o # perf code i915-y += i915_perf.o \ - i915_oa_hsw.o + i915_oa_hsw.o \ + i915_oa_bdw.o \ + i915_oa_chv.o \ + i915_oa_sklgt2.o \ + i915_oa_sklgt3.o \ + i915_oa_sklgt4.o \ + i915_oa_bxt.o ifeq ($(CONFIG_DRM_I915_GVT),y) i915-y += intel_gvt.o diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index a0e34934a11f..13b9125cacdd 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2463,6 +2463,8 @@ struct drm_i915_private { int mux_regs_len; const struct i915_oa_reg *b_counter_regs; int b_counter_regs_len; + const struct i915_oa_reg *flex_regs; + int flex_regs_len; struct { struct i915_vma *vma; diff --git a/drivers/gpu/drm/i915/i915_oa_bdw.c b/drivers/gpu/drm/i915/i915_oa_bdw.c new file mode 100644 index 000000000000..b0b1b75fb431 --- /dev/null +++ b/drivers/gpu/drm/i915/i915_oa_bdw.c @@ -0,0 +1,380 @@ +/* + * Autogenerated file, DO NOT EDIT manually! + * + * Copyright (c) 2015 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + */ + +#include + +#include "i915_drv.h" +#include "i915_oa_bdw.h" + +enum metric_set_id { + METRIC_SET_ID_RENDER_BASIC = 1, +}; + +int i915_oa_n_builtin_metric_sets_bdw = 1; + +static const struct i915_oa_reg b_counter_config_render_basic[] = { + { _MMIO(0x2710), 0x0
[Intel-gfx] [PATCH v4 12/15] drm/i915/perf: Add OA unit support for Gen 8+
Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all share (more-or-less) the same OA unit design.

Of particular note in comparison to Haswell: some OA unit HW config state has become per-context state and as a consequence it is somewhat more complicated to manage synchronous state changes from the cpu while there's no guarantee of what context (if any) is currently actively running on the gpu.

The periodic sampling frequency, which can be particularly useful for system-wide analysis (as opposed to command stream synchronised MI_REPORT_PERF_COUNT commands), is perhaps the most surprising state to have become per-context saved and restored (while the OABUFFER destination is still a shared, system-wide resource).

This support for gen8+ takes care to consider a number of timing challenges involved in synchronously updating per-context state, primarily by programming all config state from the cpu and updating all current and saved contexts synchronously while the OA unit is still disabled.

The driver intentionally avoids depending on command streamer programming to update OA state, considering the lack of synchronization between the automatic loading of OACTXCONTROL state (that includes the periodic sampling state and enable state) on context restore and the parsing of any general purpose BB the driver can control. I.e. this implementation is careful to avoid the possibility of a context restore temporarily enabling any out-of-date periodic sampling state. In addition to the risk of transiently-out-of-date state being loaded automatically, there are also internal HW latencies involved in the loading of MUX configurations which would be difficult to account for from the command streamer (and we only want to enable the unit once the MUX configuration is complete).

Since the Gen8+ OA unit design no longer supports clock gating the unit off for a single given context (which effectively stopped any progress of counters while any other context was running) and instead supports tagging OA reports with a context ID for filtering on the CPU, it means we can no longer hide the system-wide progress of counters from a non-privileged application only interested in metrics for its own context. Although we could theoretically try and subtract the progress of other contexts before forwarding reports via read() we aren't in a position to filter reports captured via MI_REPORT_PERF_COUNT commands. As a result, for Gen8+, we always require the dev.i915.perf_stream_paranoid to be unset for any access to OA metrics if not root. Signed-off-by: Robert Bragg Acked-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_drv.h | 45 +- drivers/gpu/drm/i915/i915_gem_context.h | 1 + drivers/gpu/drm/i915/i915_perf.c| 949 +--- drivers/gpu/drm/i915/i915_reg.h | 22 + drivers/gpu/drm/i915/intel_lrc.c| 5 + include/uapi/drm/i915_drm.h | 19 +- 6 files changed, 948 insertions(+), 93 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 13b9125cacdd..b8dcf281db53 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2061,9 +2061,17 @@ struct i915_oa_ops { void (*init_oa_buffer)(struct drm_i915_private *dev_priv); /** -* @enable_metric_set: Applies any MUX configuration to set up the -* Boolean and Custom (B/C) counters that are part of the counter -* reports being sampled. May apply system constraints such as +* @select_metric_set: The auto generated code that checks whether a +* requested OA config is applicable to the system and if so sets up +* the mux, oa and flex eu register config pointers according to the +* current dev_priv->perf.oa.metrics_set. 
+*/ + int (*select_metric_set)(struct drm_i915_private *dev_priv); + + /** +* @enable_metric_set: Selects and applies any MUX configuration to set +* up the Boolean and Custom (B/C) counters that are part of the +* counter reports being sampled. May apply system constraints such as * disabling EU clock gating as required. */ int (*enable_metric_set)(struct drm_i915_private *dev_priv); @@ -2094,20 +2102,13 @@ struct i915_oa_ops { size_t *offset); /** -* @oa_buffer_check: Check for OA buffer data + update tail -* -* This is either called via fops or the poll check hrtimer (atomic -* ctx) without any locks taken. +* @oa_hw_tail_read: read the OA tail pointer register * -* It's safe to read OA config state here unlocked, assuming that this -* is only called while the stream is enabled, while the global OA -* configuration can't be modified. -* -* Efficiency is more important than avoiding some false positives -* here, which wil
[Intel-gfx] [PATCH v4 14/15] drm/i915/perf: per-gen timebase for checking sample freq
An oa_exponent_to_ns() utility and per-gen timebase constants were recently removed when updating the tail pointer race condition WA, and this restores those so we can update the _PROP_OA_EXPONENT validation done in read_properties_unlocked() to not assume we have a 12.5MHz timebase as we did for Haswell. Accordingly the oa_sample_rate_hard_limit value that's referenced by proc_dointvec_minmax defining the absolute limit for the OA sampling frequency is now initialized to (timestamp_frequency / 2) instead of the 6.25MHz constant for Haswell. v2: Specify frequency of 19.2MHz for BXT (Ville) Initialize oa_sample_rate_hard_limit per-gen too (Lionel) Signed-off-by: Robert Bragg Cc: Lionel Landwerlin Cc: Ville Syrjälä Reviewed-by: Matthew Auld Acked-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_drv.h | 1 + drivers/gpu/drm/i915/i915_perf.c | 37 ++--- 2 files changed, 27 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index b8dcf281db53..59dcce3b40a9 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2457,6 +2457,7 @@ struct drm_i915_private { bool periodic; int period_exponent; + int timestamp_frequency; int metrics_set; diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 611f996bece7..5de8d57e0b77 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -288,10 +288,12 @@ static u32 i915_perf_stream_paranoid = true; /* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate * - * 160ns is the smallest sampling period we can theoretically program the OA - * unit with on Haswell, corresponding to 6.25MHz. + * The highest sampling frequency we can theoretically program the OA unit + * with is always half the timestamp frequency: E.g. 6.25Mhz for Haswell. + * + * Initialized just before we register the sysctl parameter. 
*/ -static int oa_sample_rate_hard_limit = 6250000; +static int oa_sample_rate_hard_limit; /* Theoretically we can program the OA unit to sample every 160ns but don't * allow that by default unless root... @@ -2560,6 +2562,12 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv, return ret; } +static u64 oa_exponent_to_ns(struct drm_i915_private *dev_priv, int exponent) +{ + return div_u64(1000000000ULL * (2ULL << exponent), + dev_priv->perf.oa.timestamp_frequency); +} + /** * read_properties_unlocked - validate + copy userspace stream open properties * @dev_priv: i915 device instance @@ -2656,16 +2664,13 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv, } /* Theoretically we can program the OA unit to sample -* every 160ns but don't allow that by default unless -* root. -* -* On Haswell the period is derived from the exponent -* as: -* -* period = 80ns * 2^(exponent + 1) +* e.g. every 160ns for HSW, 167ns for BDW/SKL or 104ns +* for BXT. We don't allow such high sampling +* frequencies by default unless root. 
*/ + BUILD_BUG_ON(sizeof(oa_period) != 8); - oa_period = 80ull * (2ull << value); + oa_period = oa_exponent_to_ns(dev_priv, value); /* This check is primarily to ensure that oa_period <= * UINT32_MAX (before passing to do_div which only @@ -2921,6 +2926,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv) dev_priv->perf.oa.ops.oa_hw_tail_read = gen7_oa_hw_tail_read; + dev_priv->perf.oa.timestamp_frequency = 12500000; + dev_priv->perf.oa.oa_formats = hsw_oa_formats; dev_priv->perf.oa.n_builtin_sets = @@ -2934,6 +2941,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv) */ if (IS_GEN8(dev_priv)) { + dev_priv->perf.oa.timestamp_frequency = 12500000; + dev_priv->perf.oa.ctx_oactxctrl_offset = 0x120; dev_priv->perf.oa.ctx_flexeu0_offset = 0x2ce; dev_priv->perf.oa.gen8_valid_ctx_bit = (1<<25); @@ -2950,6 +2959,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv) i915_oa_select_metric_set_chv; } } else if (IS_GEN9(dev_priv)) { + dev_priv->per
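The arithmetic behind the per-gen timebase can be checked in isolation. The following is a userspace sketch only: `sketch_oa_exponent_to_ns()` and `sketch_max_sample_hz()` are hypothetical helpers that take the timestamp frequency as a plain argument, using the 12.5MHz (Haswell) and 19.2MHz (BXT) figures from the commit message. The period is computed as 10^9 * 2^(exponent + 1) / frequency, and the hard limit as half the timestamp frequency.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sampling period in nanoseconds for a given OA exponent:
 *   period = 1e9 * 2^(exponent + 1) / timestamp_hz
 * The exponent bound keeps the 64-bit multiplication from overflowing.
 */
static uint64_t sketch_oa_exponent_to_ns(uint64_t timestamp_hz,
					 unsigned int exponent)
{
	assert(timestamp_hz != 0 && exponent < 32);
	return (1000000000ull * (2ull << exponent)) / timestamp_hz;
}

/* Highest programmable sampling frequency: half the timestamp frequency. */
static uint64_t sketch_max_sample_hz(uint64_t timestamp_hz)
{
	return timestamp_hz / 2;
}
```

With a 12.5MHz timebase and exponent 0 this gives 160ns, matching the Haswell figure quoted in the patch; with 19.2MHz the integer division yields 104ns.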
[Intel-gfx] [PATCH v4 15/15] drm/i915/perf: remove perf.hook_lock
In earlier iterations of the i915-perf driver we had a number of callbacks/hooks from other parts of the i915 driver to e.g. notify us when a legacy context was pinned and these could run asynchronously with respect to the stream file operations and might also run in atomic context. dev_priv->perf.hook_lock had been for serialising access to state needed within these callbacks, but as the code has evolved some of the hooks have gone away or are implemented to avoid needing to lock any state. The remaining use of this lock was actually redundant considering how the gen7 oacontrol state used to be updated as part of a context pin hook. Signed-off-by: Robert Bragg Reviewed-by: Matthew Auld Acked-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_drv.h | 2 -- drivers/gpu/drm/i915/i915_perf.c | 32 ++-- 2 files changed, 10 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 59dcce3b40a9..94c1f5331daf 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2438,8 +2438,6 @@ struct drm_i915_private { struct mutex lock; struct list_head streams; - spinlock_t hook_lock; - struct { struct i915_perf_stream *exclusive_stream; diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 5de8d57e0b77..1f25a6690f61 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1690,9 +1690,17 @@ static void gen8_disable_metric_set(struct drm_i915_private *dev_priv) /* NOP */ } -static void gen7_update_oacontrol_locked(struct drm_i915_private *dev_priv) +static void gen7_oa_enable(struct drm_i915_private *dev_priv) { - lockdep_assert_held(&dev_priv->perf.hook_lock); + /* Reset buf pointers so we don't forward reports from before now. 
+* +* Think carefully if considering trying to avoid this, since it +* also ensures status flags and the buffer itself are cleared +* in error paths, and we have checks for invalid reports based +* on the assumption that certain fields are written to zeroed +* memory which this helps maintains. +*/ + gen7_init_oa_buffer(dev_priv); if (dev_priv->perf.oa.exclusive_stream->enabled) { struct i915_gem_context *ctx = @@ -1715,25 +1723,6 @@ static void gen7_update_oacontrol_locked(struct drm_i915_private *dev_priv) I915_WRITE(GEN7_OACONTROL, 0); } -static void gen7_oa_enable(struct drm_i915_private *dev_priv) -{ - unsigned long flags; - - /* Reset buf pointers so we don't forward reports from before now. -* -* Think carefully if considering trying to avoid this, since it -* also ensures status flags and the buffer itself are cleared -* in error paths, and we have checks for invalid reports based -* on the assumption that certain fields are written to zeroed -* memory which this helps maintains. -*/ - gen7_init_oa_buffer(dev_priv); - - spin_lock_irqsave(&dev_priv->perf.hook_lock, flags); - gen7_update_oacontrol_locked(dev_priv); - spin_unlock_irqrestore(&dev_priv->perf.hook_lock, flags); -} - static void gen8_oa_enable(struct drm_i915_private *dev_priv) { u32 report_format = dev_priv->perf.oa.oa_buffer.format; @@ -3014,7 +3003,6 @@ void i915_perf_init(struct drm_i915_private *dev_priv) INIT_LIST_HEAD(&dev_priv->perf.streams); mutex_init(&dev_priv->perf.lock); - spin_lock_init(&dev_priv->perf.hook_lock); spin_lock_init(&dev_priv->perf.oa.oa_buffer.ptr_lock); oa_sample_rate_hard_limit = -- 2.12.0 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Add i915 perf infrastructure
On Wed, Oct 12, 2016 at 12:41 PM, Joonas Lahtinen <joonas.lahti...@linux.intel.com> wrote:

> On ti, 2016-10-11 at 12:03 -0700, Robert Bragg wrote:
> > > > + case DRM_I915_PERF_PROP_MAX:
> > > > + BUG();
> > >
> > > We already handle this case above, but I guess we still need this in
> > > order to silence gcc...
> >
> > right, and preferable to having a default: case, for the future compiler
> > warning to handle any new properties here.
>
> Please, do use MISSING_CASE instead. Daniel is known to get upset for
> far less ;)
>
> Generally consensus is that BUG() is used only when there're no other
> options to back out.
>

thanks for this pointer.

I'll add a default: with MISSING_CASE as that looks like an i915-specific convention; though it seems like a real shame to defer missing case issues to runtime errors instead of taking advantage of the compiler complaining at build time that a case has been forgotten.

Thanks,
- Robert

> Regards, Joonas
> --
> Joonas Lahtinen
> Open Source Technology Center
> Intel Corporation
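The trade-off discussed in this exchange can be sketched outside the kernel. Everything here is hypothetical: `enum sketch_prop`, `sketch_handle_prop()` and the fprintf-based report merely stand in for the driver's property enum and the i915 MISSING_CASE macro. With a `default:` branch a forgotten case is reported at runtime and the function backs out with an error; without it, and with the switch operating on the enum type, a compiler warning such as GCC's -Wswitch could flag the omission at build time instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical property IDs, loosely modelled on a uAPI enum. */
enum sketch_prop {
	SKETCH_PROP_CTX_HANDLE = 1,
	SKETCH_PROP_SAMPLE_OA,
	SKETCH_PROP_MAX /* sentinel, never a valid property */
};

/* Returns 0 for a known property, -EINVAL otherwise. */
static int sketch_handle_prop(int id)
{
	assert(SKETCH_PROP_MAX > SKETCH_PROP_CTX_HANDLE);

	switch (id) {
	case SKETCH_PROP_CTX_HANDLE:
	case SKETCH_PROP_SAMPLE_OA:
		return 0;
	default:
		/* Runtime report instead of a fatal BUG(): back out cleanly. */
		fprintf(stderr, "Missing case (id == %d)\n", id);
		return -EINVAL;
	}
}
```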
[Intel-gfx] [PATCH v6 03/11] drm/i915: return EACCES for check_cmd() failures
check_cmd() is checking whether a command adheres to certain restrictions that ensure it's safe to execute within a privileged batch buffer. Returning false implies a privilege problem, not that the command is invalid. The distinction makes the difference between allowing the buffer to be executed as an unprivileged batch buffer or returning an EINVAL error to userspace without executing anything. In a case where userspace may want to test whether it can successfully write to a register that needs privileges the distinction may be important and an EINVAL error may be considered fatal. In particular this is currently true for Mesa, which includes a test for whether OACONTROL can be written too, but Mesa treats any error when flushing a batch buffer as fatal, calling exit(1). As it is currently Mesa can gracefully handle a failure to write to OACONTROL if the command parser is disabled, but if we were to remove OACONTROL from the parser's whitelist then the returned EINVAL would break Mesa applications as they attempt an OACONTROL write. This bumps the command parser version from 7 to 8, as the change is visible to userspace. Signed-off-by: Robert Bragg --- drivers/gpu/drm/i915/i915_cmd_parser.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c index fe34470..c45dd83 100644 --- a/drivers/gpu/drm/i915/i915_cmd_parser.c +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c @@ -1272,7 +1272,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine, if (!check_cmd(engine, desc, cmd, length, is_master, &oacontrol_set)) { - ret = -EINVAL; + ret = -EACCES; break; } @@ -1333,6 +1333,9 @@ int i915_cmd_parser_get_version(struct drm_i915_private *dev_priv) * 5. GPGPU dispatch compute indirect registers. * 6. TIMESTAMP register and Haswell CS GPR registers * 7. Allow MI_LOAD_REGISTER_REG between whitelisted registers. +* 8. 
Don't report cmd_check() failures as EINVAL errors to userspace; +*rely on the HW to NOOP disallowed commands as it would without +*the parser enabled. */ - return 7; + return 8; } -- 2.10.0 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
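The distinction this patch draws between the two error codes can be summarised in a small decision helper. This is a hypothetical sketch, `enum sketch_action` and `sketch_after_parse()` do not exist in the driver, and it only restates the behaviour described in the commit message: a privilege failure (-EACCES) leaves the batch runnable as an unprivileged buffer, whereas any other parser error is returned to userspace without executing anything.

```c
#include <assert.h>
#include <errno.h>

enum sketch_action {
	SKETCH_RUN_PRIVILEGED,   /* parser accepted every command */
	SKETCH_RUN_UNPRIVILEGED, /* privilege problem: let HW NOOP it */
	SKETCH_REJECT            /* invalid batch: report the error */
};

/* Map a parser return value (0 or a negative errno) to an action. */
static enum sketch_action sketch_after_parse(int parse_ret)
{
	assert(parse_ret <= 0);

	if (parse_ret == 0)
		return SKETCH_RUN_PRIVILEGED;
	if (parse_ret == -EACCES)
		return SKETCH_RUN_UNPRIVILEGED;
	return SKETCH_REJECT;
}
```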
[Intel-gfx] [PATCH v6 04/11] drm/i915: don't whitelist oacontrol in cmd parser
Being able to program OACONTROL from a non-privileged batch buffer is not sufficient to be able to configure the OA unit. This was originally allowed to help enable Mesa to expose OA counters via the INTEL_performance_query extension, but the current implementation based on programming OACONTROL via a batch buffer isn't able to report useable data without a more complete OA unit configuration. Mesa handles the possibility that writes to OACONTROL may not be allowed and so only advertises the extension after explicitly testing that a write to OACONTROL succeeds. Based on this; removing OACONTROL from the whitelist should be ok for userspace. Removing this simplifies adding a new kernel api for configuring the OA unit without needing to consider the possibility that userspace might trample on OACONTROL state which we'd like to start managing within the kernel instead. In particular running any Mesa based GL application currently results in clearing OACONTROL when initializing which would disable the capturing of metrics. Signed-off-by: Robert Bragg --- drivers/gpu/drm/i915/i915_cmd_parser.c | 38 ++ 1 file changed, 2 insertions(+), 36 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c index c45dd83..5152d6f 100644 --- a/drivers/gpu/drm/i915/i915_cmd_parser.c +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c @@ -450,7 +450,6 @@ static const struct drm_i915_reg_descriptor gen7_render_regs[] = { REG64(PS_INVOCATION_COUNT), REG64(PS_DEPTH_COUNT), REG64_IDX(RING_TIMESTAMP, RENDER_RING_BASE), - REG32(GEN7_OACONTROL), /* Only allowed for LRI and SRM. See below. 
*/ REG64(MI_PREDICATE_SRC0), REG64(MI_PREDICATE_SRC1), REG32(GEN7_3DPRIM_END_OFFSET), @@ -1060,8 +1059,7 @@ bool intel_engine_needs_cmd_parser(struct intel_engine_cs *engine) static bool check_cmd(const struct intel_engine_cs *engine, const struct drm_i915_cmd_descriptor *desc, const u32 *cmd, u32 length, - const bool is_master, - bool *oacontrol_set) + const bool is_master) { if (desc->flags & CMD_DESC_SKIP) return true; @@ -1099,31 +1097,6 @@ static bool check_cmd(const struct intel_engine_cs *engine, } /* -* OACONTROL requires some special handling for -* writes. We want to make sure that any batch which -* enables OA also disables it before the end of the -* batch. The goal is to prevent one process from -* snooping on the perf data from another process. To do -* that, we need to check the value that will be written -* to the register. Hence, limit OACONTROL writes to -* only MI_LOAD_REGISTER_IMM commands. -*/ - if (reg_addr == i915_mmio_reg_offset(GEN7_OACONTROL)) { - if (desc->cmd.value == MI_LOAD_REGISTER_MEM) { - DRM_DEBUG_DRIVER("CMD: Rejected LRM to OACONTROL\n"); - return false; - } - - if (desc->cmd.value == MI_LOAD_REGISTER_REG) { - DRM_DEBUG_DRIVER("CMD: Rejected LRR to OACONTROL\n"); - return false; - } - - if (desc->cmd.value == MI_LOAD_REGISTER_IMM(1)) - *oacontrol_set = (cmd[offset + 1] != 0); - } - - /* * Check the value written to the register against the * allowed mask/value pair given in the whitelist entry. */ @@ -1214,7 +1187,6 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine, u32 *cmd, *batch_end; struct drm_i915_cmd_descriptor default_desc = noop_desc; const struct drm_i915_cmd_descriptor *desc = &default_desc; - bool oacontrol_set = false; /* OACONTROL tracking. 
See check_cmd() */ bool needs_clflush_after = false; int ret = 0; @@ -1270,8 +1242,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine, break; } - if (!check_cmd(engine, desc, cmd, length, is_master, - &oacontrol_set)) { + if (!check_cmd(engine, desc, cmd, length, is_master)) { ret = -EACCES; break; } @@ -1279,11 +1250,6 @@ int intel_engine_cmd_parser(struct i
[Intel-gfx] [PATCH v6 02/11] drm/i915: rename OACONTROL GEN7_OACONTROL
OACONTROL changes quite a bit for gen8, with some bits split out into a per-context OACTXCONTROL register. Rename now before adding more gen7 OA registers Signed-off-by: Robert Bragg Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/gvt/handlers.c| 2 +- drivers/gpu/drm/i915/i915_cmd_parser.c | 4 ++-- drivers/gpu/drm/i915/i915_reg.h| 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c index 3e74fb3..68e07a1 100644 --- a/drivers/gpu/drm/i915/gvt/handlers.c +++ b/drivers/gpu/drm/i915/gvt/handlers.c @@ -2159,7 +2159,7 @@ static int init_generic_mmio_info(struct intel_gvt *gvt) MMIO_DFH(0x1217c, D_ALL, F_CMD_ACCESS, NULL, NULL); MMIO_F(0x2290, 8, 0, 0, 0, D_HSW_PLUS, NULL, NULL); - MMIO_D(OACONTROL, D_HSW); + MMIO_D(GEN7_OACONTROL, D_HSW); MMIO_D(0x2b00, D_BDW_PLUS); MMIO_D(0x2360, D_BDW_PLUS); MMIO_F(0x5200, 32, 0, 0, 0, D_ALL, NULL, NULL); diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c index f191d7b..fe34470 100644 --- a/drivers/gpu/drm/i915/i915_cmd_parser.c +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c @@ -450,7 +450,7 @@ static const struct drm_i915_reg_descriptor gen7_render_regs[] = { REG64(PS_INVOCATION_COUNT), REG64(PS_DEPTH_COUNT), REG64_IDX(RING_TIMESTAMP, RENDER_RING_BASE), - REG32(OACONTROL), /* Only allowed for LRI and SRM. See below. */ + REG32(GEN7_OACONTROL), /* Only allowed for LRI and SRM. See below. */ REG64(MI_PREDICATE_SRC0), REG64(MI_PREDICATE_SRC1), REG32(GEN7_3DPRIM_END_OFFSET), @@ -1108,7 +1108,7 @@ static bool check_cmd(const struct intel_engine_cs *engine, * to the register. Hence, limit OACONTROL writes to * only MI_LOAD_REGISTER_IMM commands. 
*/ - if (reg_addr == i915_mmio_reg_offset(OACONTROL)) { + if (reg_addr == i915_mmio_reg_offset(GEN7_OACONTROL)) { if (desc->cmd.value == MI_LOAD_REGISTER_MEM) { DRM_DEBUG_DRIVER("CMD: Rejected LRM to OACONTROL\n"); return false; diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 00efaa1..0ad7f03 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -615,7 +615,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define HSW_CS_GPR(n) _MMIO(0x2600 + (n) * 8) #define HSW_CS_GPR_UDW(n) _MMIO(0x2600 + (n) * 8 + 4) -#define OACONTROL _MMIO(0x2360) +#define GEN7_OACONTROL _MMIO(0x2360) #define _GEN7_PIPEA_DE_LOAD_SL 0x70068 #define _GEN7_PIPEB_DE_LOAD_SL 0x71068 -- 2.10.0 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH v6 01/11] drm/i915: Add i915 perf infrastructure
Adds base i915 perf infrastructure for Gen performance metrics.

This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64 properties to configure a stream of metrics and returns a new fd usable with standard VFS system calls including read() to read typed and sized records; ioctl() to enable or disable capture and poll() to wait for data.

A stream is opened something like:

  uint64_t properties[] = {
      /* Single context sampling */
      DRM_I915_PERF_PROP_CTX_HANDLE,        ctx_handle,

      /* Include OA reports in samples */
      DRM_I915_PERF_PROP_SAMPLE_OA,         true,

      /* OA unit configuration */
      DRM_I915_PERF_PROP_OA_METRICS_SET,    metrics_set_id,
      DRM_I915_PERF_PROP_OA_FORMAT,         report_format,
      DRM_I915_PERF_PROP_OA_EXPONENT,       period_exponent,
  };
  struct drm_i915_perf_open_param param = {
      .flags = I915_PERF_FLAG_FD_CLOEXEC |
               I915_PERF_FLAG_FD_NONBLOCK |
               I915_PERF_FLAG_DISABLED,
      .properties_ptr = (uint64_t)properties,
      /* each property is a (key, value) pair of two uint64_t */
      .num_properties = sizeof(properties) / 16,
  };
  int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param);

Records read all start with a common { type, size } header with DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records contain an extensible number of fields and it's the DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that determine what's included in every sample.

No specific streams are supported yet so any attempt to open a stream will return an error.
v4: s/DRM_IORW/DRM_IOW/ - Emil Velikov v3: update read() interface to avoid passing state struct - Chris Wilson fix some rebase fallout, with i915-perf init/deinit v2: use i915_gem_context_get() - Chris Wilson Signed-off-by: Robert Bragg --- drivers/gpu/drm/i915/Makefile| 3 + drivers/gpu/drm/i915/i915_drv.c | 4 + drivers/gpu/drm/i915/i915_drv.h | 91 drivers/gpu/drm/i915/i915_perf.c | 443 +++ include/uapi/drm/i915_drm.h | 67 ++ 5 files changed, 608 insertions(+) create mode 100644 drivers/gpu/drm/i915/i915_perf.c diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 6123400..8d4e25f 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -113,6 +113,9 @@ i915-$(CONFIG_DRM_I915_CAPTURE_ERROR) += i915_gpu_error.o # virtual gpu code i915-y += i915_vgpu.o +# perf code +i915-y += i915_perf.o + ifeq ($(CONFIG_DRM_I915_GVT),y) i915-y += intel_gvt.o include $(src)/gvt/Makefile diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 912d534..5449579 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -836,6 +836,8 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv, intel_detect_preproduction_hw(dev_priv); + i915_perf_init(dev_priv); + return 0; err_workqueues: @@ -849,6 +851,7 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv, */ static void i915_driver_cleanup_early(struct drm_i915_private *dev_priv) { + i915_perf_fini(dev_priv); i915_gem_load_cleanup(&dev_priv->drm); i915_workqueues_cleanup(dev_priv); } @@ -2575,6 +2578,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = { DRM_IOCTL_DEF_DRV(I915_GEM_USERPTR, i915_gem_userptr_ioctl, DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_GETPARAM, i915_gem_context_getparam_ioctl, DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_SETPARAM, i915_gem_context_setparam_ioctl, DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(I915_PERF_OPEN, i915_perf_open_ioctl, 
DRM_RENDER_ALLOW), }; static struct drm_driver driver = { diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 5b2b7f3..d3737c6 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1760,6 +1760,84 @@ struct intel_wm_config { bool sprites_scaled; }; +struct i915_perf_stream; + +struct i915_perf_stream_ops { + /* Enables the collection of HW samples, either in response to +* I915_PERF_IOCTL_ENABLE or implicitly called when stream is +* opened without I915_PERF_FLAG_DISABLED. +*/ + void (*enable)(struct i915_perf_stream *stream); + + /* Disables the collection of HW samples, either in response to +* I915_PERF_IOCTL_DISABLE or implicitly called before +* destroying the stream. +*/ + void (*disable)(struct i915_perf_stream *stream); + + /* Return: true if any i915 perf records are ready to read() +* for this stream. +*/ + bool (*can_read)(struct i915_perf_stream *stream); + + /* Call poll_wait, passing a wait queue that will be woken +* once there is something ready to read() for the stream +*/ + void (*poll_wait)(struct i915_pe
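As a rough userspace illustration of the "{ type, size } header" description in the commit message above, the sketch below walks a buffer that was filled by read() on the stream fd and counts the sample records. The header layout (32-bit type, 16-bit pad, 16-bit size that includes the header) and the numeric value used for the sample record type are assumptions declared locally for the example; real code should take both from i915_drm.h. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout of the common record header, declared locally for
 * illustration only. The size is assumed to cover the header itself.
 */
struct perf_record_header {
	uint32_t type;
	uint16_t pad;
	uint16_t size;
};

/* Assumed stand-in for DRM_I915_PERF_RECORD_SAMPLE. */
#define RECORD_TYPE_SAMPLE 1

/* Walk `len` bytes of records and count the sample records.
 * Returns -1 if a header is truncated, smaller than the header
 * itself, or claims more bytes than remain in the buffer.
 */
int count_sample_records(const uint8_t *buf, size_t len)
{
	size_t offset = 0;
	int samples = 0;

	while (offset < len) {
		struct perf_record_header hdr;

		if (len - offset < sizeof(hdr))
			return -1;

		/* memcpy avoids unaligned access into the byte buffer */
		memcpy(&hdr, buf + offset, sizeof(hdr));

		if (hdr.size < sizeof(hdr) || hdr.size > len - offset)
			return -1;

		if (hdr.type == RECORD_TYPE_SAMPLE)
			samples++;

		/* Unknown record types are skipped using their size */
		offset += hdr.size;
	}

	return samples;
}
```

Skipping by size rather than by a table of known types is what lets a reader tolerate record types added later.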
[Intel-gfx] [PATCH v6 05/11] drm/i915: Add 'render basic' Haswell OA unit config
Adds a static OA unit, MUX + B Counter configuration for basic render metrics on Haswell. This is auto generated from an XML description of metric sets, currently maintained in gputop, ref: https://github.com/rib/gputop > gputop-data/oa-*.xml > scripts/i915-perf-kernelgen.py $ make -C gputop-data -f Makefile.xml SYSFS=0 WHITELIST=RenderBasic Signed-off-by: Robert Bragg Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/Makefile | 3 +- drivers/gpu/drm/i915/i915_drv.h| 14 drivers/gpu/drm/i915/i915_oa_hsw.c | 144 + drivers/gpu/drm/i915/i915_oa_hsw.h | 34 + 4 files changed, 194 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.c create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 8d4e25f..ac0c3ad 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -114,7 +114,8 @@ i915-$(CONFIG_DRM_I915_CAPTURE_ERROR) += i915_gpu_error.o i915-y += i915_vgpu.o # perf code -i915-y += i915_perf.o +i915-y += i915_perf.o \ + i915_oa_hsw.o ifeq ($(CONFIG_DRM_I915_GVT),y) i915-y += intel_gvt.o diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index d3737c6..28f3f77 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1760,6 +1760,11 @@ struct intel_wm_config { bool sprites_scaled; }; +struct i915_oa_reg { + i915_reg_t addr; + u32 value; +}; + struct i915_perf_stream; struct i915_perf_stream_ops { @@ -2142,6 +2147,15 @@ struct drm_i915_private { bool initialized; struct mutex lock; struct list_head streams; + + struct { + u32 metrics_set; + + const struct i915_oa_reg *mux_regs; + int mux_regs_len; + const struct i915_oa_reg *b_counter_regs; + int b_counter_regs_len; + } oa; } perf; /* Abstract the submission mechanism (legacy ringbuffer or execlists) away */ diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c new file mode 100644 index 000..8906380 --- 
/dev/null +++ b/drivers/gpu/drm/i915/i915_oa_hsw.c @@ -0,0 +1,144 @@ +/* + * Autogenerated file, DO NOT EDIT manually! + * + * Copyright (c) 2015 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. 
+ * + */ + +#include "i915_drv.h" +#include "i915_oa_hsw.h" + +enum metric_set_id { + METRIC_SET_ID_RENDER_BASIC = 1, +}; + +int i915_oa_n_builtin_metric_sets_hsw = 1; + +static const struct i915_oa_reg b_counter_config_render_basic[] = { + { _MMIO(0x2724), 0x0080 }, + { _MMIO(0x2720), 0x }, + { _MMIO(0x2714), 0x0080 }, + { _MMIO(0x2710), 0x }, +}; + +static const struct i915_oa_reg mux_config_render_basic[] = { + { _MMIO(0x253a4), 0x0160 }, + { _MMIO(0x25440), 0x0010 }, + { _MMIO(0x25128), 0x }, + { _MMIO(0x2691c), 0x0800 }, + { _MMIO(0x26aa0), 0x0150 }, + { _MMIO(0x26b9c), 0x6000 }, + { _MMIO(0x2791c), 0x0800 }, + { _MMIO(0x27aa0), 0x0150 }, + { _MMIO(0x27b9c), 0x6000 }, + { _MMIO(0x2641c), 0x0400 }, + { _MMIO(0x25380), 0x0010 }, + { _MMIO(0x2538c), 0x }, + { _MMIO(0x25384), 0x0800 }, + { _MMIO(0x25400), 0x0004 }, + { _MMIO(0x2540c), 0x06029000 }, + { _MMIO(0x25410), 0x0002 }, + { _MMIO(0x25404), 0x5c30 }, + { _MMIO(0x25100), 0x0016 }, + { _MMIO(0x25110), 0x0400 }, + { _MMIO(0x25104), 0x }, + { _MMIO(0x26804), 0x1211 }, + { _MMIO(0
[Intel-gfx] [PATCH v6 09/11] drm/i915: add oa_event_min_timer_exponent sysctl
The minimal sampling period is now configurable via a dev.i915.oa_min_timer_exponent sysctl parameter.

Following the precedent set by perf, the default is the minimum that won't (on its own) exceed the kernel.perf_event_max_sample_rate default of 100000 samples/s.

Signed-off-by: Robert Bragg
Reviewed-by: Matthew Auld
---
 drivers/gpu/drm/i915/i915_perf.c | 41
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 1d61731..4e985dd 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -82,6 +82,22 @@ static u32 i915_perf_stream_paranoid = true;
 
 #define INVALID_CTX_ID 0xffffffff
 
+/* for sysctl proc_dointvec_minmax of i915_oa_min_timer_exponent */
+static int oa_exponent_max = OA_EXPONENT_MAX;
+
+/* Theoretically we can program the OA unit to sample every 160ns but don't
+ * allow that by default unless root...
+ *
+ * The period is derived from the exponent as:
+ *
+ *   period = 80ns * 2^(exponent + 1)
+ *
+ * Referring to perf's kernel.perf_event_max_sample_rate for a precedent
+ * (100000 by default); with an OA exponent of 6 we get a period of 10.240
+ * microseconds - just under 100000Hz
+ */
+static u32 i915_oa_min_timer_exponent = 6;
+
 /* XXX: beware if future OA HW adds new report formats that the current
  * code assumes all reports have a power-of-two size and ~(size - 1) can
  * be used as a mask to align the OA tail pointer.
@@ -1349,21 +1365,13 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 				return -EINVAL;
 			}
 
-			/* NB: The exponent represents a period as follows:
-			 *
-			 *   80ns * 2^(period_exponent + 1)
-			 *
-			 * Theoretically we can program the OA unit to sample
+			/* Theoretically we can program the OA unit to sample
 			 * every 160ns but don't allow that by default unless
 			 * root.
-			 *
-			 * Referring to perf's
-			 * kernel.perf_event_max_sample_rate for a precedent
-			 * (100000 by default); with an OA exponent of 6 we get
-			 * a period of 10.240 microseconds -just under 100000Hz
 			 */
-			if (value < 6 && !capable(CAP_SYS_ADMIN)) {
-				DRM_ERROR("Sampling period too high without root privileges\n");
+			if (value < i915_oa_min_timer_exponent &&
+			    !capable(CAP_SYS_ADMIN)) {
+				DRM_ERROR("OA timer exponent too low without root privileges\n");
 				return -EACCES;
 			}
 
@@ -1471,6 +1479,15 @@ static struct ctl_table oa_table[] = {
 	 .extra1 = &zero,
 	 .extra2 = &one,
 	 },
+	{
+	 .procname = "oa_min_timer_exponent",
+	 .data = &i915_oa_min_timer_exponent,
+	 .maxlen = sizeof(i915_oa_min_timer_exponent),
+	 .mode = 0644,
+	 .proc_handler = proc_dointvec_minmax,
+	 .extra1 = &zero,
+	 .extra2 = &oa_exponent_max,
+	 },
 	{}
 };
 
-- 
2.10.0
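To make the arithmetic behind the default exponent concrete, here is a small sketch of the formula quoted in the patch comment, period = 80ns * 2^(exponent + 1). The helper names are hypothetical and not part of the patch. The code does no overflow checking, so it is only meant for the small exponents discussed here.

```c
#include <stdint.h>

/* period = 80ns * 2^(exponent + 1), as stated in the patch comment.
 * No overflow guard: intended for small exponents only.
 */
uint64_t oa_exponent_to_period_ns(unsigned int exponent)
{
	return 80ull << (exponent + 1);
}

/* Sampling frequency in Hz, rounded down to an integer. */
uint64_t oa_exponent_to_frequency_hz(unsigned int exponent)
{
	return 1000000000ull / oa_exponent_to_period_ns(exponent);
}
```

With an exponent of 6 this gives a period of 10240ns, i.e. roughly 97656Hz, which sits just below the 100000 samples/s perf default. An exponent of 5 would give 5120ns, about 195312Hz, and exceed it. That is why 6 is the smallest exponent allowed without CAP_SYS_ADMIN by default.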
[Intel-gfx] [PATCH v6 10/11] drm/i915: Add more Haswell OA metric sets
This adds 'compute', 'compute extended', 'memory reads', 'memory writes' and 'sampler balance' metric sets for Haswell. The code is auto generated from an XML description of metric sets, currently maintained in gputop, ref: https://github.com/rib/gputop > gputop-data/oa-*.xml > scripts/i915-perf-kernelgen.py $ make -C gputop-data -f Makefile.xml Signed-off-by: Robert Bragg Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/i915_oa_hsw.c | 559 - 1 file changed, 558 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c index 19f272b..cd2a23a 100644 --- a/drivers/gpu/drm/i915/i915_oa_hsw.c +++ b/drivers/gpu/drm/i915/i915_oa_hsw.c @@ -31,9 +31,14 @@ enum metric_set_id { METRIC_SET_ID_RENDER_BASIC = 1, + METRIC_SET_ID_COMPUTE_BASIC, + METRIC_SET_ID_COMPUTE_EXTENDED, + METRIC_SET_ID_MEMORY_READS, + METRIC_SET_ID_MEMORY_WRITES, + METRIC_SET_ID_SAMPLER_BALANCE, }; -int i915_oa_n_builtin_metric_sets_hsw = 1; +int i915_oa_n_builtin_metric_sets_hsw = 6; static const struct i915_oa_reg b_counter_config_render_basic[] = { { _MMIO(0x2724), 0x0080 }, @@ -112,6 +117,298 @@ get_render_basic_mux_config(struct drm_i915_private *dev_priv, return mux_config_render_basic; } +static const struct i915_oa_reg b_counter_config_compute_basic[] = { + { _MMIO(0x2710), 0x }, + { _MMIO(0x2714), 0x0080 }, + { _MMIO(0x2718), 0x }, + { _MMIO(0x271c), 0x }, + { _MMIO(0x2720), 0x }, + { _MMIO(0x2724), 0x0080 }, + { _MMIO(0x2728), 0x }, + { _MMIO(0x272c), 0x }, + { _MMIO(0x2740), 0x }, + { _MMIO(0x2744), 0x }, + { _MMIO(0x2748), 0x }, + { _MMIO(0x274c), 0x }, + { _MMIO(0x2750), 0x }, + { _MMIO(0x2754), 0x }, + { _MMIO(0x2758), 0x }, + { _MMIO(0x275c), 0x }, + { _MMIO(0x236c), 0x }, +}; + +static const struct i915_oa_reg mux_config_compute_basic[] = { + { _MMIO(0x253a4), 0x }, + { _MMIO(0x2681c), 0x01f00800 }, + { _MMIO(0x26820), 0x1000 }, + { _MMIO(0x2781c), 0x01f00800 }, + { _MMIO(0x26520), 0x0007 }, + { _MMIO(0x265a0), 0x0007 }, + { 
_MMIO(0x25380), 0x0010 }, + { _MMIO(0x2538c), 0x0030 }, + { _MMIO(0x25384), 0xaa8a }, + { _MMIO(0x25404), 0x }, + { _MMIO(0x26800), 0x4202 }, + { _MMIO(0x26808), 0x00605817 }, + { _MMIO(0x2680c), 0x10001005 }, + { _MMIO(0x26804), 0x }, + { _MMIO(0x27800), 0x0102 }, + { _MMIO(0x27808), 0x0c0701e0 }, + { _MMIO(0x2780c), 0x000200a0 }, + { _MMIO(0x27804), 0x }, + { _MMIO(0x26484), 0x4400 }, + { _MMIO(0x26704), 0x4400 }, + { _MMIO(0x26500), 0x0006 }, + { _MMIO(0x26510), 0x0001 }, + { _MMIO(0x26504), 0x8800 }, + { _MMIO(0x26580), 0x0006 }, + { _MMIO(0x26590), 0x0020 }, + { _MMIO(0x26584), 0x }, + { _MMIO(0x26104), 0x5582 }, + { _MMIO(0x26184), 0xaa86 }, + { _MMIO(0x25420), 0x08320c83 }, + { _MMIO(0x25424), 0x06820c83 }, + { _MMIO(0x2541c), 0x }, + { _MMIO(0x25428), 0x0c03 }, +}; + +static const struct i915_oa_reg * +get_compute_basic_mux_config(struct drm_i915_private *dev_priv, +int *len) +{ + *len = ARRAY_SIZE(mux_config_compute_basic); + return mux_config_compute_basic; +} + +static const struct i915_oa_reg b_counter_config_compute_extended[] = { + { _MMIO(0x2724), 0xf080 }, + { _MMIO(0x2720), 0x }, + { _MMIO(0x2714), 0xf080 }, + { _MMIO(0x2710), 0x }, + { _MMIO(0x2770), 0x0007fe2a }, + { _MMIO(0x2774), 0xff00 }, + { _MMIO(0x2778), 0x0007fe6a }, + { _MMIO(0x277c), 0xff00 }, + { _MMIO(0x2780), 0x0007fe92 }, + { _MMIO(0x2784), 0xff00 }, + { _MMIO(0x2788), 0x0007fea2 }, + { _MMIO(0x278c), 0xff00 }, + { _MMIO(0x2790), 0x0007fe32 }, + { _MMIO(0x2794), 0xff00 }, + { _MMIO(0x2798), 0x0007fe9a }, + { _MMIO(0x279c), 0xff00 }, + { _MMIO(0x27a0), 0x0007ff23 }, + { _MMIO(0x27a4), 0xff00 }, + { _MMIO(0x27a8), 0x0007fff3 }, + { _MMIO(0x27ac), 0xfffe }, +}; + +static const struct i915_oa_reg mux_config_compute_extended[] = { + { _MMIO(0x2681c), 0x3eb00800 }, + { _MMIO(0x26820), 0x0090 }, + { _MMIO(0x25384), 0x02aa }, + { _MMIO(0x25404), 0x03ff }, + { _MMIO(0x26800), 0x00142284 }, + { _MMIO(0x26808), 0x0e629062 }, + { _MMIO(0x2680c), 0x3f6f55cb }, + { _MMIO(0x26810), 0x0014 }, +
[Intel-gfx] [PATCH v6 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit
Gen graphics hardware can be set up to periodically write snapshots of performance counters into a circular buffer via its Observation Architecture and this patch exposes that capability to userspace via the i915 perf interface. v2: Make sure to initialize ->specific_ctx_id when opening, without relying on _pin_notify hook, in case ctx already pinned. Cc: Chris Wilson Signed-off-by: Robert Bragg Signed-off-by: Zhenyu Wang factor out init_specific_ctx_id func --- drivers/gpu/drm/i915/i915_drv.h | 72 ++- drivers/gpu/drm/i915/i915_gem_context.c | 22 +- drivers/gpu/drm/i915/i915_perf.c| 1034 ++- drivers/gpu/drm/i915/i915_reg.h | 338 ++ drivers/gpu/drm/i915/intel_ringbuffer.c | 11 +- include/uapi/drm/i915_drm.h | 70 ++- 6 files changed, 1515 insertions(+), 32 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 28f3f77..b234412 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1760,6 +1760,11 @@ struct intel_wm_config { bool sprites_scaled; }; +struct i915_oa_format { + u32 format; + int size; +}; + struct i915_oa_reg { i915_reg_t addr; u32 value; @@ -1780,11 +1785,6 @@ struct i915_perf_stream_ops { */ void (*disable)(struct i915_perf_stream *stream); - /* Return: true if any i915 perf records are ready to read() -* for this stream. -*/ - bool (*can_read)(struct i915_perf_stream *stream); - /* Call poll_wait, passing a wait queue that will be woken * once there is something ready to read() for the stream */ @@ -1794,9 +1794,7 @@ struct i915_perf_stream_ops { /* For handling a blocking read, wait until there is something * to ready to read() for the stream. E.g. wait on the same -* wait queue that would be passed to poll_wait() until -* ->can_read() returns true (if its safe to call ->can_read() -* without the i915 perf lock held). +* wait queue that would be passed to poll_wait(). 
*/ int (*wait_unlocked)(struct i915_perf_stream *stream); @@ -1836,11 +1834,28 @@ struct i915_perf_stream { struct list_head link; u32 sample_flags; + int sample_size; struct i915_gem_context *ctx; bool enabled; - struct i915_perf_stream_ops *ops; + const struct i915_perf_stream_ops *ops; +}; + +struct i915_oa_ops { + void (*init_oa_buffer)(struct drm_i915_private *dev_priv); + int (*enable_metric_set)(struct drm_i915_private *dev_priv); + void (*disable_metric_set)(struct drm_i915_private *dev_priv); + void (*oa_enable)(struct drm_i915_private *dev_priv); + void (*oa_disable)(struct drm_i915_private *dev_priv); + void (*update_oacontrol)(struct drm_i915_private *dev_priv); + void (*update_hw_ctx_id_locked)(struct drm_i915_private *dev_priv, + u32 ctx_id); + int (*read)(struct i915_perf_stream *stream, + char __user *buf, + size_t count, + size_t *offset); + bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv); }; struct drm_i915_private { @@ -2145,16 +2160,48 @@ struct drm_i915_private { struct { bool initialized; + struct mutex lock; struct list_head streams; + spinlock_t hook_lock; + struct { - u32 metrics_set; + struct i915_perf_stream *exclusive_stream; + + u32 specific_ctx_id; + + struct hrtimer poll_check_timer; + wait_queue_head_t poll_wq; + atomic_t pollin; + + bool periodic; + int period_exponent; + int timestamp_frequency; + + int tail_margin; + + int metrics_set; const struct i915_oa_reg *mux_regs; int mux_regs_len; const struct i915_oa_reg *b_counter_regs; int b_counter_regs_len; + + struct { + struct drm_i915_gem_object *obj; + struct i915_vma *vma; + u32 gtt_offset; + u8 *addr; + int format; + int format_size; + } oa_buffer; + + u32 gen7_latched_oastatus1; + + struct i915_oa_ops ops; + const struct i915_oa_format *oa_formats; + int n_builtin_sets; } oa; } perf; @@ -3525,6 +3572,9
[Intel-gfx] [PATCH v6 07/11] drm/i915: advertise available metrics via sysfs
Each metric set is given a sysfs entry like:

  /sys/class/drm/card0/metrics/<guid>/id

This allows userspace to enumerate the specific sets that are available for the current system. The 'id' file contains an unsigned integer that can be used to open the associated metric set via DRM_IOCTL_I915_PERF_OPEN. The <guid> is a globally unique ID for a specific OA unit register configuration that can be reliably used by userspace as a key to lookup corresponding counter meta data and normalization equations.

The guid registry is currently maintained as part of gputop along with the XML metric set descriptions and code generation scripts, ref:

  https://github.com/rib/gputop
  > gputop-data/guids.xml
  > scripts/update-guids.py
  > gputop-data/oa-*.xml
  > scripts/i915-perf-kernelgen.py

  $ make -C gputop-data -f Makefile.xml SYSFS=1 WHITELIST=RenderBasic

Signed-off-by: Robert Bragg
Reviewed-by: Matthew Auld
---
 drivers/gpu/drm/i915/i915_drv.c    |  5
 drivers/gpu/drm/i915/i915_drv.h    |  4 +++
 drivers/gpu/drm/i915/i915_oa_hsw.c | 51 +
 drivers/gpu/drm/i915/i915_oa_hsw.h |  4 +++
 drivers/gpu/drm/i915/i915_perf.c   | 52 ++
 5 files changed, 116 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 5449579..3b6f586 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -1115,6 +1115,9 @@ static void i915_driver_register(struct drm_i915_private *dev_priv) if (drm_dev_register(dev, 0) == 0) { i915_debugfs_register(dev_priv); i915_setup_sysfs(dev_priv); + + /* Depends on sysfs having been initialized */ + i915_perf_register(dev_priv); } else DRM_ERROR("Failed to register driver for userspace access!\n"); @@ -1151,6 +1154,8 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv) acpi_video_unregister(); intel_opregion_unregister(dev_priv); + i915_perf_unregister(dev_priv); + i915_teardown_sysfs(dev_priv); i915_debugfs_unregister(dev_priv); drm_dev_unregister(&dev_priv->drm); diff --git a/drivers/gpu/drm/i915/i915_drv.h
b/drivers/gpu/drm/i915/i915_drv.h index b234412..3b86427 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2161,6 +2161,8 @@ struct drm_i915_private { struct { bool initialized; + struct kobject *metrics_kobj; + struct mutex lock; struct list_head streams; @@ -3752,6 +3754,8 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine, /* i915_perf.c */ extern void i915_perf_init(struct drm_i915_private *dev_priv); extern void i915_perf_fini(struct drm_i915_private *dev_priv); +extern void i915_perf_register(struct drm_i915_private *dev_priv); +extern void i915_perf_unregister(struct drm_i915_private *dev_priv); /* i915_suspend.c */ extern int i915_save_state(struct drm_device *dev); diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c index 8906380..19f272b 100644 --- a/drivers/gpu/drm/i915/i915_oa_hsw.c +++ b/drivers/gpu/drm/i915/i915_oa_hsw.c @@ -24,6 +24,8 @@ * */ +#include + #include "i915_drv.h" #include "i915_oa_hsw.h" @@ -142,3 +144,52 @@ int i915_oa_select_metric_set_hsw(struct drm_i915_private *dev_priv) return -ENODEV; } } + +static ssize_t +show_render_basic_id(struct device *kdev, struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "%d\n", METRIC_SET_ID_RENDER_BASIC); +} + +static struct device_attribute dev_attr_render_basic_id = { + .attr = { .name = "id", .mode = S_IRUGO }, + .show = show_render_basic_id, + .store = NULL, +}; + +static struct attribute *attrs_render_basic[] = { + &dev_attr_render_basic_id.attr, + NULL, +}; + +static struct attribute_group group_render_basic = { + .name = "403d8832-1a27-4aa6-a64e-f5389ce7b212", + .attrs = attrs_render_basic, +}; + +int +i915_perf_register_sysfs_hsw(struct drm_i915_private *dev_priv) +{ + int mux_len; + int ret = 0; + + if (get_render_basic_mux_config(dev_priv, &mux_len)) { + ret = sysfs_create_group(dev_priv->perf.metrics_kobj, &group_render_basic); + if (ret) + goto error_render_basic; + } + + return 0; + 
+error_render_basic: + return ret; +} + +void +i915_perf_unregister_sysfs_hsw(struct drm_i915_private *dev_priv) +{ + int mux_len; + + if (get_render_basic_mux_config(dev_priv, &mux_len)) + sysfs_remove_group(dev_priv->perf.metrics_kobj, &group_render_basic); +} diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.h b/drivers/gpu/drm/i915/i915_oa_hsw.h index b618a
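From the userspace side, looking up a metric set through these sysfs entries might be sketched as below. It assumes each metric set has its own directory under a metrics directory (for example /sys/class/drm/card0/metrics), named after the set's GUID and containing an 'id' file with one unsigned integer, as the commit message describes. The function name and the choice to pass the metrics directory as a parameter are hypothetical, made so the helper can be exercised against any directory tree.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Read the integer stored in "<metrics_dir>/<guid>/id".
 * Returns 0 and stores the value in *id on success, -1 on any
 * failure (path too long, missing file, unparsable content).
 */
int read_metric_set_id(const char *metrics_dir, const char *guid,
		       uint64_t *id)
{
	char path[512];
	FILE *f;
	int n, ok;

	n = snprintf(path, sizeof(path), "%s/%s/id", metrics_dir, guid);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	ok = (fscanf(f, "%" SCNu64, id) == 1);
	fclose(f);

	return ok ? 0 : -1;
}
```

The value read this way is what would be passed as the DRM_I915_PERF_PROP_OA_METRICS_SET property when opening a stream. A missing directory simply means that metric set is not available on the current system.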
[Intel-gfx] [PATCH v6 08/11] drm/i915: Add dev.i915.perf_stream_paranoid sysctl option
Consistent with the kernel.perf_event_paranoid sysctl option that can allow non-root users to access system wide cpu metrics, this can optionally allow non-root users to access system wide OA counter metrics from Gen graphics hardware. Signed-off-by: Robert Bragg Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/i915_drv.h | 1 + drivers/gpu/drm/i915/i915_perf.c | 50 +++- 2 files changed, 50 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 3b86427..66629bc 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2162,6 +2162,7 @@ struct drm_i915_private { bool initialized; struct kobject *metrics_kobj; + struct ctl_table_header *sysctl_header; struct mutex lock; struct list_head streams; diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index c45bba5..1d61731 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -64,6 +64,11 @@ #define POLL_FREQUENCY 200 #define POLL_PERIOD (NSEC_PER_SEC / POLL_FREQUENCY) +/* for sysctl proc_dointvec_minmax of dev.i915.perf_stream_paranoid */ +static int zero; +static int one = 1; +static u32 i915_perf_stream_paranoid = true; + /* The maximum exponent the hardware accepts is 63 (essentially it selects one * of the 64bit timestamp bits to trigger reports from) but there's currently * no known use case for sampling as infrequently as once per 47 thousand years. @@ -1206,7 +1211,13 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv, } } - if (!specific_ctx && !capable(CAP_SYS_ADMIN)) { + /* Similar to perf's kernel.perf_paranoid_cpu sysctl option +* we check a dev.i915.perf_stream_paranoid sysctl option +* to determine if it's ok to access system wide OA counters +* without CAP_SYS_ADMIN privileges. 
+*/ + if (!specific_ctx && + i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) { DRM_ERROR("Insufficient privileges to open system-wide i915 perf stream\n"); ret = -EACCES; goto err_ctx; @@ -1450,6 +1461,39 @@ void i915_perf_unregister(struct drm_i915_private *dev_priv) dev_priv->perf.metrics_kobj = NULL; } +static struct ctl_table oa_table[] = { + { +.procname = "perf_stream_paranoid", +.data = &i915_perf_stream_paranoid, +.maxlen = sizeof(i915_perf_stream_paranoid), +.mode = 0644, +.proc_handler = proc_dointvec_minmax, +.extra1 = &zero, +.extra2 = &one, +}, + {} +}; + +static struct ctl_table i915_root[] = { + { +.procname = "i915", +.maxlen = 0, +.mode = 0555, +.child = oa_table, +}, + {} +}; + +static struct ctl_table dev_root[] = { + { +.procname = "dev", +.maxlen = 0, +.mode = 0555, +.child = i915_root, +}, + {} +}; + void i915_perf_init(struct drm_i915_private *dev_priv) { if (!IS_HASWELL(dev_priv)) @@ -1482,6 +1526,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv) dev_priv->perf.oa.n_builtin_sets = i915_oa_n_builtin_metric_sets_hsw; + dev_priv->perf.sysctl_header = register_sysctl_table(dev_root); + dev_priv->perf.initialized = true; } @@ -1490,6 +1536,8 @@ void i915_perf_fini(struct drm_i915_private *dev_priv) if (!dev_priv->perf.initialized) return; + unregister_sysctl_table(dev_priv->perf.sysctl_header); + memset(&dev_priv->perf.oa.ops, 0, sizeof(dev_priv->perf.oa.ops)); dev_priv->perf.initialized = false; } -- 2.10.0 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
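The access rule this patch arrives at can be summarised as a small predicate. This is only a restatement of the condition in the i915_perf_open_ioctl_locked() hunk, with the kernel's capable(CAP_SYS_ADMIN) call replaced by a plain boolean argument so the logic can be read in isolation. The function name is hypothetical.

```c
#include <stdbool.h>

/* Mirrors the check in the patch: opening is refused only when the
 * stream is system-wide (no specific context), the paranoid sysctl
 * is set, and the caller lacks CAP_SYS_ADMIN.
 */
bool perf_open_permitted(bool specific_ctx, bool stream_paranoid,
			 bool has_cap_sys_admin)
{
	if (!specific_ctx && stream_paranoid && !has_cap_sys_admin)
		return false;

	return true;
}
```

Single-context streams are never blocked by this check, so setting dev.i915.perf_stream_paranoid to 0 only changes the outcome for unprivileged system-wide opens.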
[Intel-gfx] [PATCH v6 11/11] drm/i915: Add a kerneldoc summary for i915_perf.c
In particular this tries to capture for posterity some of the early challenges we had with using the core perf infrastructure in case we ever want to revisit adapting perf for device metrics. Cc: Chris Wilson Signed-off-by: Robert Bragg Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/i915_perf.c | 163 +++ 1 file changed, 163 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 4e985dd..1e29655 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -24,6 +24,169 @@ * Robert Bragg */ + +/** + * DOC: i915 Perf, streaming API for GPU metrics + * + * Gen graphics supports a large number of performance counters that can help + * driver and application developers understand and optimize their use of the + * GPU. + * + * This i915 perf interface enables userspace to configure and open a file + * descriptor representing a stream of GPU metrics which can then be read() as + * a stream of sample records. + * + * The interface is particularly suited to exposing buffered metrics that are + * captured by DMA from the GPU, unsynchronized with and unrelated to the CPU. + * + * Streams representing a single context are accessible to applications with a + * corresponding drm file descriptor, such that OpenGL can use the interface + * without special privileges. Access to system-wide metrics requires root + * privileges by default, unless changed via the dev.i915.perf_event_paranoid + * sysctl option. + * + * + * The interface was initially inspired by the core Perf infrastructure but + * some notable differences are: + * + * i915 perf file descriptors represent a "stream" instead of an "event"; where + * a perf event primarily corresponds to a single 64bit value, while a stream + * might sample sets of tightly-coupled counters, depending on the + * configuration. 
For example the Gen OA unit isn't designed to support + * orthogonal configurations of individual counters; it's configured for a set + * of related counters. Samples for an i915 perf stream capturing OA metrics + * will include a set of counter values packed in a compact HW specific format. + * The OA unit supports a number of different packing formats which can be + * selected by the user opening the stream. Perf has support for grouping + * events, but each event in the group is configured, validated and + * authenticated individually with separate system calls. + * + * i915 perf stream configurations are provided as an array of u64 (key,value) + * pairs, instead of a fixed struct with multiple miscellaneous config members, + * interleaved with event-type specific members. + * + * i915 perf doesn't support exposing metrics via an mmap'd circular buffer. + * The supported metrics are being written to memory by the GPU unsynchronized + * with the CPU, using HW specific packing formats for counter sets. Sometimes + * the constraints on HW configuration require reports to be filtered before it + * would be acceptable to expose them to unprivileged applications - to hide + * the metrics of other processes/contexts. For these use cases a read() based + * interface is a good fit, and provides an opportunity to filter data as it + * gets copied from the GPU mapped buffers to userspace buffers. + * + * + * Some notes regarding Linux Perf: + * + * + * The first prototype of this driver was based on the core perf + * infrastructure, and while we did make that mostly work, with some changes to + * perf, we found we were breaking or working around too many assumptions baked + * into perf's currently cpu centric design. + * + * In the end we didn't see a clear benefit to making perf's implementation and + * interface more complex by changing design assumptions while we knew we still + * wouldn't be able to use any existing perf based userspace tools. 
+ * + * Also considering the Gen specific nature of the Observability hardware and + * how userspace will sometimes need to combine i915 perf OA metrics with + * side-band OA data captured via MI_REPORT_PERF_COUNT commands; we're + * expecting the interface to be used by a platform specific userspace such as + * OpenGL or tools. This is to say; we aren't inherently missing out on having + * a standard vendor/architecture agnostic interface by not using perf. + * + * + * For posterity, in case we might re-visit trying to adapt core perf to be + * better suited to exposing i915 metrics these were the main pain points we + * hit: + * + * - The perf based OA PMU driver broke some significant design assumptions: + * + * Existing perf pmus are used for profiling work on a cpu and we were + * introducing the idea of _IS_DEVICE pmus with different security + * implications, the need to fake cpu-related data (such as user/kernel + * registers) to fit with perf's current design, and addi
Re: [Intel-gfx] [PATCH v6 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit
On Thu, Oct 20, 2016 at 11:10 PM, Chris Wilson wrote:
> On Thu, Oct 20, 2016 at 10:19:05PM +0100, Robert Bragg wrote:
> > +int i915_gem_context_pin_legacy_rcs_state(struct drm_i915_private *dev_priv,
> > +                                          struct i915_gem_context *ctx,
> > +                                          u64 flags)
>
> This is still no.

Okay, but it's a little frustrating for me to go in circles here :-/

I didn't originally do it this way; I originally looked at pinning the context when opening the stream so I didn't have to consider it being relocated.

The feedback from Daniel Vetter was to look at doing it this way, I think because of some concern to do with some shrinker corner cases.

... just dug up the archive: https://lists.freedesktop.org/archives/intel-gfx/2014-November/055385.html

Can you maybe please explain what's wrong with the current approach and provide some justification for a different approach, with some reassurance that Daniel's original concern with the shrinker unpinning contexts isn't actually a problem?

I don't currently understand the concern with this, and this approach seems to have been working well for quite a long time now.

- Robert
Re: [Intel-gfx] [PATCH v6 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit
On Thu, Oct 20, 2016 at 11:10 PM, Chris Wilson wrote: > On Thu, Oct 20, 2016 at 10:19:05PM +0100, Robert Bragg wrote: > > +int i915_gem_context_pin_legacy_rcs_state(struct drm_i915_private > *dev_priv, > > + struct i915_gem_context *ctx, > > + u64 flags) > > This is still no. > > > +static int alloc_oa_buffer(struct drm_i915_private *dev_priv) > > +{ > > + struct drm_i915_gem_object *bo; > > + enum i915_map_type map; > > + struct i915_vma *vma; > > + int ret; > > + > > + BUG_ON(dev_priv->perf.oa.oa_buffer.obj); > > + > > + ret = i915_mutex_lock_interruptible(&dev_priv->drm); > > + if (ret) > > + return ret; > > + > > + BUILD_BUG_ON_NOT_POWER_OF_2(OA_BUFFER_SIZE); > > + BUILD_BUG_ON(OA_BUFFER_SIZE < SZ_128K || OA_BUFFER_SIZE > SZ_16M); > > + > > + bo = i915_gem_object_create(&dev_priv->drm, OA_BUFFER_SIZE); > > + if (IS_ERR(bo)) { > > + DRM_ERROR("Failed to allocate OA buffer\n"); > > + ret = PTR_ERR(bo); > > + goto unlock; > > + } > > + dev_priv->perf.oa.oa_buffer.obj = bo; > > + > > + ret = i915_gem_object_set_cache_level(bo, I915_CACHE_LLC); > > + if (ret) > > + goto err_unref; > > + > > + /* PreHSW required 512K alignment, HSW requires 16M */ > > + vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, PIN_MAPPABLE); > > + if (IS_ERR(vma)) { > > + ret = PTR_ERR(vma); > > + goto err_unref; > > + } > > + dev_priv->perf.oa.oa_buffer.vma = vma; > > + > > + map = HAS_LLC(dev_priv) ? I915_MAP_WB : I915_MAP_WC; > > You set the hw up to do coherent writes into the CPU cache, and then you > request WC access to the pages? With set_cache_level(LLC) you can use > MAP_WB on both llc and snoop based architectures. Fortunately this is > only HSW! > hmm, yeah it looks like I unwittingly added this recently as part of a rebase, I think from lazily copying some similar code from intel_ringbuffer.c when I hit a conflict, without thinking more carefully, sorry. 
> > > + dev_priv->perf.oa.oa_buffer.gtt_offset = i915_ggtt_offset(vma); > > I haven't spotted the advantage of storing both the ggtt_offset in > addition to the vma (or the bo as well as the vma). > right, it looks like this can be cleaned up. > > > + dev_priv->perf.oa.oa_buffer.addr = i915_gem_object_pin_map(bo, > map); > > + if (IS_ERR(dev_priv->perf.oa.oa_buffer.addr)) { > > + ret = PTR_ERR(dev_priv->perf.oa.oa_buffer.addr); > > + goto err_unpin; > > + } > > -- > Chris Wilson, Intel Open Source Technology Centre > Thanks, - Robert ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH] drm/i915: Enable i915 perf stream for Haswell OA unit
Gen graphics hardware can be set up to periodically write snapshots of performance counters into a circular buffer via its Observation Architecture and this patch exposes that capability to userspace via the i915 perf interface. v2: Make sure to initialize ->specific_ctx_id when opening, without relying on _pin_notify hook, in case ctx already pinned. Cc: Chris Wilson Signed-off-by: Robert Bragg Signed-off-by: Zhenyu Wang --- drivers/gpu/drm/i915/i915_drv.h | 70 ++- drivers/gpu/drm/i915/i915_gem_context.c | 22 +- drivers/gpu/drm/i915/i915_perf.c| 1028 ++- drivers/gpu/drm/i915/i915_reg.h | 338 ++ drivers/gpu/drm/i915/intel_ringbuffer.c | 11 +- include/uapi/drm/i915_drm.h | 70 ++- 6 files changed, 1507 insertions(+), 32 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 28f3f77..b155ab0 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1760,6 +1760,11 @@ struct intel_wm_config { bool sprites_scaled; }; +struct i915_oa_format { + u32 format; + int size; +}; + struct i915_oa_reg { i915_reg_t addr; u32 value; @@ -1780,11 +1785,6 @@ struct i915_perf_stream_ops { */ void (*disable)(struct i915_perf_stream *stream); - /* Return: true if any i915 perf records are ready to read() -* for this stream. -*/ - bool (*can_read)(struct i915_perf_stream *stream); - /* Call poll_wait, passing a wait queue that will be woken * once there is something ready to read() for the stream */ @@ -1794,9 +1794,7 @@ struct i915_perf_stream_ops { /* For handling a blocking read, wait until there is something * to ready to read() for the stream. E.g. wait on the same -* wait queue that would be passed to poll_wait() until -* ->can_read() returns true (if its safe to call ->can_read() -* without the i915 perf lock held). +* wait queue that would be passed to poll_wait(). 
*/ int (*wait_unlocked)(struct i915_perf_stream *stream); @@ -1836,11 +1834,28 @@ struct i915_perf_stream { struct list_head link; u32 sample_flags; + int sample_size; struct i915_gem_context *ctx; bool enabled; - struct i915_perf_stream_ops *ops; + const struct i915_perf_stream_ops *ops; +}; + +struct i915_oa_ops { + void (*init_oa_buffer)(struct drm_i915_private *dev_priv); + int (*enable_metric_set)(struct drm_i915_private *dev_priv); + void (*disable_metric_set)(struct drm_i915_private *dev_priv); + void (*oa_enable)(struct drm_i915_private *dev_priv); + void (*oa_disable)(struct drm_i915_private *dev_priv); + void (*update_oacontrol)(struct drm_i915_private *dev_priv); + void (*update_hw_ctx_id_locked)(struct drm_i915_private *dev_priv, + u32 ctx_id); + int (*read)(struct i915_perf_stream *stream, + char __user *buf, + size_t count, + size_t *offset); + bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv); }; struct drm_i915_private { @@ -2145,16 +2160,46 @@ struct drm_i915_private { struct { bool initialized; + struct mutex lock; struct list_head streams; + spinlock_t hook_lock; + struct { - u32 metrics_set; + struct i915_perf_stream *exclusive_stream; + + u32 specific_ctx_id; + + struct hrtimer poll_check_timer; + wait_queue_head_t poll_wq; + atomic_t pollin; + + bool periodic; + int period_exponent; + int timestamp_frequency; + + int tail_margin; + + int metrics_set; const struct i915_oa_reg *mux_regs; int mux_regs_len; const struct i915_oa_reg *b_counter_regs; int b_counter_regs_len; + + struct { + struct i915_vma *vma; + u8 *vaddr; + int format; + int format_size; + } oa_buffer; + + u32 gen7_latched_oastatus1; + + struct i915_oa_ops ops; + const struct i915_oa_format *oa_formats; + int n_builtin_sets; } oa; } perf; @@ -3525,6 +3570,9 @@ struct drm_i915_gem_object * i915_gem_alloc_context_obj(struct drm_device *dev, size_t size); struct i915_gem_context * i915_gem_context_create_g
[Intel-gfx] [PATCH] igt/gem_exec_parse: update for version 8 changes
Just wanted to show how I'd been looking at updating gem_exec_parse considering my interface change to stop returning EINVAL to userspace.

I think maybe it'd be good to split the patch up since I moved things around a bit, but hopefully it's not too bad to skim for now.

I wasn't quite sure what to make of the recent stray_lri change, since it seems to accept returning EINVAL to userspace, which for OACONTROL specifically is a condition we ultimately want to be sure we don't allow because it's liable to cause Mesa applications to abort. The reason stray_lri should currently see an acceptable EINVAL is that OACONTROL is a very special case register for the command parser and OACONTROL needs to be disabled before the end of a batch, but that's checked for in oacontrol-tracking.

Something else I was unsure of with the stray_lri test is that it's using 0xdeadbeef as a debug value, which seems a bit risky for OACONTROL as it will enable the OA unit with periodic sampling.

--- >8 ---

This adapts the tests to account for the parser no longer reporting privilege violations back to userspace as EINVAL errors (they are left to the HW command parser to squash the commands to NOOPs).

The interface change isn't expected to affect userspace and in fact it looks like the previous behaviour was liable to break userspace, such as Mesa, which explicitly tries to observe whether OACONTROL LRIs are squashed to NOOPs but will abort for execbuffer errors.
Signed-off-by: Robert Bragg --- tests/gem_exec_parse.c | 368 +++-- 1 file changed, 200 insertions(+), 168 deletions(-) diff --git a/tests/gem_exec_parse.c b/tests/gem_exec_parse.c index 36bf57d..2bccecd 100644 --- a/tests/gem_exec_parse.c +++ b/tests/gem_exec_parse.c @@ -34,7 +34,24 @@ #define I915_PARAM_CMD_PARSER_VERSION 28 #endif +#define ARRAY_LEN(A) (sizeof(A) / sizeof(A[0])) + +#define OACONTROL 0x2360 #define DERRMR 0x44050 +#define SO_WRITE_OFFSET_0 0x5280 +#define HSW_CS_GPR(n) (0x2600 + 8*(n)) +#define HSW_CS_GPR0 HSW_CS_GPR(0) +#define HSW_CS_GPR1 HSW_CS_GPR(1) + +#define MI_LOAD_REGISTER_REG (0x2a << 23) +#define MI_STORE_REGISTER_MEM (0x24 << 23) +#define MI_ARB_ON_OFF (0x8 << 23) +#define MI_DISPLAY_FLIP ((0x14 << 23) | 1) + +#define GFX_OP_PIPE_CONTROL((0x3<<29)|(0x3<<27)|(0x2<<24)|2) +#define PIPE_CONTROL_QW_WRITE(1<<14) +#define PIPE_CONTROL_LRI_POST_OP (1<<23) + static int command_parser_version(int fd) { @@ -50,101 +67,8 @@ static int command_parser_version(int fd) return -1; } -#define HSW_CS_GPR(n) (0x2600 + 8*(n)) -#define HSW_CS_GPR0 HSW_CS_GPR(0) -#define HSW_CS_GPR1 HSW_CS_GPR(1) - -#define MI_LOAD_REGISTER_REG (0x2a << 23) -#define MI_STORE_REGISTER_MEM (0x24 << 23) -static void hsw_load_register_reg(void) -{ - uint32_t buf[16] = { - MI_LOAD_REGISTER_IMM | (5 - 2), - HSW_CS_GPR0, - 0xabcdabcd, - HSW_CS_GPR1, - 0xdeadbeef, - - MI_STORE_REGISTER_MEM | (3 - 2), - HSW_CS_GPR1, - 0, /* address0 */ - - MI_LOAD_REGISTER_REG | (3 - 2), - HSW_CS_GPR0, - HSW_CS_GPR1, - - MI_STORE_REGISTER_MEM | (3 - 2), - HSW_CS_GPR1, - 4, /* address1 */ - - MI_BATCH_BUFFER_END, - }; - struct drm_i915_gem_execbuffer2 execbuf; - struct drm_i915_gem_exec_object2 obj[2]; - struct drm_i915_gem_relocation_entry reloc[2]; - int fd; - - /* Open again to get a non-master file descriptor */ - fd = drm_open_driver(DRIVER_INTEL); - - igt_require(IS_HASWELL(intel_get_drm_devid(fd))); - igt_require(command_parser_version(fd) >= 7); - - memset(obj, 0, sizeof(obj)); - 
obj[0].handle = gem_create(fd, 4096); - obj[1].handle = gem_create(fd, 4096); - gem_write(fd, obj[1].handle, 0, buf, sizeof(buf)); - - memset(reloc, 0, sizeof(reloc)); - reloc[0].offset = 7*sizeof(uint32_t); - reloc[0].target_handle = obj[0].handle; - reloc[0].delta = 0; - reloc[0].read_domains = I915_GEM_DOMAIN_INSTRUCTION; - reloc[0].write_domain = I915_GEM_DOMAIN_INSTRUCTION; - reloc[1].offset = 13*sizeof(uint32_t); - reloc[1].target_handle = obj[0].handle; - reloc[1].delta = sizeof(uint32_t); - reloc[1].read_domains = I915_GEM_DOMAIN_INSTRUCTION; - reloc[1].write_domain = I915_GEM_DOMAIN_INSTRUCTION; - obj[1].relocs_ptr = (uintptr_t)&reloc; - obj[1].relocation_count = 2; - - memset(&execbuf, 0, sizeof(execbuf)); - execbuf.buffers_ptr = (uintptr_t)obj; - execbuf.buffer_count = 2; - execbuf.batch_len = sizeof(buf); - execbuf.flags = I915_EXEC_RENDER; - gem_execbuf(fd, &execbuf); -
[Intel-gfx] [PATCH v7 03/11] drm/i915: return EACCES for check_cmd() failures
check_cmd() is checking whether a command adheres to certain restrictions that ensure it's safe to execute within a privileged batch buffer. Returning false implies a privilege problem, not that the command is invalid.

The distinction makes the difference between allowing the buffer to be executed as an unprivileged batch buffer or returning an EINVAL error to userspace without executing anything.

In a case where userspace may want to test whether it can successfully write to a register that needs privileges, the distinction may be important and an EINVAL error may be considered fatal.

In particular this is currently true for Mesa, which includes a test for whether OACONTROL can be written to, but Mesa treats any error when flushing a batch buffer as fatal, calling exit(1).

As it is currently, Mesa can gracefully handle a failure to write to OACONTROL if the command parser is disabled, but if we were to remove OACONTROL from the parser's whitelist then the returned EINVAL would break Mesa applications as they attempt an OACONTROL write.

This bumps the command parser version from 7 to 8, as the change is visible to userspace.

Signed-off-by: Robert Bragg --- drivers/gpu/drm/i915/i915_cmd_parser.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c index fe34470..c45dd83 100644 --- a/drivers/gpu/drm/i915/i915_cmd_parser.c +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c @@ -1272,7 +1272,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine, if (!check_cmd(engine, desc, cmd, length, is_master, &oacontrol_set)) { - ret = -EINVAL; + ret = -EACCES; break; } @@ -1333,6 +1333,9 @@ int i915_cmd_parser_get_version(struct drm_i915_private *dev_priv) * 5. GPGPU dispatch compute indirect registers. * 6. TIMESTAMP register and Haswell CS GPR registers * 7. Allow MI_LOAD_REGISTER_REG between whitelisted registers. +* 8.
Don't report cmd_check() failures as EINVAL errors to userspace; +*rely on the HW to NOOP disallowed commands as it would without +*the parser enabled. */ - return 7; + return 8; } -- 2.10.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
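The distinction the 03/11 commit message draws between a privilege problem and an invalid batch can be sketched as a three-way decision. This is a minimal sketch under assumptions, not the execbuffer code: the enum, its member names and action_for_parse_result() are hypothetical, and only the meaning given to 0, -EACCES and other errors follows the description in the patch.

```c
#include <errno.h>

/* Hypothetical outcomes for a parsed batch; names are illustrative only. */
enum batch_action {
	BATCH_RUN_PRIVILEGED,   /* every command passed the parser's checks */
	BATCH_RUN_UNPRIVILEGED, /* privilege problem: the HW NOOPs disallowed commands */
	BATCH_REJECT,           /* any other error is reported to userspace */
};

/* Map a command parser return value onto what happens to the batch. */
static enum batch_action action_for_parse_result(int ret)
{
	if (ret == 0)
		return BATCH_RUN_PRIVILEGED;
	if (ret == -EACCES)
		return BATCH_RUN_UNPRIVILEGED;

	return BATCH_REJECT;
}
```

The point of returning -EACCES rather than -EINVAL from the check_cmd() path is that the middle branch becomes reachable, so a userspace such as Mesa that probes a privileged register write no longer sees a fatal execbuffer error.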
[Intel-gfx] [PATCH v7 04/11] drm/i915: don't whitelist oacontrol in cmd parser
Being able to program OACONTROL from a non-privileged batch buffer is not sufficient to be able to configure the OA unit. This was originally allowed to help enable Mesa to expose OA counters via the INTEL_performance_query extension, but the current implementation based on programming OACONTROL via a batch buffer isn't able to report useable data without a more complete OA unit configuration. Mesa handles the possibility that writes to OACONTROL may not be allowed and so only advertises the extension after explicitly testing that a write to OACONTROL succeeds. Based on this; removing OACONTROL from the whitelist should be ok for userspace. Removing this simplifies adding a new kernel api for configuring the OA unit without needing to consider the possibility that userspace might trample on OACONTROL state which we'd like to start managing within the kernel instead. In particular running any Mesa based GL application currently results in clearing OACONTROL when initializing which would disable the capturing of metrics. Signed-off-by: Robert Bragg --- drivers/gpu/drm/i915/i915_cmd_parser.c | 38 ++ 1 file changed, 2 insertions(+), 36 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c index c45dd83..5152d6f 100644 --- a/drivers/gpu/drm/i915/i915_cmd_parser.c +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c @@ -450,7 +450,6 @@ static const struct drm_i915_reg_descriptor gen7_render_regs[] = { REG64(PS_INVOCATION_COUNT), REG64(PS_DEPTH_COUNT), REG64_IDX(RING_TIMESTAMP, RENDER_RING_BASE), - REG32(GEN7_OACONTROL), /* Only allowed for LRI and SRM. See below. 
*/ REG64(MI_PREDICATE_SRC0), REG64(MI_PREDICATE_SRC1), REG32(GEN7_3DPRIM_END_OFFSET), @@ -1060,8 +1059,7 @@ bool intel_engine_needs_cmd_parser(struct intel_engine_cs *engine) static bool check_cmd(const struct intel_engine_cs *engine, const struct drm_i915_cmd_descriptor *desc, const u32 *cmd, u32 length, - const bool is_master, - bool *oacontrol_set) + const bool is_master) { if (desc->flags & CMD_DESC_SKIP) return true; @@ -1099,31 +1097,6 @@ static bool check_cmd(const struct intel_engine_cs *engine, } /* -* OACONTROL requires some special handling for -* writes. We want to make sure that any batch which -* enables OA also disables it before the end of the -* batch. The goal is to prevent one process from -* snooping on the perf data from another process. To do -* that, we need to check the value that will be written -* to the register. Hence, limit OACONTROL writes to -* only MI_LOAD_REGISTER_IMM commands. -*/ - if (reg_addr == i915_mmio_reg_offset(GEN7_OACONTROL)) { - if (desc->cmd.value == MI_LOAD_REGISTER_MEM) { - DRM_DEBUG_DRIVER("CMD: Rejected LRM to OACONTROL\n"); - return false; - } - - if (desc->cmd.value == MI_LOAD_REGISTER_REG) { - DRM_DEBUG_DRIVER("CMD: Rejected LRR to OACONTROL\n"); - return false; - } - - if (desc->cmd.value == MI_LOAD_REGISTER_IMM(1)) - *oacontrol_set = (cmd[offset + 1] != 0); - } - - /* * Check the value written to the register against the * allowed mask/value pair given in the whitelist entry. */ @@ -1214,7 +1187,6 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine, u32 *cmd, *batch_end; struct drm_i915_cmd_descriptor default_desc = noop_desc; const struct drm_i915_cmd_descriptor *desc = &default_desc; - bool oacontrol_set = false; /* OACONTROL tracking. 
See check_cmd() */ bool needs_clflush_after = false; int ret = 0; @@ -1270,8 +1242,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine, break; } - if (!check_cmd(engine, desc, cmd, length, is_master, - &oacontrol_set)) { + if (!check_cmd(engine, desc, cmd, length, is_master)) { ret = -EACCES; break; } @@ -1279,11 +1250,6 @@ int intel_engine_cmd_parser(struct i
[Intel-gfx] [PATCH v7 07/11] drm/i915: advertise available metrics via sysfs
Each metric set is given a sysfs entry like:

  /sys/class/drm/card0/metrics/<guid>/id

This allows userspace to enumerate the specific sets that are available for the current system. The 'id' file contains an unsigned integer that can be used to open the associated metric set via DRM_IOCTL_I915_PERF_OPEN. The <guid> is a globally unique ID for a specific OA unit register configuration that can be reliably used by userspace as a key to look up corresponding counter metadata and normalization equations.

The guid registry is currently maintained as part of gputop along with the XML metric set descriptions and code generation scripts, ref:

  https://github.com/rib/gputop
  > gputop-data/guids.xml
  > scripts/update-guids.py
  > gputop-data/oa-*.xml
  > scripts/i915-perf-kernelgen.py

  $ make -C gputop-data -f Makefile.xml SYSFS=1 WHITELIST=RenderBasic

Signed-off-by: Robert Bragg Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/i915_drv.c| 5 drivers/gpu/drm/i915/i915_drv.h| 4 +++ drivers/gpu/drm/i915/i915_oa_hsw.c | 51 + drivers/gpu/drm/i915/i915_oa_hsw.h | 4 +++ drivers/gpu/drm/i915/i915_perf.c | 52 ++ 5 files changed, 116 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index e99d14e..b887051 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -1115,6 +1115,9 @@ static void i915_driver_register(struct drm_i915_private *dev_priv) if (drm_dev_register(dev, 0) == 0) { i915_debugfs_register(dev_priv); i915_setup_sysfs(dev_priv); + + /* Depends on sysfs having been initialized */ + i915_perf_register(dev_priv); } else DRM_ERROR("Failed to register driver for userspace access!\n"); @@ -1151,6 +1154,8 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv) acpi_video_unregister(); intel_opregion_unregister(dev_priv); + i915_perf_unregister(dev_priv); + i915_teardown_sysfs(dev_priv); i915_debugfs_unregister(dev_priv); drm_dev_unregister(&dev_priv->drm); diff --git a/drivers/gpu/drm/i915/i915_drv.h
b/drivers/gpu/drm/i915/i915_drv.h index ea24814..a968212 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2165,6 +2165,8 @@ struct drm_i915_private { struct { bool initialized; + struct kobject *metrics_kobj; + struct mutex lock; struct list_head streams; @@ -3748,6 +3750,8 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine, /* i915_perf.c */ extern void i915_perf_init(struct drm_i915_private *dev_priv); extern void i915_perf_fini(struct drm_i915_private *dev_priv); +extern void i915_perf_register(struct drm_i915_private *dev_priv); +extern void i915_perf_unregister(struct drm_i915_private *dev_priv); /* i915_suspend.c */ extern int i915_save_state(struct drm_device *dev); diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c index 8906380..19f272b 100644 --- a/drivers/gpu/drm/i915/i915_oa_hsw.c +++ b/drivers/gpu/drm/i915/i915_oa_hsw.c @@ -24,6 +24,8 @@ * */ +#include + #include "i915_drv.h" #include "i915_oa_hsw.h" @@ -142,3 +144,52 @@ int i915_oa_select_metric_set_hsw(struct drm_i915_private *dev_priv) return -ENODEV; } } + +static ssize_t +show_render_basic_id(struct device *kdev, struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "%d\n", METRIC_SET_ID_RENDER_BASIC); +} + +static struct device_attribute dev_attr_render_basic_id = { + .attr = { .name = "id", .mode = S_IRUGO }, + .show = show_render_basic_id, + .store = NULL, +}; + +static struct attribute *attrs_render_basic[] = { + &dev_attr_render_basic_id.attr, + NULL, +}; + +static struct attribute_group group_render_basic = { + .name = "403d8832-1a27-4aa6-a64e-f5389ce7b212", + .attrs = attrs_render_basic, +}; + +int +i915_perf_register_sysfs_hsw(struct drm_i915_private *dev_priv) +{ + int mux_len; + int ret = 0; + + if (get_render_basic_mux_config(dev_priv, &mux_len)) { + ret = sysfs_create_group(dev_priv->perf.metrics_kobj, &group_render_basic); + if (ret) + goto error_render_basic; + } + + return 0; + 
+error_render_basic: + return ret; +} + +void +i915_perf_unregister_sysfs_hsw(struct drm_i915_private *dev_priv) +{ + int mux_len; + + if (get_render_basic_mux_config(dev_priv, &mux_len)) + sysfs_remove_group(dev_priv->perf.metrics_kobj, &group_render_basic); +} diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.h b/drivers/gpu/drm/i915/i915_oa_hsw.h index b618a
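For the userspace side of the 07/11 patch, reading a metric set 'id' file might look something like the sketch below. It assumes the file holds a single unsigned decimal integer followed by a newline, as the sprintf() in show_render_basic_id() suggests; parse_metric_set_id() and read_metric_set_id() are hypothetical helper names, not part of any existing library.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the contents of an 'id' file: an unsigned decimal integer,
 * optionally followed by a newline. Returns 0 on success, -1 on error.
 */
static int parse_metric_set_id(const char *text, uint64_t *id)
{
	char *end;
	unsigned long long val;

	/* Reject empty strings, signs and leading whitespace up front */
	if (text[0] < '0' || text[0] > '9')
		return -1;

	errno = 0;
	val = strtoull(text, &end, 10);
	if (errno || (*end != '\0' && *end != '\n'))
		return -1;

	*id = val;
	return 0;
}

/* Read and parse the first line of the file at path (hypothetical helper). */
static int read_metric_set_id(const char *path, uint64_t *id)
{
	char buf[32];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = parse_metric_set_id(buf, id);
	fclose(f);

	return ret;
}
```

A caller would pass a path such as /sys/class/drm/card0/metrics/403d8832-1a27-4aa6-a64e-f5389ce7b212/id (the RenderBasic guid from the patch) and hand the resulting value to the metrics set property when opening a stream.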
[Intel-gfx] [PATCH v7 02/11] drm/i915: rename OACONTROL GEN7_OACONTROL
OACONTROL changes quite a bit for gen8, with some bits split out into a per-context OACTXCONTROL register. Rename now before adding more gen7 OA registers Signed-off-by: Robert Bragg Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/gvt/handlers.c| 2 +- drivers/gpu/drm/i915/i915_cmd_parser.c | 4 ++-- drivers/gpu/drm/i915/i915_reg.h| 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c index 3e74fb3..68e07a1 100644 --- a/drivers/gpu/drm/i915/gvt/handlers.c +++ b/drivers/gpu/drm/i915/gvt/handlers.c @@ -2159,7 +2159,7 @@ static int init_generic_mmio_info(struct intel_gvt *gvt) MMIO_DFH(0x1217c, D_ALL, F_CMD_ACCESS, NULL, NULL); MMIO_F(0x2290, 8, 0, 0, 0, D_HSW_PLUS, NULL, NULL); - MMIO_D(OACONTROL, D_HSW); + MMIO_D(GEN7_OACONTROL, D_HSW); MMIO_D(0x2b00, D_BDW_PLUS); MMIO_D(0x2360, D_BDW_PLUS); MMIO_F(0x5200, 32, 0, 0, 0, D_ALL, NULL, NULL); diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c index f191d7b..fe34470 100644 --- a/drivers/gpu/drm/i915/i915_cmd_parser.c +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c @@ -450,7 +450,7 @@ static const struct drm_i915_reg_descriptor gen7_render_regs[] = { REG64(PS_INVOCATION_COUNT), REG64(PS_DEPTH_COUNT), REG64_IDX(RING_TIMESTAMP, RENDER_RING_BASE), - REG32(OACONTROL), /* Only allowed for LRI and SRM. See below. */ + REG32(GEN7_OACONTROL), /* Only allowed for LRI and SRM. See below. */ REG64(MI_PREDICATE_SRC0), REG64(MI_PREDICATE_SRC1), REG32(GEN7_3DPRIM_END_OFFSET), @@ -1108,7 +1108,7 @@ static bool check_cmd(const struct intel_engine_cs *engine, * to the register. Hence, limit OACONTROL writes to * only MI_LOAD_REGISTER_IMM commands. 
*/ - if (reg_addr == i915_mmio_reg_offset(OACONTROL)) { + if (reg_addr == i915_mmio_reg_offset(GEN7_OACONTROL)) { if (desc->cmd.value == MI_LOAD_REGISTER_MEM) { DRM_DEBUG_DRIVER("CMD: Rejected LRM to OACONTROL\n"); return false; diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index a9be3f0..070d3297 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -615,7 +615,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define HSW_CS_GPR(n) _MMIO(0x2600 + (n) * 8) #define HSW_CS_GPR_UDW(n) _MMIO(0x2600 + (n) * 8 + 4) -#define OACONTROL _MMIO(0x2360) +#define GEN7_OACONTROL _MMIO(0x2360) #define _GEN7_PIPEA_DE_LOAD_SL 0x70068 #define _GEN7_PIPEB_DE_LOAD_SL 0x71068 -- 2.10.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH v7 01/11] drm/i915: Add i915 perf infrastructure
Adds base i915 perf infrastructure for Gen performance metrics.

This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64 properties to configure a stream of metrics and returns a new fd usable with standard VFS system calls including read() to read typed and sized records; ioctl() to enable or disable capture and poll() to wait for data.

A stream is opened something like:

  uint64_t properties[] = {
          /* Single context sampling */
          DRM_I915_PERF_PROP_CTX_HANDLE,     ctx_handle,

          /* Include OA reports in samples */
          DRM_I915_PERF_PROP_SAMPLE_OA,      true,

          /* OA unit configuration */
          DRM_I915_PERF_PROP_OA_METRICS_SET, metrics_set_id,
          DRM_I915_PERF_PROP_OA_FORMAT,      report_format,
          DRM_I915_PERF_PROP_OA_EXPONENT,    period_exponent,
  };
  struct drm_i915_perf_open_param param = {
          .flags = I915_PERF_FLAG_FD_CLOEXEC |
                   I915_PERF_FLAG_FD_NONBLOCK |
                   I915_PERF_FLAG_DISABLED,
          .properties_ptr = (uint64_t)properties,
          .num_properties = sizeof(properties) / 16,
  };
  int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param);

Records read all start with a common { type, size } header with DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records contain an extensible number of fields and it's the DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that determine what's included in every sample.

No specific streams are supported yet so any attempt to open a stream will return an error.
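Walking the records returned by read() could be sketched roughly as follows. This is an illustration under assumptions rather than the real uapi: struct record_header is a local stand-in for the common { type, size } header, RECORD_TYPE_SAMPLE is a hypothetical value standing in for DRM_I915_PERF_RECORD_SAMPLE, and the size field is assumed to count the whole record including its header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Local stand-in for the common { type, size } record header. The layout
 * is an assumption for illustration, not copied from i915_drm.h.
 */
struct record_header {
	uint32_t type;
	uint16_t pad;
	uint16_t size; /* assumed: total record size in bytes, header included */
};

/* Hypothetical value standing in for DRM_I915_PERF_RECORD_SAMPLE */
#define RECORD_TYPE_SAMPLE 1

/* Count the sample records in the first len bytes of buf, as filled by read(). */
static size_t count_sample_records(const uint8_t *buf, size_t len)
{
	size_t offset = 0, samples = 0;

	while (len - offset >= sizeof(struct record_header)) {
		struct record_header hdr;

		/* Copy the header out to avoid unaligned access */
		memcpy(&hdr, buf + offset, sizeof(hdr));

		/* Stop on a truncated or corrupt record */
		if (hdr.size < sizeof(hdr) || hdr.size > len - offset)
			break;

		if (hdr.type == RECORD_TYPE_SAMPLE)
			samples++;

		offset += hdr.size;
	}

	return samples;
}
```

Because every record carries its own size, a reader can skip record types it doesn't recognise, which is what lets new record types and sample fields be added without breaking older userspace.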
v4: s/DRM_IORW/DRM_IOW/ - Emil Velikov v3: update read() interface to avoid passing state struct - Chris Wilson fix some rebase fallout, with i915-perf init/deinit v2: use i915_gem_context_get() - Chris Wilson Signed-off-by: Robert Bragg --- drivers/gpu/drm/i915/Makefile| 3 + drivers/gpu/drm/i915/i915_drv.c | 4 + drivers/gpu/drm/i915/i915_drv.h | 91 drivers/gpu/drm/i915/i915_perf.c | 443 +++ include/uapi/drm/i915_drm.h | 67 ++ 5 files changed, 608 insertions(+) create mode 100644 drivers/gpu/drm/i915/i915_perf.c diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 6123400..8d4e25f 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -113,6 +113,9 @@ i915-$(CONFIG_DRM_I915_CAPTURE_ERROR) += i915_gpu_error.o # virtual gpu code i915-y += i915_vgpu.o +# perf code +i915-y += i915_perf.o + ifeq ($(CONFIG_DRM_I915_GVT),y) i915-y += intel_gvt.o include $(src)/gvt/Makefile diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 99e4e04..e99d14e 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -836,6 +836,8 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv, intel_detect_preproduction_hw(dev_priv); + i915_perf_init(dev_priv); + return 0; err_workqueues: @@ -849,6 +851,7 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv, */ static void i915_driver_cleanup_early(struct drm_i915_private *dev_priv) { + i915_perf_fini(dev_priv); i915_gem_load_cleanup(&dev_priv->drm); i915_workqueues_cleanup(dev_priv); } @@ -2554,6 +2557,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = { DRM_IOCTL_DEF_DRV(I915_GEM_USERPTR, i915_gem_userptr_ioctl, DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_GETPARAM, i915_gem_context_getparam_ioctl, DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_SETPARAM, i915_gem_context_setparam_ioctl, DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(I915_PERF_OPEN, i915_perf_open_ioctl, 
DRM_RENDER_ALLOW), }; static struct drm_driver driver = { diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index dd3acab..fcc5958 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1764,6 +1764,84 @@ struct intel_wm_config { bool sprites_scaled; }; +struct i915_perf_stream; + +struct i915_perf_stream_ops { + /* Enables the collection of HW samples, either in response to +* I915_PERF_IOCTL_ENABLE or implicitly called when stream is +* opened without I915_PERF_FLAG_DISABLED. +*/ + void (*enable)(struct i915_perf_stream *stream); + + /* Disables the collection of HW samples, either in response to +* I915_PERF_IOCTL_DISABLE or implicitly called before +* destroying the stream. +*/ + void (*disable)(struct i915_perf_stream *stream); + + /* Return: true if any i915 perf records are ready to read() +* for this stream. +*/ + bool (*can_read)(struct i915_perf_stream *stream); + + /* Call poll_wait, passing a wait queue that will be woken +* once there is something ready to read() for the stream +*/ + void (*poll_wait)(struct i915_pe
[Intel-gfx] [PATCH v7 08/11] drm/i915: Add dev.i915.perf_stream_paranoid sysctl option
Consistent with the kernel.perf_event_paranoid sysctl option that can allow non-root users to access system wide cpu metrics, this can optionally allow non-root users to access system wide OA counter metrics from Gen graphics hardware. Signed-off-by: Robert Bragg Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/i915_drv.h | 1 + drivers/gpu/drm/i915/i915_perf.c | 50 +++- 2 files changed, 50 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index a968212..7010c6e 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2166,6 +2166,7 @@ struct drm_i915_private { bool initialized; struct kobject *metrics_kobj; + struct ctl_table_header *sysctl_header; struct mutex lock; struct list_head streams; diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index aedefbc..ab4c171 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -64,6 +64,11 @@ #define POLL_FREQUENCY 200 #define POLL_PERIOD (NSEC_PER_SEC / POLL_FREQUENCY) +/* for sysctl proc_dointvec_minmax of dev.i915.perf_stream_paranoid */ +static int zero; +static int one = 1; +static u32 i915_perf_stream_paranoid = true; + /* The maximum exponent the hardware accepts is 63 (essentially it selects one * of the 64bit timestamp bits to trigger reports from) but there's currently * no known use case for sampling as infrequently as once per 47 thousand years. @@ -1174,7 +1179,13 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv, } } - if (!specific_ctx && !capable(CAP_SYS_ADMIN)) { + /* Similar to perf's kernel.perf_paranoid_cpu sysctl option +* we check a dev.i915.perf_stream_paranoid sysctl option +* to determine if it's ok to access system wide OA counters +* without CAP_SYS_ADMIN privileges. 
+*/ + if (!specific_ctx && + i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) { DRM_ERROR("Insufficient privileges to open system-wide i915 perf stream\n"); ret = -EACCES; goto err_ctx; @@ -1418,6 +1429,39 @@ void i915_perf_unregister(struct drm_i915_private *dev_priv) dev_priv->perf.metrics_kobj = NULL; } +static struct ctl_table oa_table[] = { + { +.procname = "perf_stream_paranoid", +.data = &i915_perf_stream_paranoid, +.maxlen = sizeof(i915_perf_stream_paranoid), +.mode = 0644, +.proc_handler = proc_dointvec_minmax, +.extra1 = &zero, +.extra2 = &one, +}, + {} +}; + +static struct ctl_table i915_root[] = { + { +.procname = "i915", +.maxlen = 0, +.mode = 0555, +.child = oa_table, +}, + {} +}; + +static struct ctl_table dev_root[] = { + { +.procname = "dev", +.maxlen = 0, +.mode = 0555, +.child = i915_root, +}, + {} +}; + void i915_perf_init(struct drm_i915_private *dev_priv) { if (!IS_HASWELL(dev_priv)) @@ -1448,6 +1492,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv) dev_priv->perf.oa.n_builtin_sets = i915_oa_n_builtin_metric_sets_hsw; + dev_priv->perf.sysctl_header = register_sysctl_table(dev_root); + dev_priv->perf.initialized = true; } @@ -1456,6 +1502,8 @@ void i915_perf_fini(struct drm_i915_private *dev_priv) if (!dev_priv->perf.initialized) return; + unregister_sysctl_table(dev_priv->perf.sysctl_header); + memset(&dev_priv->perf.oa.ops, 0, sizeof(dev_priv->perf.oa.ops)); dev_priv->perf.initialized = false; } -- 2.10.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
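A userspace tool might want to check this setting before attempting to open a system-wide stream. The sketch below assumes the sysctl appears at /proc/sys/dev/i915/perf_stream_paranoid, which follows from the dev/i915 table registered in the patch; `read_int_sysctl()` and `system_wide_needs_root()` are hypothetical helper names, not part of any existing library.

```c
#include <stdio.h>

/* Read a single integer sysctl value from its procfs file.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
static int read_int_sysctl(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%d", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Mirrors the condition in i915_perf_open_ioctl_locked(): a stream with
 * no specific context needs CAP_SYS_ADMIN only while the paranoid flag
 * is non-zero. */
static int system_wide_needs_root(int paranoid)
{
    return paranoid != 0;
}
```

A caller would pass the procfs path to `read_int_sysctl()` and, if the file cannot be read (for example on a kernel without this patch), conservatively assume that root is required.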
[Intel-gfx] [PATCH v7 09/11] drm/i915: add oa_event_min_timer_exponent sysctl
The minimal sampling period is now configurable via a
dev.i915.oa_min_timer_exponent sysctl parameter.

Following the precedent set by perf, the default is the minimum that
won't (on its own) exceed the kernel.perf_event_max_sample_rate
default of 100000 samples/s.

Signed-off-by: Robert Bragg
Reviewed-by: Matthew Auld
---
 drivers/gpu/drm/i915/i915_perf.c | 41
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index ab4c171..e46cd36 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -82,6 +82,22 @@ static u32 i915_perf_stream_paranoid = true;
 
 #define INVALID_CTX_ID 0x
 
+/* for sysctl proc_dointvec_minmax of i915_oa_min_timer_exponent */
+static int oa_exponent_max = OA_EXPONENT_MAX;
+
+/* Theoretically we can program the OA unit to sample every 160ns but don't
+ * allow that by default unless root...
+ *
+ * The period is derived from the exponent as:
+ *
+ *   period = 80ns * 2^(exponent + 1)
+ *
+ * Referring to perf's kernel.perf_event_max_sample_rate for a precedent
+ * (100000 by default); with an OA exponent of 6 we get a period of 10.240
+ * microseconds - just under 100000Hz
+ */
+static u32 i915_oa_min_timer_exponent = 6;
+
 /* XXX: beware if future OA HW adds new report formats that the current
  * code assumes all reports have a power-of-two size and ~(size - 1) can
  * be used as a mask to align the OA tail pointer.
@@ -1317,21 +1333,13 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 			return -EINVAL;
 		}
 
-		/* NB: The exponent represents a period as follows:
-		 *
-		 *   80ns * 2^(period_exponent + 1)
-		 *
-		 * Theoretically we can program the OA unit to sample
+		/* Theoretically we can program the OA unit to sample
 		 * every 160ns but don't allow that by default unless
 		 * root.
-		 *
-		 * Referring to perf's
-		 * kernel.perf_event_max_sample_rate for a precedent
-		 * (100000 by default); with an OA exponent of 6 we get
-		 * a period of 10.240 microseconds - just under 100000Hz
 		 */
-		if (value < 6 && !capable(CAP_SYS_ADMIN)) {
-			DRM_ERROR("Sampling period too high without root privileges\n");
+		if (value < i915_oa_min_timer_exponent &&
+		    !capable(CAP_SYS_ADMIN)) {
+			DRM_ERROR("OA timer exponent too low without root privileges\n");
 			return -EACCES;
 		}
 
@@ -1439,6 +1447,15 @@ static struct ctl_table oa_table[] = {
 	 .extra1 = &zero,
 	 .extra2 = &one,
 	 },
+	{
+	 .procname = "oa_min_timer_exponent",
+	 .data = &i915_oa_min_timer_exponent,
+	 .maxlen = sizeof(i915_oa_min_timer_exponent),
+	 .mode = 0644,
+	 .proc_handler = proc_dointvec_minmax,
+	 .extra1 = &zero,
+	 .extra2 = &oa_exponent_max,
+	 },
 	{}
 };
 
-- 
2.10.1
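To make the exponent-to-period relationship from the patch comment concrete, here is a small worked sketch of period = 80ns * 2^(exponent + 1). The helper names are hypothetical and the 80ns base is taken from the comment in the patch, so treat it as specific to the hardware discussed here.

```c
#include <stdint.h>

/* Base tick of 80ns, per the formula quoted in the patch comment. */
#define OA_BASE_PERIOD_NS 80ULL

/* period = 80ns * 2^(exponent + 1); intended for exponents up to
 * OA_EXPONENT_MAX (31), where the result comfortably fits in 64 bits. */
static uint64_t oa_exponent_to_period_ns(unsigned int exponent)
{
    return OA_BASE_PERIOD_NS << (exponent + 1);
}

/* Sampling frequency in Hz, rounded down. */
static uint64_t oa_exponent_to_freq_hz(unsigned int exponent)
{
    return 1000000000ULL / oa_exponent_to_period_ns(exponent);
}
```

With these helpers, exponent 0 gives the 160ns period mentioned in the comment, the default exponent 6 gives 10240ns (about 97.6 kHz), and exponent 5 would halve the period to 5120ns (about 195 kHz), which is why 6 is the lowest exponent allowed without root by default.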
[Intel-gfx] [PATCH v7 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit
Gen graphics hardware can be set up to periodically write snapshots of performance counters into a circular buffer via its Observation Architecture and this patch exposes that capability to userspace via the i915 perf interface. v2: Make sure to initialize ->specific_ctx_id when opening, without relying on _pin_notify hook, in case ctx already pinned. v3: Revert back to pinning ctx upfront when opening stream, removing need to hook in to pinning and to update OACONTROL on the fly. Cc: Chris Wilson Signed-off-by: Robert Bragg Signed-off-by: Zhenyu Wang fix enable hsw --- drivers/gpu/drm/i915/i915_drv.h | 65 ++- drivers/gpu/drm/i915/i915_perf.c | 1000 +- drivers/gpu/drm/i915/i915_reg.h | 338 + include/uapi/drm/i915_drm.h | 70 ++- 4 files changed, 1444 insertions(+), 29 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 3448d05..ea24814 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1764,6 +1764,11 @@ struct intel_wm_config { bool sprites_scaled; }; +struct i915_oa_format { + u32 format; + int size; +}; + struct i915_oa_reg { i915_reg_t addr; u32 value; @@ -1784,11 +1789,6 @@ struct i915_perf_stream_ops { */ void (*disable)(struct i915_perf_stream *stream); - /* Return: true if any i915 perf records are ready to read() -* for this stream. -*/ - bool (*can_read)(struct i915_perf_stream *stream); - /* Call poll_wait, passing a wait queue that will be woken * once there is something ready to read() for the stream */ @@ -1798,9 +1798,7 @@ struct i915_perf_stream_ops { /* For handling a blocking read, wait until there is something * to ready to read() for the stream. E.g. wait on the same -* wait queue that would be passed to poll_wait() until -* ->can_read() returns true (if its safe to call ->can_read() -* without the i915 perf lock held). +* wait queue that would be passed to poll_wait(). 
*/ int (*wait_unlocked)(struct i915_perf_stream *stream); @@ -1840,11 +1838,28 @@ struct i915_perf_stream { struct list_head link; u32 sample_flags; + int sample_size; struct i915_gem_context *ctx; bool enabled; - struct i915_perf_stream_ops *ops; + const struct i915_perf_stream_ops *ops; +}; + +struct i915_oa_ops { + void (*init_oa_buffer)(struct drm_i915_private *dev_priv); + int (*enable_metric_set)(struct drm_i915_private *dev_priv); + void (*disable_metric_set)(struct drm_i915_private *dev_priv); + void (*oa_enable)(struct drm_i915_private *dev_priv); + void (*oa_disable)(struct drm_i915_private *dev_priv); + void (*update_oacontrol)(struct drm_i915_private *dev_priv); + void (*update_hw_ctx_id_locked)(struct drm_i915_private *dev_priv, + u32 ctx_id); + int (*read)(struct i915_perf_stream *stream, + char __user *buf, + size_t count, + size_t *offset); + bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv); }; struct drm_i915_private { @@ -2149,16 +2164,46 @@ struct drm_i915_private { struct { bool initialized; + struct mutex lock; struct list_head streams; + spinlock_t hook_lock; + struct { - u32 metrics_set; + struct i915_perf_stream *exclusive_stream; + + u32 specific_ctx_id; + + struct hrtimer poll_check_timer; + wait_queue_head_t poll_wq; + atomic_t pollin; + + bool periodic; + int period_exponent; + int timestamp_frequency; + + int tail_margin; + + int metrics_set; const struct i915_oa_reg *mux_regs; int mux_regs_len; const struct i915_oa_reg *b_counter_regs; int b_counter_regs_len; + + struct { + struct i915_vma *vma; + u8 *vaddr; + int format; + int format_size; + } oa_buffer; + + u32 gen7_latched_oastatus1; + + struct i915_oa_ops ops; + const struct i915_oa_format *oa_formats; + int n_builtin_sets; } oa; } perf; diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 4d51586..d7a4899 100644 --- a/drivers/gpu/drm/i915/i9
[Intel-gfx] [PATCH v7 11/11] drm/i915: Add a kerneldoc summary for i915_perf.c
In particular this tries to capture for posterity some of the early challenges we had with using the core perf infrastructure in case we ever want to revisit adapting perf for device metrics. Cc: Chris Wilson Signed-off-by: Robert Bragg Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/i915_perf.c | 163 +++ 1 file changed, 163 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index e46cd36..501d20a 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -24,6 +24,169 @@ * Robert Bragg */ + +/** + * DOC: i915 Perf, streaming API for GPU metrics + * + * Gen graphics supports a large number of performance counters that can help + * driver and application developers understand and optimize their use of the + * GPU. + * + * This i915 perf interface enables userspace to configure and open a file + * descriptor representing a stream of GPU metrics which can then be read() as + * a stream of sample records. + * + * The interface is particularly suited to exposing buffered metrics that are + * captured by DMA from the GPU, unsynchronized with and unrelated to the CPU. + * + * Streams representing a single context are accessible to applications with a + * corresponding drm file descriptor, such that OpenGL can use the interface + * without special privileges. Access to system-wide metrics requires root + * privileges by default, unless changed via the dev.i915.perf_stream_paranoid + * sysctl option. + * + * + * The interface was initially inspired by the core Perf infrastructure but + * some notable differences are: + * + * i915 perf file descriptors represent a "stream" instead of an "event"; where + * a perf event primarily corresponds to a single 64bit value, while a stream + * might sample sets of tightly-coupled counters, depending on the + * configuration. 
For example the Gen OA unit isn't designed to support + * orthogonal configurations of individual counters; it's configured for a set + * of related counters. Samples for an i915 perf stream capturing OA metrics + * will include a set of counter values packed in a compact HW specific format. + * The OA unit supports a number of different packing formats which can be + * selected by the user opening the stream. Perf has support for grouping + * events, but each event in the group is configured, validated and + * authenticated individually with separate system calls. + * + * i915 perf stream configurations are provided as an array of u64 (key,value) + * pairs, instead of a fixed struct with multiple miscellaneous config members, + * interleaved with event-type specific members. + * + * i915 perf doesn't support exposing metrics via an mmap'd circular buffer. + * The supported metrics are being written to memory by the GPU unsynchronized + * with the CPU, using HW specific packing formats for counter sets. Sometimes + * the constraints on HW configuration require reports to be filtered before it + * would be acceptable to expose them to unprivileged applications - to hide + * the metrics of other processes/contexts. For these use cases a read() based + * interface is a good fit, and provides an opportunity to filter data as it + * gets copied from the GPU mapped buffers to userspace buffers. + * + * + * Some notes regarding Linux Perf: + * + * + * The first prototype of this driver was based on the core perf + * infrastructure, and while we did make that mostly work, with some changes to + * perf, we found we were breaking or working around too many assumptions baked + * into perf's currently cpu centric design. + * + * In the end we didn't see a clear benefit to making perf's implementation and + * interface more complex by changing design assumptions while we knew we still + * wouldn't be able to use any existing perf based userspace tools. 
+ * + * Also considering the Gen specific nature of the Observability hardware and + * how userspace will sometimes need to combine i915 perf OA metrics with + * side-band OA data captured via MI_REPORT_PERF_COUNT commands; we're + * expecting the interface to be used by a platform specific userspace such as + * OpenGL or tools. This is to say; we aren't inherently missing out on having + * a standard vendor/architecture agnostic interface by not using perf. + * + * + * For posterity, in case we might re-visit trying to adapt core perf to be + * better suited to exposing i915 metrics these were the main pain points we + * hit: + * + * - The perf based OA PMU driver broke some significant design assumptions: + * + * Existing perf pmus are used for profiling work on a cpu and we were + * introducing the idea of _IS_DEVICE pmus with different security + * implications, the need to fake cpu-related data (such as user/kernel + * registers) to fit with perf's current design, and addi
[Intel-gfx] [PATCH v7 10/11] drm/i915: Add more Haswell OA metric sets
This adds 'compute', 'compute extended', 'memory reads', 'memory writes' and 'sampler balance' metric sets for Haswell. The code is auto generated from an XML description of metric sets, currently maintained in gputop, ref: https://github.com/rib/gputop > gputop-data/oa-*.xml > scripts/i915-perf-kernelgen.py $ make -C gputop-data -f Makefile.xml Signed-off-by: Robert Bragg Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/i915_oa_hsw.c | 559 - 1 file changed, 558 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c index 19f272b..cd2a23a 100644 --- a/drivers/gpu/drm/i915/i915_oa_hsw.c +++ b/drivers/gpu/drm/i915/i915_oa_hsw.c @@ -31,9 +31,14 @@ enum metric_set_id { METRIC_SET_ID_RENDER_BASIC = 1, + METRIC_SET_ID_COMPUTE_BASIC, + METRIC_SET_ID_COMPUTE_EXTENDED, + METRIC_SET_ID_MEMORY_READS, + METRIC_SET_ID_MEMORY_WRITES, + METRIC_SET_ID_SAMPLER_BALANCE, }; -int i915_oa_n_builtin_metric_sets_hsw = 1; +int i915_oa_n_builtin_metric_sets_hsw = 6; static const struct i915_oa_reg b_counter_config_render_basic[] = { { _MMIO(0x2724), 0x0080 }, @@ -112,6 +117,298 @@ get_render_basic_mux_config(struct drm_i915_private *dev_priv, return mux_config_render_basic; } +static const struct i915_oa_reg b_counter_config_compute_basic[] = { + { _MMIO(0x2710), 0x }, + { _MMIO(0x2714), 0x0080 }, + { _MMIO(0x2718), 0x }, + { _MMIO(0x271c), 0x }, + { _MMIO(0x2720), 0x }, + { _MMIO(0x2724), 0x0080 }, + { _MMIO(0x2728), 0x }, + { _MMIO(0x272c), 0x }, + { _MMIO(0x2740), 0x }, + { _MMIO(0x2744), 0x }, + { _MMIO(0x2748), 0x }, + { _MMIO(0x274c), 0x }, + { _MMIO(0x2750), 0x }, + { _MMIO(0x2754), 0x }, + { _MMIO(0x2758), 0x }, + { _MMIO(0x275c), 0x }, + { _MMIO(0x236c), 0x }, +}; + +static const struct i915_oa_reg mux_config_compute_basic[] = { + { _MMIO(0x253a4), 0x }, + { _MMIO(0x2681c), 0x01f00800 }, + { _MMIO(0x26820), 0x1000 }, + { _MMIO(0x2781c), 0x01f00800 }, + { _MMIO(0x26520), 0x0007 }, + { _MMIO(0x265a0), 0x0007 }, + { 
_MMIO(0x25380), 0x0010 }, + { _MMIO(0x2538c), 0x0030 }, + { _MMIO(0x25384), 0xaa8a }, + { _MMIO(0x25404), 0x }, + { _MMIO(0x26800), 0x4202 }, + { _MMIO(0x26808), 0x00605817 }, + { _MMIO(0x2680c), 0x10001005 }, + { _MMIO(0x26804), 0x }, + { _MMIO(0x27800), 0x0102 }, + { _MMIO(0x27808), 0x0c0701e0 }, + { _MMIO(0x2780c), 0x000200a0 }, + { _MMIO(0x27804), 0x }, + { _MMIO(0x26484), 0x4400 }, + { _MMIO(0x26704), 0x4400 }, + { _MMIO(0x26500), 0x0006 }, + { _MMIO(0x26510), 0x0001 }, + { _MMIO(0x26504), 0x8800 }, + { _MMIO(0x26580), 0x0006 }, + { _MMIO(0x26590), 0x0020 }, + { _MMIO(0x26584), 0x }, + { _MMIO(0x26104), 0x5582 }, + { _MMIO(0x26184), 0xaa86 }, + { _MMIO(0x25420), 0x08320c83 }, + { _MMIO(0x25424), 0x06820c83 }, + { _MMIO(0x2541c), 0x }, + { _MMIO(0x25428), 0x0c03 }, +}; + +static const struct i915_oa_reg * +get_compute_basic_mux_config(struct drm_i915_private *dev_priv, +int *len) +{ + *len = ARRAY_SIZE(mux_config_compute_basic); + return mux_config_compute_basic; +} + +static const struct i915_oa_reg b_counter_config_compute_extended[] = { + { _MMIO(0x2724), 0xf080 }, + { _MMIO(0x2720), 0x }, + { _MMIO(0x2714), 0xf080 }, + { _MMIO(0x2710), 0x }, + { _MMIO(0x2770), 0x0007fe2a }, + { _MMIO(0x2774), 0xff00 }, + { _MMIO(0x2778), 0x0007fe6a }, + { _MMIO(0x277c), 0xff00 }, + { _MMIO(0x2780), 0x0007fe92 }, + { _MMIO(0x2784), 0xff00 }, + { _MMIO(0x2788), 0x0007fea2 }, + { _MMIO(0x278c), 0xff00 }, + { _MMIO(0x2790), 0x0007fe32 }, + { _MMIO(0x2794), 0xff00 }, + { _MMIO(0x2798), 0x0007fe9a }, + { _MMIO(0x279c), 0xff00 }, + { _MMIO(0x27a0), 0x0007ff23 }, + { _MMIO(0x27a4), 0xff00 }, + { _MMIO(0x27a8), 0x0007fff3 }, + { _MMIO(0x27ac), 0xfffe }, +}; + +static const struct i915_oa_reg mux_config_compute_extended[] = { + { _MMIO(0x2681c), 0x3eb00800 }, + { _MMIO(0x26820), 0x0090 }, + { _MMIO(0x25384), 0x02aa }, + { _MMIO(0x25404), 0x03ff }, + { _MMIO(0x26800), 0x00142284 }, + { _MMIO(0x26808), 0x0e629062 }, + { _MMIO(0x2680c), 0x3f6f55cb }, + { _MMIO(0x26810), 0x0014 }, +
Re: [Intel-gfx] [PATCH v7 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit
On Tue, Oct 25, 2016 at 10:35 PM, Matthew Auld < matthew.william.a...@gmail.com> wrote: > On 25 October 2016 at 00:19, Robert Bragg wrote: > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h > b/drivers/gpu/drm/i915/i915_drv.h > > index 3448d05..ea24814 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.h > > +++ b/drivers/gpu/drm/i915/i915_drv.h > > @@ -1764,6 +1764,11 @@ struct intel_wm_config { > > > > > struct drm_i915_private { > > @@ -2149,16 +2164,46 @@ struct drm_i915_private { > > > > struct { > > bool initialized; > > + > > struct mutex lock; > > struct list_head streams; > > > > + spinlock_t hook_lock; > > + > > struct { > > - u32 metrics_set; > > + struct i915_perf_stream *exclusive_stream; > > + > > + u32 specific_ctx_id; > Can we just get rid of this, now that the vma remains pinned we can > simply get the ggtt address at the time of configuring the OA_CONTROL > register ? > I considered that, but would ideally prefer to keep it considering the gen8+ patches to come. For gen8+ (with execlists) the context ID isn't a gtt offset. > > > + > > + struct hrtimer poll_check_timer; > > + wait_queue_head_t poll_wq; > > + atomic_t pollin; > > + > > > > +/* The maximum exponent the hardware accepts is 63 (essentially it > selects one > > + * of the 64bit timestamp bits to trigger reports from) but there's > currently > > + * no known use case for sampling as infrequently as once per 47 > thousand years. > > + * > > + * Since the timestamps included in OA reports are only 32bits it seems > > + * reasonable to limit the OA exponent where it's still possible to > account for > > + * overflow in OA report timestamps. > > + */ > > +#define OA_EXPONENT_MAX 31 > > + > > +#define INVALID_CTX_ID 0x > We shouldn't need this anymore. > yeah I removed it and then added it back, just for the sake of explicitly setting the specific_ctx_id to an invalid ID when closing the exclusive stream - though resetting the value isn't strictly necessary. 
also maybe your comment is assuming specific_ctx_id can be removed, while I'd prefer to keep it. > > + > > +static int claim_specific_ctx(struct i915_perf_stream *stream) > > +{ > pin_oa_specific_ctx, or something? Also would it not make more sense > to operate on the context, not the stream. > Yeah, I avoided a name like that mainly because it's also initializing specific_ctx_id, which seemed to me like it would become an unexpected side effect with that more specific name. The other consideration is that in my gen8+ patches the pinning code is conditional depending on whether execlists are enabled, while the function still initializes specific_ctx_id. Certainly not attached to the names though. Chris has some feedback with the code, so maybe that will affect this too. > > + struct drm_i915_private *dev_priv = stream->dev_priv; > > + struct i915_vma *vma; > > + int ret; > > + > > + ret = i915_mutex_lock_interruptible(&dev_priv->drm); > > + if (ret) > > + return ret; > > + > > + /* So that we don't have to worry about updating the context ID > > +* in OACONTOL on the fly we make sure to pin the context > > +* upfront for the lifetime of the stream... > > +*/ > > + vma = stream->ctx->engine[RCS].state; > > + ret = i915_vma_pin(vma, 0, stream->ctx->ggtt_alignment, > > + PIN_GLOBAL | PIN_HIGH); > > + if (ret) > > + return ret; > > + > > + dev_priv->perf.oa.specific_ctx_id = i915_ggtt_offset(vma); > > + > > + mutex_unlock(&dev_priv->drm.struct_mutex); > > + > > + return 0; > > +} > I'll also follow up on the other notes; thanks! - Robert ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v7 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit
On 26 Oct 2016 11:08 a.m., "Matthew Auld" wrote: > > On 26 October 2016 at 00:51, Robert Bragg wrote: > > > > > > On Tue, Oct 25, 2016 at 10:35 PM, Matthew Auld > > wrote: > >> > >> On 25 October 2016 at 00:19, Robert Bragg wrote: > > > > > >> > >> > >> > diff --git a/drivers/gpu/drm/i915/i915_drv.h > >> > b/drivers/gpu/drm/i915/i915_drv.h > >> > index 3448d05..ea24814 100644 > >> > --- a/drivers/gpu/drm/i915/i915_drv.h > >> > +++ b/drivers/gpu/drm/i915/i915_drv.h > >> > @@ -1764,6 +1764,11 @@ struct intel_wm_config { > >> > >> > > >> > struct drm_i915_private { > >> > @@ -2149,16 +2164,46 @@ struct drm_i915_private { > >> > > >> > struct { > >> > bool initialized; > >> > + > >> > struct mutex lock; > >> > struct list_head streams; > >> > > >> > + spinlock_t hook_lock; > >> > + > >> > struct { > >> > - u32 metrics_set; > >> > + struct i915_perf_stream *exclusive_stream; > >> > + > >> > + u32 specific_ctx_id; > >> Can we just get rid of this, now that the vma remains pinned we can > >> simply get the ggtt address at the time of configuring the OA_CONTROL > >> register ? > > > > > > I considered that, but would ideally prefer to keep it considering the gen8+ > > patches to come. For gen8+ (with execlists) the context ID isn't a gtt > > offset. > > > >> > >> > >> > + > >> > + struct hrtimer poll_check_timer; > >> > + wait_queue_head_t poll_wq; > >> > + atomic_t pollin; > >> > + > >> > > > >> > >> > +/* The maximum exponent the hardware accepts is 63 (essentially it > >> > selects one > >> > + * of the 64bit timestamp bits to trigger reports from) but there's > >> > currently > >> > + * no known use case for sampling as infrequently as once per 47 > >> > thousand years. > >> > + * > >> > + * Since the timestamps included in OA reports are only 32bits it seems > >> > + * reasonable to limit the OA exponent where it's still possible to > >> > account for > >> > + * overflow in OA report timestamps. 
> >> > + */ > >> > +#define OA_EXPONENT_MAX 31 > >> > + > >> > +#define INVALID_CTX_ID 0x > >> We shouldn't need this anymore. > > > > > > yeah I removed it and then added it back, just for the sake of explicitly > > setting the specific_ctx_id to an invalid ID when closing the exclusive > > stream - though resetting the value isn't strictly necessary. > Can we not make the specific_ctx_id per-stream, the gem context > already is, then we don't need to be concerned with resetting it ? Hmm, I'm not sure about that, conceptually to me it's global OA unit state. Currently the driver only supports a single exclusive stream, while Sourab later relaxes that to a per-engine stream and that could be relaxed further with non-oa metric stream types. With multiple streams we'll still only be able to programmer a single ctx id in oacontol. Conceptually to me, other stream types could be associated with different contexts (if they don't depend on the OA unit) so to me stream->ctx isn't necessarily OA unit state. It probably could be played around with, but right now we don't track OA specific state in the stream. For the ID it's just semantics to say it's OA state, and we could consider that it's maybe generally useful to track the ID, even for future non-oa streams. That might mean potentially redundantly pinning state for the sake of tracking the ID for streams that don't end up needing it. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v7 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit
On 26 Oct 2016 9:54 a.m., "Chris Wilson" wrote: > > On Wed, Oct 26, 2016 at 12:51:58AM +0100, Robert Bragg wrote: > >On Tue, Oct 25, 2016 at 10:35 PM, Matthew Auld > ><[1]matthew.william.a...@gmail.com> wrote: > > > > On 25 October 2016 at 00:19, Robert Bragg <[2]rob...@sixbynine.org> > > wrote: > > > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h > > b/drivers/gpu/drm/i915/i915_drv.h > > > index 3448d05..ea24814 100644 > > > --- a/drivers/gpu/drm/i915/i915_drv.h > > > +++ b/drivers/gpu/drm/i915/i915_drv.h > > > @@ -1764,6 +1764,11 @@ struct intel_wm_config { > > > > > > > > struct drm_i915_private { > > > @@ -2149,16 +2164,46 @@ struct drm_i915_private { > > > > > > struct { > > > bool initialized; > > > + > > > struct mutex lock; > > > struct list_head streams; > > > > > > + spinlock_t hook_lock; > > > + > > > struct { > > > - u32 metrics_set; > > > + struct i915_perf_stream *exclusive_stream; > > > + > > > + u32 specific_ctx_id; > > Can we just get rid of this, now that the vma remains pinned we can > > simply get the ggtt address at the time of configuring the OA_CONTROL > > register ? > > > >I considered that, but would ideally prefer to keep it considering the > >gen8+ patches to come. For gen8+ (with execlists) the context ID isn't a > >gtt offset. > > In terms of symmetry, keeping the vma you pinned and unpinning the same > later makes its ownership much clearer. (And I do want the owner of each > pin to be clear, for when we start enabling debug to catch the VMA > leaks.) Keeping our own pointer to the pinned vma could be a clarification. Considering Matt's comments too, I'm thinking I'll put the pinning and specific_ctx_id initialization together with setting stream->ctx, keeping the state together under the stream. It's going to potentially mean redundantly pinning the ctx for the sake of the ID in the future for streams that don't really need it, but I think it's probably not worth worrying about that. 
- Robert > -Chris > > -- > Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH igt 0/3] corresponding changes for i915-perf interface
The i915-perf series affects the command parser and itself includes new uapi which these i-g-t changes try to cover. - Robert Robert Bragg (3): igt/perf: add i915 perf stream tests for Haswell igt/gem_exec_parse: remove oacontrol checks igt/gem_exec_parse: update for version 8 changes tests/Makefile.sources |1 + tests/gem_exec_parse.c | 436 +- tests/perf.c | 2173 3 files changed, 2364 insertions(+), 246 deletions(-) create mode 100644 tests/perf.c -- 2.10.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH igt 1/3] igt/perf: add i915 perf stream tests for Haswell
Signed-off-by: Robert Bragg --- tests/Makefile.sources |1 + tests/perf.c | 2173 2 files changed, 2174 insertions(+) create mode 100644 tests/perf.c diff --git a/tests/Makefile.sources b/tests/Makefile.sources index 6d081c3..7c6de2f 100644 --- a/tests/Makefile.sources +++ b/tests/Makefile.sources @@ -211,6 +211,7 @@ TESTS_progs = \ kms_pwrite_crc \ kms_sink_crc_basic \ prime_udl \ + perf \ $(NULL) # IMPORTANT: The ZZ_ tests need to be run last! diff --git a/tests/perf.c b/tests/perf.c new file mode 100644 index 000..f050e31 --- /dev/null +++ b/tests/perf.c @@ -0,0 +1,2173 @@ +/* + * Copyright © 2016 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. 
+ * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "igt.h" +#include "drm.h" + +IGT_TEST_DESCRIPTION("Test the i915 perf metrics streaming interface"); + +#define GEN6_MI_REPORT_PERF_COUNT ((0x28 << 23) | (3 - 2)) + +#define GFX_OP_PIPE_CONTROL ((3 << 29) | (3 << 27) | (2 << 24)) +#define PIPE_CONTROL_CS_STALL (1 << 20) +#define PIPE_CONTROL_GLOBAL_SNAPSHOT_COUNT_RESET(1 << 19) +#define PIPE_CONTROL_TLB_INVALIDATE (1 << 18) +#define PIPE_CONTROL_SYNC_GFDT (1 << 17) +#define PIPE_CONTROL_MEDIA_STATE_CLEAR (1 << 16) +#define PIPE_CONTROL_NO_WRITE (0 << 14) +#define PIPE_CONTROL_WRITE_IMMEDIATE(1 << 14) +#define PIPE_CONTROL_WRITE_DEPTH_COUNT (2 << 14) +#define PIPE_CONTROL_WRITE_TIMESTAMP(3 << 14) +#define PIPE_CONTROL_DEPTH_STALL(1 << 13) +#define PIPE_CONTROL_RENDER_TARGET_FLUSH (1 << 12) +#define PIPE_CONTROL_INSTRUCTION_INVALIDATE (1 << 11) +#define PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE (1 << 10) /* GM45+ only */ +#define PIPE_CONTROL_ISP_DIS(1 << 9) +#define PIPE_CONTROL_INTERRUPT_ENABLE (1 << 8) +#define PIPE_CONTROL_FLUSH_ENABLE (1 << 7) /* Gen7+ only */ +/* GT */ +#define PIPE_CONTROL_DATA_CACHE_INVALIDATE (1 << 5) +#define PIPE_CONTROL_VF_CACHE_INVALIDATE(1 << 4) +#define PIPE_CONTROL_CONST_CACHE_INVALIDATE (1 << 3) +#define PIPE_CONTROL_STATE_CACHE_INVALIDATE (1 << 2) +#define PIPE_CONTROL_STALL_AT_SCOREBOARD(1 << 1) +#define PIPE_CONTROL_DEPTH_CACHE_FLUSH (1 << 0) +#define PIPE_CONTROL_PPGTT_WRITE(0 << 2) +#define PIPE_CONTROL_GLOBAL_GTT_WRITE (1 << 2) + +static struct { +const char *name; +uint64_t id; +size_t size; +int a_off; /* bytes */ +int n_a; +int first_a; +int b_off; +int n_b; +int c_off; +int n_c; +} hsw_oa_formats[] = { +{ "A13", I915_OA_FORMAT_A13, .size = 64, +.a_off = 12, .n_a = 13 }, +{ "A29", I915_OA_FORMAT_A29, .size = 128, +.a_off = 12, .n_a = 29 }, +{ "A13_B8_C8", I915_OA_FORMAT_A13_B8_C8, .size = 128, +.a_off = 12, .n_a = 13, 
+.b_off = 64, .n_b = 8, +.c_off = 96, .n_c = 8 }, +{ "A45_B8_C8", I915_OA_FORMAT_A45_B8_C8, .size = 256, +.a_off = 12, .n_a = 45, +.b_off = 192, .n_b = 8, +.c_off = 224, .n_c = 8 }, +{ "B4_C8", I915_OA_FORMAT_B4_C8, .size = 64, +.b_off = 16, .n_b = 4, +.c_off = 32, .n_c = 8 }, +{ "B4_C8_A16", I915_OA_FORMAT_B4_C8_A16, .size = 128, +.b_off = 16, .n_b = 4, +.c_off = 32, .n_c = 8,
[Intel-gfx] [PATCH igt 3/3] igt/gem_exec_parse: update for version 8 changes
This adapts the tests to account for the parser no longer reporting privilege violations back to userspace as EINVAL errors (they are left to the HW command parser to squash the commands to NOOPS). The interface change isn't expected to affect userspace and in fact it looks like the previous behaviour was liable to break userspace, such as Mesa which explicitly tries to observe whether OACONTROL LRIs are squashed to NOOPs but Mesa will abort for execbuffer errors. Signed-off-by: Robert Bragg --- tests/gem_exec_parse.c | 368 +++-- 1 file changed, 200 insertions(+), 168 deletions(-) diff --git a/tests/gem_exec_parse.c b/tests/gem_exec_parse.c index 36bf57d..4290701 100644 --- a/tests/gem_exec_parse.c +++ b/tests/gem_exec_parse.c @@ -34,7 +34,24 @@ #define I915_PARAM_CMD_PARSER_VERSION 28 #endif +#define ARRAY_LEN(A) (sizeof(A) / sizeof(A[0])) + +#define OACONTROL 0x2360 #define DERRMR 0x44050 +#define SO_WRITE_OFFSET_0 0x5280 +#define HSW_CS_GPR(n) (0x2600 + 8*(n)) +#define HSW_CS_GPR0 HSW_CS_GPR(0) +#define HSW_CS_GPR1 HSW_CS_GPR(1) + +#define MI_LOAD_REGISTER_REG (0x2a << 23) +#define MI_STORE_REGISTER_MEM (0x24 << 23) +#define MI_ARB_ON_OFF (0x8 << 23) +#define MI_DISPLAY_FLIP ((0x14 << 23) | 1) + +#define GFX_OP_PIPE_CONTROL((0x3<<29)|(0x3<<27)|(0x2<<24)|2) +#define PIPE_CONTROL_QW_WRITE(1<<14) +#define PIPE_CONTROL_LRI_POST_OP (1<<23) + static int command_parser_version(int fd) { @@ -50,101 +67,8 @@ static int command_parser_version(int fd) return -1; } -#define HSW_CS_GPR(n) (0x2600 + 8*(n)) -#define HSW_CS_GPR0 HSW_CS_GPR(0) -#define HSW_CS_GPR1 HSW_CS_GPR(1) - -#define MI_LOAD_REGISTER_REG (0x2a << 23) -#define MI_STORE_REGISTER_MEM (0x24 << 23) -static void hsw_load_register_reg(void) -{ - uint32_t buf[16] = { - MI_LOAD_REGISTER_IMM | (5 - 2), - HSW_CS_GPR0, - 0xabcdabcd, - HSW_CS_GPR1, - 0xdeadbeef, - - MI_STORE_REGISTER_MEM | (3 - 2), - HSW_CS_GPR1, - 0, /* address0 */ - - MI_LOAD_REGISTER_REG | (3 - 2), - HSW_CS_GPR0, - HSW_CS_GPR1, - - 
MI_STORE_REGISTER_MEM | (3 - 2), - HSW_CS_GPR1, - 4, /* address1 */ - - MI_BATCH_BUFFER_END, - }; - struct drm_i915_gem_execbuffer2 execbuf; - struct drm_i915_gem_exec_object2 obj[2]; - struct drm_i915_gem_relocation_entry reloc[2]; - int fd; - - /* Open again to get a non-master file descriptor */ - fd = drm_open_driver(DRIVER_INTEL); - - igt_require(IS_HASWELL(intel_get_drm_devid(fd))); - igt_require(command_parser_version(fd) >= 7); - - memset(obj, 0, sizeof(obj)); - obj[0].handle = gem_create(fd, 4096); - obj[1].handle = gem_create(fd, 4096); - gem_write(fd, obj[1].handle, 0, buf, sizeof(buf)); - - memset(reloc, 0, sizeof(reloc)); - reloc[0].offset = 7*sizeof(uint32_t); - reloc[0].target_handle = obj[0].handle; - reloc[0].delta = 0; - reloc[0].read_domains = I915_GEM_DOMAIN_INSTRUCTION; - reloc[0].write_domain = I915_GEM_DOMAIN_INSTRUCTION; - reloc[1].offset = 13*sizeof(uint32_t); - reloc[1].target_handle = obj[0].handle; - reloc[1].delta = sizeof(uint32_t); - reloc[1].read_domains = I915_GEM_DOMAIN_INSTRUCTION; - reloc[1].write_domain = I915_GEM_DOMAIN_INSTRUCTION; - obj[1].relocs_ptr = (uintptr_t)&reloc; - obj[1].relocation_count = 2; - - memset(&execbuf, 0, sizeof(execbuf)); - execbuf.buffers_ptr = (uintptr_t)obj; - execbuf.buffer_count = 2; - execbuf.batch_len = sizeof(buf); - execbuf.flags = I915_EXEC_RENDER; - gem_execbuf(fd, &execbuf); - gem_close(fd, obj[1].handle); - - gem_read(fd, obj[0].handle, 0, buf, 2*sizeof(buf[0])); - igt_assert_eq_u32(buf[0], 0xdeadbeef); /* before copy */ - igt_assert_eq_u32(buf[1], 0xabcdabcd); /* after copy */ - - /* Now a couple of negative tests that should be filtered */ - execbuf.buffer_count = 1; - execbuf.batch_len = 4*sizeof(buf[0]); - - buf[0] = MI_LOAD_REGISTER_REG | (3 - 2); - buf[1] = HSW_CS_GPR0; - buf[2] = 0; - buf[3] = MI_BATCH_BUFFER_END; - gem_write(fd, obj[0].handle, 0, buf, execbuf.batch_len); - igt_assert_eq(__gem_execbuf(fd, &execbuf), -EINVAL); - - buf[2] = DERRMR; /* master only */ - gem_write(fd, 
obj[0].handle, 0, buf, execbuf.batch_len); - igt_assert_eq(__gem_execbuf(fd, &execbuf), -EINVAL); - - buf[2] = 0x2038; /* RING_START: invalid */ - gem_write(fd, obj[0].handle, 0, buf, execbuf.batch_len); - igt_assert_eq(__gem_execbuf(fd, &execbuf), -EINVAL); - - close(fd); -} - -static void exec_b
Re: [Intel-gfx] [PATCH v7 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit
On Wed, Oct 26, 2016 at 4:37 PM, Ville Syrjälä <ville.syrj...@linux.intel.com> wrote:

> On Wed, Oct 26, 2016 at 04:17:45PM +0100, Robert Bragg wrote:
> > On 26 Oct 2016 9:54 a.m., "Chris Wilson" wrote:
> > >
> > > On Wed, Oct 26, 2016 at 12:51:58AM +0100, Robert Bragg wrote:
> > > > On Tue, Oct 25, 2016 at 10:35 PM, Matthew Auld
> > > > <matthew.william.a...@gmail.com> wrote:
> > > >
> > > > On 25 October 2016 at 00:19, Robert Bragg <rob...@sixbynine.org> wrote:
> > > >
> > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > > > > index 3448d05..ea24814 100644
> > > > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > > > @@ -1764,6 +1764,11 @@ struct intel_wm_config {
> > > > >
> > > > >  struct drm_i915_private {
> > > > > @@ -2149,16 +2164,46 @@ struct drm_i915_private {
> > > > >
> > > > >         struct {
> > > > >                 bool initialized;
> > > > > +
> > > > >                 struct mutex lock;
> > > > >                 struct list_head streams;
> > > > >
> > > > > +               spinlock_t hook_lock;
> > > > > +
> > > > >                 struct {
> > > > > -                       u32 metrics_set;
> > > > > +                       struct i915_perf_stream *exclusive_stream;
>
> OT:
> What kind of MUA are you using that mangles quoted mails like this? I've
> not seen it on intel-gfx before. mesa-dev seems rife with it, but as I
> rarely read that in any great detail I've managed to ignore it there.
> Anyways, it makes it espesially hard to navigate long mails since mutt's
> 'S' (skip quoted text) no longer works correctly.

Not sure I want to say, and get booted out the door :-)

I've heard that gmail has an annoying habit of forcibly wrapping plain text emails like this, and a lot of people have complained that there's no way to disable that 'feature' :-/

I used to use Mutt, but I don't think I could really bear to go back to it any more. Last time I was using it I found myself spending too much time patching it to try and make it work how I'd like, but can't say I got much enjoyment from that process.

I've tried most MUA options available, and can't say any of them make me very happy. I think these days it's just not something developers are very interested in working on.

I'm a sell out and just use Gmail... sorry. I can't really see myself changing, though I do wish Google weren't so pedantic about forcing wrapping without any option to change that behaviour. I suspect you wouldn't be happy with me sending html emails, which has been Google's default response to this complaint afaik.

Maybe it's gmail users causing trouble on the Mesa list too.

- Robert

P.S. please don't think less of me due to my misguided MUA choices.

> > > > > +
> > > > > +                       u32 specific_ctx_id;
> > > > Can we just get rid of this, now that the vma remains pinned we can
> > > > simply get the ggtt address at the time of configuring the OA_CONTROL
> > > > register ?
> > > >
> > > > I considered that, but would ideally prefer to keep it considering the
> > > > gen8+ patches to come. For gen8+ (with execlists) the context ID isn't a
> > > > gtt offset.
> > >
> > > In terms of symmetry, keeping the vma you pinned and unpinning the same
> > > later makes its ownership much clearer. (And I do want the owner of each
> > > pin to be clear, for when we start enabling debug to catch the VMA
> > > leaks.)
> >
> > Keeping our own pointer to the pinned vma could be a clarification.
> >
> > Considering Matt's comments too, I'm thinking I'll put the pinning and
> > specific_ctx_id initialization together with setting stream->ctx, keeping
> > the state together under the stream. It's going to potentially mean
> > redundantly pinning the ctx for the sake of the ID in the future for
> > streams that don't really need it, but I think it's probably not worth
> > worrying about that.
> >
> > - Robert
> >
> > > -Chris
> > >
> > > --
> > > Chris Wilson, Intel Open Source Technology Centre
>
> --
> Ville Syrjälä
> Intel OTC
Re: [Intel-gfx] [PATCH v7 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit
On 26 Oct 2016 5:54 p.m., "Ville Syrjälä" wrote: > > On Wed, Oct 26, 2016 at 05:42:23PM +0100, Robert Bragg wrote: > > On Wed, Oct 26, 2016 at 4:37 PM, Ville Syrjälä < > > ville.syrj...@linux.intel.com> wrote: > > > > > On Wed, Oct 26, 2016 at 04:17:45PM +0100, Robert Bragg wrote: > > > > On 26 Oct 2016 9:54 a.m., "Chris Wilson" > > > wrote: > > > > > > > > > > On Wed, Oct 26, 2016 at 12:51:58AM +0100, Robert Bragg wrote: > > > > > >On Tue, Oct 25, 2016 at 10:35 PM, Matthew Auld > > > > > ><[1]matthew.william.a...@gmail.com> wrote: > > > > > > > > > > > > On 25 October 2016 at 00:19, Robert Bragg <[2] > > > rob...@sixbynine.org> > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h > > > > > > b/drivers/gpu/drm/i915/i915_drv.h > > > > > > > index 3448d05..ea24814 100644 > > > > > > > --- a/drivers/gpu/drm/i915/i915_drv.h > > > > > > > +++ b/drivers/gpu/drm/i915/i915_drv.h > > > > > > > @@ -1764,6 +1764,11 @@ struct intel_wm_config { > > > > > > > > > > > > > > > > > > > > struct drm_i915_private { > > > > > > > @@ -2149,16 +2164,46 @@ struct drm_i915_private { > > > > > > > > > > > > > > struct { > > > > > > > bool initialized; > > > > > > > + > > > > > > > struct mutex lock; > > > > > > > struct list_head streams; > > > > > > > > > > > > > > + spinlock_t hook_lock; > > > > > > > + > > > > > > > struct { > > > > > > > - u32 metrics_set; > > > > > > > + struct i915_perf_stream > > > > *exclusive_stream; > > > > > > OT: > > > What kind of MUA are you using that mangles quoted mails like this? I've > > > not seen it on intel-gfx before. mesa-dev seems rife with it, but as I > > > rarely read that in any great detail I've managed to ignore it there. > > > Anyways, it makes it espesially hard to navigate long mails since mutt's > > > 'S' (skip quoted text) no longer works correctly. 
> > > > > > > Not sure I want to say, and get booted out the door :-) > > > > I've heard that gmail has an annoying habit of forcibly wrapping plain text > > emails like this, and a lot of people have complained that there's no way > > to disable that 'feature' :-/ > > > > I used to use Mutt, but I don't think I could really bare to go back to it > > any more. Last time I was using it I found myself spending too much time > > patching it to try and make it work how I'd like, but can't say I got much > > enjoyment from that process. > > Isn't gmail just a pile of client side javascript or something? Maybe > you'd enjoy patching that one more? ;) > > > > > I've tried most MUA options available, and can't say any of them make me > > very happy - I think these days it's just not something developers are very > > interesting in working on. > > > > I'm a sell out and just use Gmail... sorry. I can't really see myself > > changing, though I do wish Google weren't so pedantic about forcing > > wrapping without any option to change that behaviour. I suspect you > > wouldn't be happy with me sending html emails, which has been Google's > > default response to this complaint afik. > > > > Maybe it's gmail users causing trouble on the Mesa list too. > > > > - Robert > > > > P.S please don't think lesser of me due to my misguided MUA choices. > > I think I'll just reserve the right to ignore any mail with bad quoting. Okey, fwiw, at least my patches sent out via git send-email should be fine, so maybe just ignore my replies to feedback - which I promise not to exploit to achieve 'consensus' through silence. - Robert -- Sent from Gmail on Android, in a spare moment at a VR for Immersive Theatre meet up. > > -- > Ville Syrjälä > Intel OTC ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v7 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit
On Wed, Oct 26, 2016 at 4:03 PM, Robert Bragg wrote: > On 26 Oct 2016 11:08 a.m., "Matthew Auld" > wrote: > > > > On 26 October 2016 at 00:51, Robert Bragg wrote: > > > > > > > > > On Tue, Oct 25, 2016 at 10:35 PM, Matthew Auld > > > wrote: > > >> > > >> On 25 October 2016 at 00:19, Robert Bragg > wrote: > > > > > > > > >> > > >> > > >> > diff --git a/drivers/gpu/drm/i915/i915_drv.h > > >> > b/drivers/gpu/drm/i915/i915_drv.h > > >> > index 3448d05..ea24814 100644 > > >> > --- a/drivers/gpu/drm/i915/i915_drv.h > > >> > +++ b/drivers/gpu/drm/i915/i915_drv.h > > >> > @@ -1764,6 +1764,11 @@ struct intel_wm_config { > > >> > > >> > > > >> > struct drm_i915_private { > > >> > @@ -2149,16 +2164,46 @@ struct drm_i915_private { > > >> > > > >> > struct { > > >> > bool initialized; > > >> > + > > >> > struct mutex lock; > > >> > struct list_head streams; > > >> > > > >> > + spinlock_t hook_lock; > > >> > + > > >> > struct { > > >> > - u32 metrics_set; > > >> > + struct i915_perf_stream *exclusive_stream; > > >> > + > > >> > + u32 specific_ctx_id; > > >> Can we just get rid of this, now that the vma remains pinned we can > > >> simply get the ggtt address at the time of configuring the OA_CONTROL > > >> register ? > > > > > > > > > I considered that, but would ideally prefer to keep it considering the > gen8+ > > > patches to come. For gen8+ (with execlists) the context ID isn't a gtt > > > offset. > > > > > >> > > >> > > >> > + > > >> > + struct hrtimer poll_check_timer; > > >> > + wait_queue_head_t poll_wq; > > >> > + atomic_t pollin; > > >> > + > > >> > > > > > >> > > >> > +/* The maximum exponent the hardware accepts is 63 (essentially it > > >> > selects one > > >> > + * of the 64bit timestamp bits to trigger reports from) but there's > > >> > currently > > >> > + * no known use case for sampling as infrequently as once per 47 > > >> > thousand years. 
> > >> > + * > > >> > + * Since the timestamps included in OA reports are only 32bits it > seems > > >> > + * reasonable to limit the OA exponent where it's still possible to > > >> > account for > > >> > + * overflow in OA report timestamps. > > >> > + */ > > >> > +#define OA_EXPONENT_MAX 31 > > >> > + > > >> > +#define INVALID_CTX_ID 0x > > >> We shouldn't need this anymore. > > > > > > > > > yeah I removed it and then added it back, just for the sake of > explicitly > > > setting the specific_ctx_id to an invalid ID when closing the exclusive > > > stream - though resetting the value isn't strictly necessary. > > Can we not make the specific_ctx_id per-stream, the gem context > > already is, then we don't need to be concerned with resetting it ? > > Hmm, I'm not sure about that, conceptually to me it's global OA unit state. > > Currently the driver only supports a single exclusive stream, while Sourab > later relaxes that to a per-engine stream and that could be relaxed further > with non-oa metric stream types. > > With multiple streams we'll still only be able to programmer a single ctx > id in oacontol. > > Conceptually to me, other stream types could be associated with different > contexts (if they don't depend on the OA unit) so to me stream->ctx isn't > necessarily OA unit state. > > It probably could be played around with, but right now we don't track OA > specific state in the stream. For the ID it's just semantics to say it's OA > state, and we could consider that it's maybe generally useful to track the > ID, even for future non-oa streams. That might mean potentially redundantly > pinning state for the sake of tracking the ID for streams that don't end up > needing it. > I started to try out moving the specific_ctx_id and vma pointer (new) to the stream, and also looked at initializing them together with the stream->ctx reference, but I'm not really happy with how it's looking. 
The specific_ctx_id and pinning are only for the render context, since the OA unit is only well integrated with the render engine, which makes me more inclined to consider them OA stream specific, not something we want/need for all streams (considering that Sourab enables multiple streams in his series).

Btw, for reference, my patches for gen8+ can also end up making use of the INVALID_CTX_ID define (when overwriting the undefined ctx_id field in HW reports when the report's ctx-id is flagged as invalid by the OA unit), so we maybe don't want to worry too much about removing the need for it here.

- Robert
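For anyone wanting to sanity check the quoted OA_EXPONENT_MAX comment, here is a rough sketch of the arithmetic. It assumes that the report period is 2^(exponent + 1) timestamp ticks and that a Haswell timestamp tick is 80ns; neither figure is stated in this thread, and the helper names are made up for illustration.

```c
#include <assert.h>

/* Assumption (not stated in this thread): the Haswell timestamp
 * ticks every 80ns, i.e. 12.5MHz. */
#define ASSUMED_HSW_TICK_NS 80.0

/* Seconds between periodic OA reports, assuming the period is
 * 2^(exponent + 1) timestamp ticks. Hypothetical helper. */
static double oa_exponent_to_seconds(int exponent)
{
	double ticks = 2.0;
	int i;

	assert(exponent >= 0 && exponent <= 63);

	/* Double the tick count once per exponent step. */
	for (i = 0; i < exponent; i++)
		ticks *= 2.0;

	return ticks * ASSUMED_HSW_TICK_NS / 1e9;
}

/* Convert seconds to years, using a 365.25 day year. */
static double seconds_to_years(double seconds)
{
	return seconds / (365.25 * 24 * 60 * 60);
}
```

Under those assumptions exponent 63 works out to a little under 47 thousand years, in line with the comment, and exponent 31 gives roughly 343.6 seconds, which is also how long a 32bit counter of 80ns ticks takes to wrap.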
[Intel-gfx] [PATCH igt 2/3] igt/gem_exec_parse: remove oacontrol checks
The command parser no longer whitelists or does anything special for the OACONTROL register which is now considered owned by i915-perf. As a follow up the plan is to at least check that attempting to write to OACONTROL from userspace must not fail with an EINVAL error, otherwise Mesa's graceful fallback path for not being able to write to OACONTROL via LRI commands will cause Mesa applications to abort(). Signed-off-by: Robert Bragg --- tests/gem_exec_parse.c | 88 -- 1 file changed, 88 deletions(-) diff --git a/tests/gem_exec_parse.c b/tests/gem_exec_parse.c index a39db3e..36bf57d 100644 --- a/tests/gem_exec_parse.c +++ b/tests/gem_exec_parse.c @@ -34,7 +34,6 @@ #define I915_PARAM_CMD_PARSER_VERSION 28 #endif -#define OACONTROL 0x2360 #define DERRMR 0x44050 static int command_parser_version(int fd) @@ -133,10 +132,6 @@ static void hsw_load_register_reg(void) gem_write(fd, obj[0].handle, 0, buf, execbuf.batch_len); igt_assert_eq(__gem_execbuf(fd, &execbuf), -EINVAL); - buf[2] = OACONTROL; /* filtered */ - gem_write(fd, obj[0].handle, 0, buf, execbuf.batch_len); - igt_assert_eq(__gem_execbuf(fd, &execbuf), -EINVAL); - buf[2] = DERRMR; /* master only */ gem_write(fd, obj[0].handle, 0, buf, execbuf.batch_len); igt_assert_eq(__gem_execbuf(fd, &execbuf), -EINVAL); @@ -385,29 +380,6 @@ static void exec_batch_chained(int fd, uint32_t cmd_bo, uint32_t *cmds, gem_close(fd, target_bo); } -static void stray_lri(int fd, uint32_t handle) -{ - /* Ideally this would test all once whitelisted registers */ - uint32_t lri[] = { - MI_LOAD_REGISTER_IMM, - OACONTROL, - 0x31337000, - MI_BATCH_BUFFER_END, - }; - int err; - - igt_assert_eq_u32(intel_register_read(OACONTROL), 0xdeadbeef); - - err = __exec_batch(fd, handle, lri, sizeof(lri), I915_EXEC_RENDER); - if (err == -EINVAL) - return; - - igt_assert_eq(err, 0); - gem_sync(fd, handle); - - igt_assert_eq_u32(intel_register_read(OACONTROL), 0xdeadbeef); -} - uint32_t handle; int fd; @@ -486,23 +458,6 @@ igt_main -EINVAL); } - 
igt_subtest_group { - igt_fixture { - intel_register_access_init(intel_get_pci_device(), 0); - - intel_register_write(OACONTROL, 0xdeadbeef); - igt_assert_eq_u32(intel_register_read(OACONTROL), 0xdeadbeef); - } - - igt_subtest("basic-stray-lri") - stray_lri(fd, handle); - - igt_fixture { - intel_register_write(OACONTROL, 0); - intel_register_access_fini(); - } - } - igt_subtest("registers") { uint32_t lri_bad[] = { MI_LOAD_REGISTER_IMM, @@ -563,49 +518,6 @@ igt_main 0); } - igt_subtest("oacontrol-tracking") { - uint32_t lri_ok[] = { - MI_LOAD_REGISTER_IMM, - OACONTROL, - 0x31337000, - MI_LOAD_REGISTER_IMM, - OACONTROL, - 0x0, - MI_BATCH_BUFFER_END, - 0 - }; - uint32_t lri_bad[] = { - MI_LOAD_REGISTER_IMM, - OACONTROL, - 0x31337000, - MI_BATCH_BUFFER_END, - }; - uint32_t lri_extra_bad[] = { - MI_LOAD_REGISTER_IMM, - OACONTROL, - 0x31337000, - MI_LOAD_REGISTER_IMM, - OACONTROL, - 0x0, - MI_LOAD_REGISTER_IMM, - OACONTROL, - 0x31337000, - MI_BATCH_BUFFER_END, - }; - exec_batch(fd, handle, - lri_ok, sizeof(lri_ok), - I915_EXEC_RENDER, - 0); - exec_batch(fd, handle, - lri_bad, sizeof(lri_bad), - I915_EXEC_RENDER, - -EINVAL); - exec_batch(fd, handle, - lri_extra_bad, sizeof(lri_extra_bad), - I915_EXEC_RENDER, - -EINVAL); - } - igt_subtest("chained-batch") { uint32_t pc[] = { GFX_OP_PIPE_CONTROL, -- 2.10.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH v7 05/11] drm/i915: Add 'render basic' Haswell OA unit config
Adds a static OA unit, MUX + B Counter configuration for basic render metrics on Haswell. This is auto generated from an XML description of metric sets, currently maintained in gputop, ref: https://github.com/rib/gputop > gputop-data/oa-*.xml > scripts/i915-perf-kernelgen.py $ make -C gputop-data -f Makefile.xml SYSFS=0 WHITELIST=RenderBasic Signed-off-by: Robert Bragg Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/Makefile | 3 +- drivers/gpu/drm/i915/i915_drv.h| 14 drivers/gpu/drm/i915/i915_oa_hsw.c | 144 + drivers/gpu/drm/i915/i915_oa_hsw.h | 34 + 4 files changed, 194 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.c create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 8d4e25f..ac0c3ad 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -114,7 +114,8 @@ i915-$(CONFIG_DRM_I915_CAPTURE_ERROR) += i915_gpu_error.o i915-y += i915_vgpu.o # perf code -i915-y += i915_perf.o +i915-y += i915_perf.o \ + i915_oa_hsw.o ifeq ($(CONFIG_DRM_I915_GVT),y) i915-y += intel_gvt.o diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index fcc5958..3448d05 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1764,6 +1764,11 @@ struct intel_wm_config { bool sprites_scaled; }; +struct i915_oa_reg { + i915_reg_t addr; + u32 value; +}; + struct i915_perf_stream; struct i915_perf_stream_ops { @@ -2146,6 +2151,15 @@ struct drm_i915_private { bool initialized; struct mutex lock; struct list_head streams; + + struct { + u32 metrics_set; + + const struct i915_oa_reg *mux_regs; + int mux_regs_len; + const struct i915_oa_reg *b_counter_regs; + int b_counter_regs_len; + } oa; } perf; /* Abstract the submission mechanism (legacy ringbuffer or execlists) away */ diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c new file mode 100644 index 000..8906380 --- 
/dev/null +++ b/drivers/gpu/drm/i915/i915_oa_hsw.c @@ -0,0 +1,144 @@ +/* + * Autogenerated file, DO NOT EDIT manually! + * + * Copyright (c) 2015 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. 
+ * + */ + +#include "i915_drv.h" +#include "i915_oa_hsw.h" + +enum metric_set_id { + METRIC_SET_ID_RENDER_BASIC = 1, +}; + +int i915_oa_n_builtin_metric_sets_hsw = 1; + +static const struct i915_oa_reg b_counter_config_render_basic[] = { + { _MMIO(0x2724), 0x0080 }, + { _MMIO(0x2720), 0x }, + { _MMIO(0x2714), 0x0080 }, + { _MMIO(0x2710), 0x }, +}; + +static const struct i915_oa_reg mux_config_render_basic[] = { + { _MMIO(0x253a4), 0x0160 }, + { _MMIO(0x25440), 0x0010 }, + { _MMIO(0x25128), 0x }, + { _MMIO(0x2691c), 0x0800 }, + { _MMIO(0x26aa0), 0x0150 }, + { _MMIO(0x26b9c), 0x6000 }, + { _MMIO(0x2791c), 0x0800 }, + { _MMIO(0x27aa0), 0x0150 }, + { _MMIO(0x27b9c), 0x6000 }, + { _MMIO(0x2641c), 0x0400 }, + { _MMIO(0x25380), 0x0010 }, + { _MMIO(0x2538c), 0x }, + { _MMIO(0x25384), 0x0800 }, + { _MMIO(0x25400), 0x0004 }, + { _MMIO(0x2540c), 0x06029000 }, + { _MMIO(0x25410), 0x0002 }, + { _MMIO(0x25404), 0x5c30 }, + { _MMIO(0x25100), 0x0016 }, + { _MMIO(0x25110), 0x0400 }, + { _MMIO(0x25104), 0x }, + { _MMIO(0x26804), 0x1211 }, + { _MMIO(0
[Intel-gfx] [PATCH v7 00/11] Enable i915 perf stream for Haswell OA unit
Rebased on nightly, including recent review updates (CI wasn't happy picking up the replies updating individual patches).

This also reverts back to pinning the context upfront when opening a stream for a single context, instead of hooking into pinning and updating OACONTROL on the fly. Chris has repeatedly suggested he'd prefer to have the driver work with an upfront pin, as it used to, instead of with the hook. It was changed last time based on feedback considering some concern with the shrinker.

At least from inspection it does /seem/ safe to assume a pinned vma will reliably block the shrinker from freeing ctx pages and the shrinker itself doesn't unpin things. I'm not fully certain of the interaction with the _gem.c _context_lost() code path which aims to unpin last_context.

At least the code is a little simpler this way, so maybe if Daniel is happy that his original concern was overly cautious (or no longer an issue with the latest code), then this change is ok.

- Robert

Robert Bragg (11):
  drm/i915: Add i915 perf infrastructure
  drm/i915: rename OACONTROL GEN7_OACONTROL
  drm/i915: return EACCES for check_cmd() failures
  drm/i915: don't whitelist oacontrol in cmd parser
  drm/i915: Add 'render basic' Haswell OA unit config
  drm/i915: Enable i915 perf stream for Haswell OA unit
  drm/i915: advertise available metrics via sysfs
  drm/i915: Add dev.i915.perf_stream_paranoid sysctl option
  drm/i915: add oa_event_min_timer_exponent sysctl
  drm/i915: Add more Haswell OA metric sets
  drm/i915: Add a kerneldoc summary for i915_perf.c

 drivers/gpu/drm/i915/Makefile          |    4 +
 drivers/gpu/drm/i915/gvt/handlers.c    |    2 +-
 drivers/gpu/drm/i915/i915_cmd_parser.c |   45 +-
 drivers/gpu/drm/i915/i915_drv.c        |    9 +
 drivers/gpu/drm/i915/i915_drv.h        |  155 +++
 drivers/gpu/drm/i915/i915_oa_hsw.c     |  752 ++
 drivers/gpu/drm/i915/i915_oa_hsw.h     |   38 +
 drivers/gpu/drm/i915/i915_perf.c       | 1689
 drivers/gpu/drm/i915/i915_reg.h        |  340 ++-
 include/uapi/drm/i915_drm.h            |  133 +++
 10 files changed, 3127 insertions(+), 40 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.h
 create mode 100644 drivers/gpu/drm/i915/i915_perf.c

--
2.10.1
[Intel-gfx] [PATCH v8 04/12] drm/i915: return EACCES for check_cmd() failures
check_cmd() is checking whether a command adheres to certain restrictions that ensure it's safe to execute within a privileged batch buffer. Returning false implies a privilege problem, not that the command is invalid.

The distinction makes the difference between allowing the buffer to be executed as an unprivileged batch buffer or returning an EINVAL error to userspace without executing anything.

In a case where userspace may want to test whether it can successfully write to a register that needs privileges the distinction may be important and an EINVAL error may be considered fatal.

In particular this is currently true for Mesa, which includes a test for whether OACONTROL can be written to, but Mesa treats any error when flushing a batch buffer as fatal, calling exit(1).

As it is currently, Mesa can gracefully handle a failure to write to OACONTROL if the command parser is disabled, but if we were to remove OACONTROL from the parser's whitelist then the returned EINVAL would break Mesa applications as they attempt an OACONTROL write.

This bumps the command parser version from 7 to 8, as the change is visible to userspace.

Signed-off-by: Robert Bragg
Reviewed-by: Matthew Auld
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index fe34470..c45dd83 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -1272,7 +1272,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,

 		if (!check_cmd(engine, desc, cmd, length, is_master,
 			       &oacontrol_set)) {
-			ret = -EINVAL;
+			ret = -EACCES;
 			break;
 		}

@@ -1333,6 +1333,9 @@ int i915_cmd_parser_get_version(struct drm_i915_private *dev_priv)
 	 * 5. GPGPU dispatch compute indirect registers.
 	 * 6. TIMESTAMP register and Haswell CS GPR registers
 	 * 7. Allow MI_LOAD_REGISTER_REG between whitelisted registers.
+	 * 8. Don't report cmd_check() failures as EINVAL errors to userspace;
+	 *    rely on the HW to NOOP disallowed commands as it would without
+	 *    the parser enabled.
 	 */
-	return 7;
+	return 8;
 }
--
2.10.1
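To make the EACCES/EINVAL distinction described in the commit message concrete, here is a small model of the decision. It is only a sketch: the enum and function names are hypothetical and this is not the driver's actual execbuffer code, just the behaviour the message describes (a privilege failure falls back to unprivileged execution, anything else is reported to userspace).

```c
#include <errno.h>

/* Hypothetical outcomes, named for illustration only. */
enum batch_outcome {
	BATCH_RUN_PRIVILEGED,	/* parser accepted every command */
	BATCH_RUN_UNPRIVILEGED,	/* privilege failure: HW NOOPs disallowed commands */
	BATCH_REJECTED,		/* error returned to userspace, nothing executed */
};

/* Map a command parser return value to what happens to the batch,
 * following the commit message rather than the real driver code. */
static enum batch_outcome outcome_for_parser_result(int parser_ret)
{
	if (parser_ret == 0)
		return BATCH_RUN_PRIVILEGED;

	/* With parser version 8 a check_cmd() failure is a privilege
	 * problem, so the batch can still run unprivileged. */
	if (parser_ret == -EACCES)
		return BATCH_RUN_UNPRIVILEGED;

	/* e.g. -EINVAL for a genuinely invalid command stream. */
	return BATCH_REJECTED;
}
```

Seen this way, a Mesa-style probe write to OACONTROL ends up in the unprivileged case instead of hitting an execbuffer error that it would treat as fatal.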
[Intel-gfx] [PATCH v8 02/12] drm/i915: Add i915 perf infrastructure
Adds base i915 perf infrastructure for Gen performance metrics.

This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64
properties to configure a stream of metrics and returns a new fd usable
with standard VFS system calls including read() to read typed and sized
records; ioctl() to enable or disable capture and poll() to wait for
data.

A stream is opened something like:

  uint64_t properties[] = {
	/* Single context sampling */
	DRM_I915_PERF_PROP_CTX_HANDLE,        ctx_handle,

	/* Include OA reports in samples */
	DRM_I915_PERF_PROP_SAMPLE_OA,         true,

	/* OA unit configuration */
	DRM_I915_PERF_PROP_OA_METRICS_SET,    metrics_set_id,
	DRM_I915_PERF_PROP_OA_FORMAT,         report_format,
	DRM_I915_PERF_PROP_OA_EXPONENT,       period_exponent,
  };
  struct drm_i915_perf_open_param param = {
	.flags = I915_PERF_FLAG_FD_CLOEXEC |
		 I915_PERF_FLAG_FD_NONBLOCK |
		 I915_PERF_FLAG_DISABLED,
	.properties_ptr = (uint64_t)properties,
	/* each property is a (key, value) pair of uint64_t, i.e. 16 bytes */
	.num_properties = sizeof(properties) / 16,
  };
  int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param);

Records read all start with a common { type, size } header with
DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records
contain an extensible number of fields and it's the
DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that
determine what's included in every sample.

No specific streams are supported yet so any attempt to open a stream
will return an error.
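As an illustration of the { type, size } record framing described above, here
is a minimal sketch of walking a buffer filled by read(). The struct layout,
the RECORD_TYPE_SAMPLE value and the count_sample_records() helper are
assumptions made for this example so that it stays self-contained; real code
would use the definitions from i915_drm.h instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed local mirror of the common record header (not the uapi struct). */
struct perf_record_header {
	uint32_t type;
	uint16_t pad;
	uint16_t size; /* total record size in bytes, header included */
};

/* Assumed stand-in for DRM_I915_PERF_RECORD_SAMPLE. */
#define RECORD_TYPE_SAMPLE 1

/*
 * Walk a buffer of back-to-back records and count the sample records.
 * Returns -1 if a header is truncated or a size field is inconsistent.
 */
int count_sample_records(const uint8_t *buf, size_t len)
{
	size_t offset = 0;
	int samples = 0;

	while (offset < len) {
		struct perf_record_header hdr;

		if (len - offset < sizeof(hdr))
			return -1;
		/* memcpy avoids unaligned access into the byte buffer */
		memcpy(&hdr, buf + offset, sizeof(hdr));

		if (hdr.size < sizeof(hdr) || hdr.size > len - offset)
			return -1;
		if (hdr.type == RECORD_TYPE_SAMPLE)
			samples++;

		/* Unknown record types are skipped using their size. */
		offset += hdr.size;
	}
	return samples;
}
```

Skipping by size rather than by known type is what would let a reader tolerate
record types it does not understand.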
v2:
    use i915_gem_context_get() - Chris Wilson
v3:
    update read() interface to avoid passing state struct - Chris Wilson
    fix some rebase fallout, with i915-perf init/deinit
v4:
    s/DRM_IORW/DRM_IOW/ - Emil Velikov

Signed-off-by: Robert Bragg
---
 drivers/gpu/drm/i915/Makefile    |   3 +
 drivers/gpu/drm/i915/i915_drv.c  |   4 +
 drivers/gpu/drm/i915/i915_drv.h  |  91
 drivers/gpu/drm/i915/i915_perf.c | 443 +++
 include/uapi/drm/i915_drm.h      |  67 ++
 5 files changed, 608 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/i915_perf.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 6123400..8d4e25f 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -113,6 +113,9 @@ i915-$(CONFIG_DRM_I915_CAPTURE_ERROR) += i915_gpu_error.o
 # virtual gpu code
 i915-y += i915_vgpu.o
 
+# perf code
+i915-y += i915_perf.o
+
 ifeq ($(CONFIG_DRM_I915_GVT),y)
 i915-y += intel_gvt.o
 include $(src)/gvt/Makefile
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index af3559d..685c96e 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -836,6 +836,8 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv,
 
 	intel_detect_preproduction_hw(dev_priv);
 
+	i915_perf_init(dev_priv);
+
 	return 0;
 
 err_workqueues:
@@ -849,6 +851,7 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv,
  */
 static void i915_driver_cleanup_early(struct drm_i915_private *dev_priv)
 {
+	i915_perf_fini(dev_priv);
 	i915_gem_load_cleanup(&dev_priv->drm);
 	i915_workqueues_cleanup(dev_priv);
 }
@@ -2556,6 +2559,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_GEM_USERPTR, i915_gem_userptr_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_GETPARAM, i915_gem_context_getparam_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_SETPARAM, i915_gem_context_setparam_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_PERF_OPEN, i915_perf_open_ioctl, DRM_RENDER_ALLOW),
 };
 
 static struct drm_driver driver = {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5a260db..7a65c0b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1767,6 +1767,84 @@ struct intel_wm_config {
 	bool sprites_scaled;
 };
 
+struct i915_perf_stream;
+
+struct i915_perf_stream_ops {
+	/* Enables the collection of HW samples, either in response to
+	 * I915_PERF_IOCTL_ENABLE or implicitly called when stream is
+	 * opened without I915_PERF_FLAG_DISABLED.
+	 */
+	void (*enable)(struct i915_perf_stream *stream);
+
+	/* Disables the collection of HW samples, either in response to
+	 * I915_PERF_IOCTL_DISABLE or implicitly called before
+	 * destroying the stream.
+	 */
+	void (*disable)(struct i915_perf_stream *stream);
+
+	/* Return: true if any i915 perf records are ready to read()
+	 * for this stream.
+	 */
+	bool (*can_read)(struct i915_perf_stream *stream);
+
+	/* Call poll_wait, passing a wait queue that will be woken
+	 * once there is something ready to read() for the stream
+	 */
+	void (*poll_wait)(struct i915_pe
[Intel-gfx] [PATCH v8 00/12] Enable i915 perf stream for Haswell OA unit
Rebased on nightly, and updated as per review from Matt and Chris.

The first patch from Chris adds an i915_gem_context_pin_legacy() utility that
I'm depending on now - though it doesn't really form part of the i915-perf
series proper. I'm assuming Chris plans to send a version of this to the list
himself with a proper commit message.

- Robert

Chris Wilson (1):
  ctx-pin placeholder from chris

Robert Bragg (11):
  drm/i915: Add i915 perf infrastructure
  drm/i915: rename OACONTROL GEN7_OACONTROL
  drm/i915: return EACCES for check_cmd() failures
  drm/i915: don't whitelist oacontrol in cmd parser
  drm/i915: Add 'render basic' Haswell OA unit config
  drm/i915: Enable i915 perf stream for Haswell OA unit
  drm/i915: advertise available metrics via sysfs
  drm/i915: Add dev.i915.perf_stream_paranoid sysctl option
  drm/i915: add oa_event_min_timer_exponent sysctl
  drm/i915: Add more Haswell OA metric sets
  drm/i915: Add a kerneldoc summary for i915_perf.c

 drivers/gpu/drm/i915/Makefile           |    4 +
 drivers/gpu/drm/i915/gvt/handlers.c     |    2 +-
 drivers/gpu/drm/i915/i915_cmd_parser.c  |   45 +-
 drivers/gpu/drm/i915/i915_drv.c         |    9 +
 drivers/gpu/drm/i915/i915_drv.h         |  157 +++
 drivers/gpu/drm/i915/i915_gem_context.c |   34 +-
 drivers/gpu/drm/i915/i915_oa_hsw.c      |  752 ++
 drivers/gpu/drm/i915/i915_oa_hsw.h      |   38 +
 drivers/gpu/drm/i915/i915_perf.c        | 1726 +++
 drivers/gpu/drm/i915/i915_reg.h         |  340 +-
 include/uapi/drm/i915_drm.h             |  134 +++
 11 files changed, 3190 insertions(+), 51 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.h
 create mode 100644 drivers/gpu/drm/i915/i915_perf.c

--
2.10.1
[Intel-gfx] [PATCH v8 01/12] ctx-pin placeholder from chris
From: Chris Wilson
---
 drivers/gpu/drm/i915/i915_drv.h         |  1 +
 drivers/gpu/drm/i915/i915_gem_context.c | 34 ++---
 2 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 55afb66..5a260db 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3437,6 +3437,7 @@ struct drm_i915_gem_object *
 i915_gem_alloc_context_obj(struct drm_device *dev, size_t size);
 struct i915_gem_context *
 i915_gem_context_create_gvt(struct drm_device *dev);
+struct i915_vma *i915_gem_context_pin_legacy(struct i915_gem_context *ctx);
 
 static inline struct i915_gem_context *
 i915_gem_context_lookup(struct drm_i915_file_private *file_priv, u32 id)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 5dca32a..a620e15b 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -751,12 +751,31 @@ needs_pd_load_post(struct i915_hw_ppgtt *ppgtt,
 	return false;
 }
 
+struct i915_vma *i915_gem_context_pin_legacy(struct i915_gem_context *ctx)
+{
+	struct i915_vma *vma = ctx->engine[RCS].state;
+	int ret;
+
+	/* Clear this page out of any CPU caches for coherent swap-in/out. */
+	if (!(vma->flags & I915_VMA_GLOBAL_BIND)) {
+		ret = i915_gem_object_set_to_gtt_domain(vma->obj, false);
+		if (ret)
+			return ERR_PTR(ret);
+	}
+
+	ret = i915_vma_pin(vma, 0, ctx->ggtt_alignment, PIN_GLOBAL);
+	if (ret)
+		return ERR_PTR(ret);
+
+	return vma;
+}
+
 static int do_rcs_switch(struct drm_i915_gem_request *req)
 {
 	struct i915_gem_context *to = req->ctx;
 	struct intel_engine_cs *engine = req->engine;
 	struct i915_hw_ppgtt *ppgtt = to->ppgtt ?: req->i915->mm.aliasing_ppgtt;
-	struct i915_vma *vma = to->engine[RCS].state;
+	struct i915_vma *vma;
 	struct i915_gem_context *from;
 	u32 hw_flags;
 	int ret, i;
@@ -764,17 +783,10 @@ static int do_rcs_switch(struct drm_i915_gem_request *req)
 	if (skip_rcs_switch(ppgtt, engine, to))
 		return 0;
 
-	/* Clear this page out of any CPU caches for coherent swap-in/out. */
-	if (!(vma->flags & I915_VMA_GLOBAL_BIND)) {
-		ret = i915_gem_object_set_to_gtt_domain(vma->obj, false);
-		if (ret)
-			return ret;
-	}
-
 	/* Trying to pin first makes error handling easier. */
-	ret = i915_vma_pin(vma, 0, to->ggtt_alignment, PIN_GLOBAL);
-	if (ret)
-		return ret;
+	vma = i915_gem_context_pin_legacy(to);
+	if (IS_ERR(vma))
+		return PTR_ERR(vma);
 
 	/*
 	 * Pin can switch back to the default context if we end up calling into
--
2.10.1