On Wed, Nov 13, 2013 at 09:05:48PM -0800, Kenneth Graunke wrote:
> We need to start OA at the beginning of each batch where monitors are
> active.  OACONTROL isn't part of the hardware context, so to avoid
> leaving counters enabled for other applications, we turn them off at the
> end of the batch too.
>
> We also need to start them at BeginPerfMonitor time (unless they've
> already been started).  We stop them when the last monitor ends as well.
>
> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
> Cc: Eric Anholt <e...@anholt.net>
> Cc: Carl Worth <cwo...@cworth.org>
> Cc: Juha-Pekka Heikkilä <juha-pekka.heikk...@intel.com>
> ---
>  src/mesa/drivers/dri/i965/brw_context.h            |  2 +
>  .../drivers/dri/i965/brw_performance_monitor.c     | 50 ++++++++++++++++++++++
>  src/mesa/drivers/dri/i965/intel_batchbuffer.c      |  7 +++
>  src/mesa/drivers/dri/i965/intel_batchbuffer.h      |  3 +-
>  4 files changed, 61 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> index 77649cd..f395297 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -1627,6 +1627,8 @@ bool brw_render_target_supported(struct brw_context *brw,
>  /* brw_performance_monitor.c */
>  void brw_init_performance_monitors(struct brw_context *brw);
>  void brw_dump_perf_monitors(struct brw_context *brw);
> +void brw_perf_monitor_new_batch(struct brw_context *brw);
> +void brw_perf_monitor_finish_batch(struct brw_context *brw);
>
>  /* intel_extensions.c */
>  extern void intelInitExtensions(struct gl_context *ctx);
> diff --git a/src/mesa/drivers/dri/i965/brw_performance_monitor.c b/src/mesa/drivers/dri/i965/brw_performance_monitor.c
> index 85c395b..707deb2 100644
> --- a/src/mesa/drivers/dri/i965/brw_performance_monitor.c
> +++ b/src/mesa/drivers/dri/i965/brw_performance_monitor.c
> @@ -766,6 +766,14 @@ brw_begin_perf_monitor(struct gl_context *ctx,
>     drm_intel_bo_unmap(monitor->oa_bo);
>  #endif
>
> +   /* If the OA counters aren't already on, enable them. */
> +   if (brw->perfmon.oa_users == 0) {
> +      /* Ensure the OACONTROL enable and snapshot land in the same batch. */
> +      int space = (MI_REPORT_PERF_COUNT_BATCH_DWORDS + 3) * 4;
> +      intel_batchbuffer_require_space(brw, space, RENDER_RING);
> +      start_oa_counters(brw);
> +   }
> +
>     /* Take a starting OA counter snapshot. */
>     emit_mi_report_perf_count(brw, monitor->oa_bo, 0, REPORT_ID);
>
> @@ -801,6 +809,9 @@ brw_end_perf_monitor(struct gl_context *ctx,
>                                   SECOND_SNAPSHOT_OFFSET_IN_BYTES, REPORT_ID);
>
>        --brw->perfmon.oa_users;
> +
> +      if (brw->perfmon.oa_users == 0)
> +         stop_oa_counters(brw);
>     }
>
>     if (monitor_needs_statistics_registers(brw, m)) {
> @@ -925,6 +936,45 @@ brw_delete_perf_monitor(struct gl_context *ctx, struct gl_perf_monitor_object *m
>
>  /******************************************************************************/
>
> +/**
> + * Called at the start of every render ring batch.
> + *
> + * Enable the OA counters if required.
> + */
> +void
> +brw_perf_monitor_new_batch(struct brw_context *brw)
> +{
> +   assert(brw->batch.ring == RENDER_RING);
> +   assert(brw->gen < 6 || brw->batch.used == 0);
> +
> +   if (brw->perfmon.oa_users == 0)
> +      return;
> +
> +   if (brw->gen >= 6)
> +      start_oa_counters(brw);
> +}
Quick question for my understanding: does this mean that at the end of
each batch we're guaranteed that OA_CTL is again 0?

Some of the considerations that led to this:

- OA counters are a pretty great information leak. Atm we don't care one
  bit due to the lack of ppgtt, but once we have that and once we also
  have the paranoid CS checker we could enforce this in the kernel.

- Usually perf counters require some privileges. Atm we can't enforce
  this (and don't care), but should we prep for this with a feature flag
  and an execbuf flag? The execbuf flag might help a bit with
  implementing the next point in the kernel, but I don't think it's
  needed. Without flags, plan B would be to noop out OA/perf-counter
  stuff with the CS checker.

- What about OA users on other rings? Iirc that stuff might be useful
  for OpenCL running on the VCS ring, too. Otoh we don't have that need
  right now, and the kernel could always just pass an OA token between
  rings and synchronize with semaphores. Without the execbuf flag we'd
  simply have to rely on the CS checker.

Anyway, just a few questions I wanted to throw out there for
consideration; the current approach looks fine to me. I've read through
the entire pile of patches, so

Acked-by: Daniel Vetter <daniel.vet...@ffwll.ch>

on the series. Whatever that's worth ;-)

Cheers, Daniel

> +
> +/**
> + * Called at the end of every render ring batch.
> + *
> + * Disable the OA counters.
> + *
> + * This relies on there being enough space in BATCH_RESERVED.
> + */
> +void
> +brw_perf_monitor_finish_batch(struct brw_context *brw)
> +{
> +   assert(brw->batch.ring == RENDER_RING);
> +
> +   if (brw->perfmon.oa_users == 0)
> +      return;
> +
> +   if (brw->gen >= 6)
> +      stop_oa_counters(brw);
> +}
> +
> +/******************************************************************************/
> +
>  void
>  brw_init_performance_monitors(struct brw_context *brw)
>  {
> diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> index b50b6c7..e3f826d 100644
> --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> @@ -187,6 +187,9 @@ intel_batchbuffer_emit_render_ring_prelude(struct brw_context *brw)
>      * what that batch contributed.  Emit state packets to write them to a BO.
>      */
>     brw_emit_query_begin(brw);
> +
> +   /* We may also need to enable OA counters. */
> +   brw_perf_monitor_new_batch(brw);
>  }
>
>  /**
> @@ -247,6 +250,10 @@ brw_finish_batch(struct brw_context *brw)
>      */
>     brw_emit_query_end(brw);
>
> +   /* We may also need to disable OA counters. */
> +   if (brw->batch.ring == RENDER_RING)
> +      brw_perf_monitor_finish_batch(brw);
> +
>     if (brw->curbe.curbe_bo) {
>        drm_intel_gem_bo_unmap_gtt(brw->curbe.curbe_bo);
>        drm_intel_bo_unreference(brw->curbe.curbe_bo);
> diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
> index 38149f3..d14127e 100644
> --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h
> +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
> @@ -19,8 +19,9 @@ extern "C" {
>   * - Optional MI_NOOP for ensuring the batch length is qword aligned (4 bytes)
>   * - Any state emitted by vtbl->finish_batch():
>   *   - Gen4-5 record ending occlusion query values (4 * 4 = 16 bytes)
> + *   - Disabling OA counters on Gen6+ (3 DWords = 12 bytes)
>   */
> -#define BATCH_RESERVED 24
> +#define BATCH_RESERVED 36
>
>  struct intel_batchbuffer;
>
> --
> 1.8.3.2
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch