On 10/28/2013 04:52 AM, Pohjolainen, Topi wrote: > On Mon, Oct 28, 2013 at 01:14:55PM +0200, Pohjolainen, Topi wrote: >> On Fri, Oct 25, 2013 at 10:35:48PM -0700, Kenneth Graunke wrote: >>> Implementing the GetTransformFeedbackVertexCount() driver hook allows >>> the VBO module to call us with the right number of vertices. >>> >>> The hardware doesn't directly count the number of vertices written by >>> SOL, so we instead use the SO_NUM_PRIMS_WRITTEN(n) counters and multiply >>> by the number of vertices per primitive. >>> >>> Unfortunately, counting the number of primitives generated is tricky: >>> a program might pause a transform feedback operation, start a second one >>> with a different object, then switch back and resume. Both transform >>> feedback operations share the SO_NUM_PRIMS_WRITTEN counters. >>> >>> To work around this, we save the counter values at Begin, Pause, Resume, >>> and End. This "bookends" each section where transform feedback is >>> active for the current object. Adding up differences of pairs gives >>> us the number of primitives generated. (This is similar to what we >>> do for occlusion queries on platforms without hardware contexts.) >>> >>> v2: Fix missing parenthesis in assertion (caught by Eric Anholt). >>> >>> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org> >>> Reviewed-by: Ian Romanick <ian.d.roman...@intel.com> >>> Reviewed-by: Eric Anholt <e...@anholt.net> >>> --- >>> src/mesa/drivers/dri/i965/brw_context.c | 2 + >>> src/mesa/drivers/dri/i965/brw_context.h | 26 ++++ >>> src/mesa/drivers/dri/i965/gen6_sol.c | 1 + >>> src/mesa/drivers/dri/i965/gen7_sol_state.c | 190 >>> ++++++++++++++++++++++++++++- >>> 4 files changed, 218 insertions(+), 1 deletion(-) >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_context.c >>> b/src/mesa/drivers/dri/i965/brw_context.c >>> index 623273c..f4e04b6 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_context.c >>> +++ b/src/mesa/drivers/dri/i965/brw_context.c >>> @@ -252,6 +252,8 @@ brw_init_driver_functions(struct brw_context *brw, >>> >>> functions->NewTransformFeedback = brw_new_transform_feedback; >>> functions->DeleteTransformFeedback = brw_delete_transform_feedback; >>> + functions->GetTransformFeedbackVertexCount = >>> + brw_get_transform_feedback_vertex_count; >>> if (brw->gen >= 7) { >>> functions->BeginTransformFeedback = gen7_begin_transform_feedback; >>> functions->EndTransformFeedback = gen7_end_transform_feedback; >>> diff --git a/src/mesa/drivers/dri/i965/brw_context.h >>> b/src/mesa/drivers/dri/i965/brw_context.h >>> index 48aa4c1..c72bad1 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_context.h >>> +++ b/src/mesa/drivers/dri/i965/brw_context.h >>> @@ -880,11 +880,33 @@ struct intel_batchbuffer { >>> } saved; >>> }; >>> >>> +#define BRW_MAX_XFB_STREAMS 4 >>> + >>> struct brw_transform_feedback_object { >>> struct gl_transform_feedback_object base; >>> >>> /** A buffer to hold SO_WRITE_OFFSET(n) values while paused. */ >>> drm_intel_bo *offset_bo; >>> + >>> + /** The most recent primitive mode (GL_TRIANGLES/GL_POINTS/GL_LINES). */ >>> + GLenum primitive_mode; >>> + >>> + /** >>> + * Count of primitives generated during this transform feedback >>> operation. >>> + * @{ >>> + */ >>> + uint64_t prims_generated[BRW_MAX_XFB_STREAMS]; >>> + drm_intel_bo *prim_count_bo; >>> + unsigned prim_count_buffer_index; /**< in number of uint64_t units */ >>> + /** @} */ >>> + >>> + /** >>> + * Number of vertices written between last Begin/EndTransformFeedback(). >>> + * >>> + * Used to implement DrawTransformFeedback(). >>> + */ >>> + uint64_t vertices_written[BRW_MAX_XFB_STREAMS]; >>> + bool vertices_written_valid; >>> }; >>> >>> /** >>> @@ -1574,6 +1596,10 @@ brw_begin_transform_feedback(struct gl_context *ctx, >>> GLenum mode, >>> void >>> brw_end_transform_feedback(struct gl_context *ctx, >>> struct gl_transform_feedback_object *obj); >>> +GLsizei >>> +brw_get_transform_feedback_vertex_count(struct gl_context *ctx, >>> + struct >>> gl_transform_feedback_object *obj, >>> + GLuint stream); >>> >>> /* gen7_sol_state.c */ >>> void >>> diff --git a/src/mesa/drivers/dri/i965/gen6_sol.c >>> b/src/mesa/drivers/dri/i965/gen6_sol.c >>> index 2e6c86a..af5bed9 100644 >>> --- a/src/mesa/drivers/dri/i965/gen6_sol.c >>> +++ b/src/mesa/drivers/dri/i965/gen6_sol.c >>> @@ -162,6 +162,7 @@ brw_delete_transform_feedback(struct gl_context *ctx, >>> } >>> >>> drm_intel_bo_unreference(brw_obj->offset_bo); >>> + drm_intel_bo_unreference(brw_obj->prim_count_bo); >>> >>> free(brw_obj); >>> } >>> diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c >>> b/src/mesa/drivers/dri/i965/gen7_sol_state.c >>> index 27421da..7cac8fe 100644 >>> --- a/src/mesa/drivers/dri/i965/gen7_sol_state.c >>> +++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c >>> @@ -249,14 +249,179 @@ const struct brw_tracked_state gen7_sol_state = { >>> .emit = upload_sol_state, >>> }; >>> >>> +/** >>> + * Tally the number of primitives generated so far. >>> + * >>> + * The buffer contains a series of pairs: >>> + * (<start0, start1, start2, start3>, <end0, end1, end2, end3>) ; >>> + * (<start0, start1, start2, start3>, <end0, end1, end2, end3>) ; >>> + * >>> + * For each stream, we subtract the pair of values (end - start) to get the >>> + * number of primitives generated during one section. We accumulate these >>> + * values, adding them up to get the total number of primitives generated. >>> + */ >>> +static void >>> +gen7_tally_prims_generated(struct brw_context *brw, >>> + struct brw_transform_feedback_object *obj) >>> +{ >>> + /* If the current batch is still contributing to the number of >>> primitives >>> + * generated, flush it now so the results will be present when mapped. >>> + */ >>> + if (drm_intel_bo_references(brw->batch.bo, obj->prim_count_bo)) >>> + intel_batchbuffer_flush(brw); >>> + >>> + if (unlikely(brw->perf_debug && drm_intel_bo_busy(obj->prim_count_bo))) >>> + perf_debug("Stalling for # of transform feedback primitives >>> written.\n"); >>> + >>> + drm_intel_bo_map(obj->prim_count_bo, false); >>> + uint64_t *prim_counts = obj->prim_count_bo->virtual; >>> + >>> + assert(obj->prim_count_buffer_index % (2 * BRW_MAX_XFB_STREAMS) == 0); >>> + int pairs = obj->prim_count_buffer_index / (2 * BRW_MAX_XFB_STREAMS); >>> + >>> + for (int i = 0; i < pairs; i++) { >>> + for (int s = 0; s < BRW_MAX_XFB_STREAMS; s++) { >>> + obj->prims_generated[s] += >>> + prim_counts[BRW_MAX_XFB_STREAMS + s] - prim_counts[s]; >>> + } >>> + prim_counts += 2 * BRW_MAX_XFB_STREAMS; /* move to the next pair */ >>> + } >>> + >>> + drm_intel_bo_unmap(obj->prim_count_bo); >>> + >>> + /* Release the BO; we've already tallied all the data it contained. */ >>> + drm_intel_bo_unreference(obj->prim_count_bo); >>> + obj->prim_count_bo = NULL; >>> +} >>> + >>> +/** >>> + * Store the SO_NUM_PRIMS_WRITTEN counters for each stream (4 uint64_t >>> values) >>> + * to prim_count_bo. >>> + * >>> + * If prim_count_bo is out of space, gather up the results so far into >>> + * prims_generated[] and allocate a new buffer with enough space. >>> + * >>> + * The number of primitives written is used to compute the number of >>> vertices >>> + * written to a transform feedback stream, which is required to implement >>> + * DrawTransformFeedback(). >>> + */ >>> +static void >>> +gen7_save_primitives_written_counters(struct brw_context *brw, >>> + struct brw_transform_feedback_object *obj) >>> +{ >>> + const int streams = BRW_MAX_XFB_STREAMS; >>> + >>> + /* Check if there's enough space for a new pair of four values. */ >>> + if (obj->prim_count_bo != NULL && >>> + obj->prim_count_buffer_index + 2 * streams >= 4096 / >>> sizeof(uint64_t)) { >>> + /* Gather up the results so far and release the BO. */ >>> + gen7_tally_prims_generated(brw, obj); >>> + } >>> + >>> + /* Allocate a new buffer if needed. A page should be plenty. */ >>> + if (obj->prim_count_bo == NULL) { >>> + obj->prim_count_buffer_index = 0; >>> + obj->prim_count_bo = >>> + drm_intel_bo_alloc(brw->bufmgr, "xfb primitive counts", 4096, >>> 4096); >> >> I was wondering why 'gen7_tally_prims_generated()' needs to dispose the >> buffer >> object and this logic here to reallocate another. Couldn't we re-use the old >> bo and simply let 'gen7_tally_prims_generated()' reset the >> 'prim_count_buffer_index' to zero? >> >> Below 'brw_compute_xfb_vertices_written()' wants to tally also but it guards >> itself from adding counters multiple times using the flag >> 'vertices_written_valid'. In fact if it didn't 'gen7_tally_prims_generated()' >> would call 'drm_intel_bo_map()' against NULL pointer, right? Or did I >> understand the logic all wrong? > > Further to the "20 questions", try to bear with me. I may easily be missing > things but would this patch work without the register storing and loading > introduced in patch four if pause() and resume() hooks saved the counters the > same way as begin() and end()? In other words, one would track the deltas > between begin()/pause(), pause()/resume() and resume()/end() in addition to > the > begin()/end() (if no pause()-resume() sequence were in between)?
No. Patch 4 saves and restores the SO_WRITE_OFFSET(n) registers, which is the offset at which the hardware will write data. It needs to be a legitimate offset for whatever buffer is currently bound, which changes with the XFB object. Otherwise the GPU will write off the end of a buffer and hang. This patch uses the SO_NUM_PRIMS_WRITTEN(n) registers, which just track the number of primitives generated. Since we're just counting things, deltas work fine. --Ken _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev