On Tue, Dec 10, 2013 at 02:54:46AM +0400, Vadim Girlin wrote: > On Mon, 2013-12-09 at 10:56 -0500, Tom Stellard wrote: > > On Sat, Dec 07, 2013 at 07:06:36PM +0400, Vadim Girlin wrote: > > > On evergreen we have to reserve 1 stack element in some additional cases > > > besides the ones mentioned in the docs, but stack size computation was > > > recently reimplemented exactly as described in the docs by the patch that > > > added workarounds for stack issues on EG/CM, resulting in regressions > > > with some apps (Serious Sam 3). > > > > > > This patch fixes it by restoring previous behavior. > > > > > > Fixes https://bugs.freedesktop.org/show_bug.cgi?id=72369 > > > > > > Signed-off-by: Vadim Girlin <vadimgir...@gmail.com> > > > Cc: "10.0" <mesa-sta...@lists.freedesktop.org> > > > --- > > > src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 16 ++++++++++++---- > > > 1 file changed, 12 insertions(+), 4 deletions(-) > > > > > > diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp > > > b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp > > > index bc71cf8..355eb63 100644 > > > --- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp > > > +++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp > > > @@ -770,7 +770,6 @@ void bc_finalizer::update_ngpr(unsigned gpr) { > > > unsigned bc_finalizer::get_stack_depth(node *n, unsigned &loops, > > > unsigned &ifs, unsigned add) { > > > unsigned stack_elements = add; > > > - bool has_non_wqm_push_with_loops_on_stack = false; > > > bool has_non_wqm_push = (add != 0); > > > region_node *r = n->is_region() ? > > > static_cast<region_node*>(n) : n->get_parent_region(); > > > @@ -781,8 +780,6 @@ unsigned bc_finalizer::get_stack_depth(node *n, > > > unsigned &loops, > > > while (r) { > > > if (r->is_loop()) { > > > ++loops; > > > - if (has_non_wqm_push) > > > - has_non_wqm_push_with_loops_on_stack = true; > > > } else { > > > ++ifs; > > > has_non_wqm_push = true; > > > @@ -795,15 +792,26 @@ unsigned bc_finalizer::get_stack_depth(node *n, > > > unsigned &loops, > > > switch (ctx.hw_class) { > > > case HW_CLASS_R600: > > > case HW_CLASS_R700: > > > + // If any non-WQM push is invoked, 2 elements should be > > > reserved. > > > if (has_non_wqm_push) > > > stack_elements += 2; > > > break; > > > case HW_CLASS_CAYMAN: > > > + // If any stack operation is invoked, 2 elements should be > > > reserved > > > if (stack_elements) > > > stack_elements += 2; > > > break; > > > case HW_CLASS_EVERGREEN: > > > - if (has_non_wqm_push_with_loops_on_stack) > > > + // According to the docs we need to reserve 1 element for each > > > of the > > > + // following cases: > > > + // 1) non-WQM push is used with WQM/LOOP frames on stack > > > + // 2) ALU_ELSE_AFTER is used at the point of max stack usage > > > + // NOTE: > > > + // It was found that the conditions above are not sufficient, > > > there are > > > + // other cases where we also need to reserve stack space, > > > that's why > > > + // we always reserve 1 stack element if we have non-WQM push on > > > stack. > > > + // Condition 2 is ignored for now because we don't use this > > > instruction. > > > + if (has_non_wqm_push) > > > ++stack_elements; > > > > The kernel analyzer reports a stack size of 2 for compute shaders that > > have 3 levels of ALU_PUSH_BEFORE. This would suggest that you either need > > to > > reserve 2 sub-entries (stack_elements in the sb code) when there is a > > non-wqm push, or apply the CAYMAN rules to EVERGREEN. > > > > It is possible, though, that the kernel analyzer is over-allocating and > > this patch is correct, but I don't have any evidence for this yet. > > Is there any test that fails with this patch? AFAIK this algorithm > worked fine for about 8 months in both old and sb backends, so I'd > rather prefer to have any evidence that this is not correct before > increasing stack allocation and reducing performance. > > Vadim
I have not been able to find a failing test when reserving one extra element for the first non-wqm push (this includes a full run of the OpenCV OpenCL test-suite which has a lot of complex control flow), so I think the way you've implemented it in this patch is safe. >From analyzing dumps for the kernel analyzer, it looks like they count two extra sub-entries for the first non-wqm push when calculating the stack size, but they count only one extra sub-entry when determining whether or not to apply the work-around for the HW bug. This also makes me think that allocating one extra sub-entry is probably correct. -Tom > > > > > -Tom > > > > > > > break; > > > } > > > -- > > > 1.8.4.2 > > > > > > _______________________________________________ > > > mesa-dev mailing list > > > mesa-dev@lists.freedesktop.org > > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > > > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev