On Tue, Dec 10, 2013 at 02:54:46AM +0400, Vadim Girlin wrote:
> On Mon, 2013-12-09 at 10:56 -0500, Tom Stellard wrote:
> > On Sat, Dec 07, 2013 at 07:06:36PM +0400, Vadim Girlin wrote:
> > > On evergreen we have to reserve 1 stack element in some additional cases
> > > besides the ones mentioned in the docs, but stack size computation was
> > > recently reimplemented exactly as described in the docs by the patch that
> > > added workarounds for stack issues on EG/CM, resulting in regressions
> > > with some apps (Serious Sam 3).
> > > 
> > > This patch fixes it by restoring previous behavior.
> > > 
> > > Fixes https://bugs.freedesktop.org/show_bug.cgi?id=72369
> > > 
> > > Signed-off-by: Vadim Girlin <vadimgir...@gmail.com>
> > > Cc: "10.0" <mesa-sta...@lists.freedesktop.org>
> > > ---
> > >  src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 16 ++++++++++++----
> > >  1 file changed, 12 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp 
> > > b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
> > > index bc71cf8..355eb63 100644
> > > --- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
> > > +++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
> > > @@ -770,7 +770,6 @@ void bc_finalizer::update_ngpr(unsigned gpr) {
> > >  unsigned bc_finalizer::get_stack_depth(node *n, unsigned &loops,
> > >                                             unsigned &ifs, unsigned add) {
> > >   unsigned stack_elements = add;
> > > - bool has_non_wqm_push_with_loops_on_stack = false;
> > >   bool has_non_wqm_push = (add != 0);
> > >   region_node *r = n->is_region() ?
> > >                   static_cast<region_node*>(n) : n->get_parent_region();
> > > @@ -781,8 +780,6 @@ unsigned bc_finalizer::get_stack_depth(node *n, 
> > > unsigned &loops,
> > >   while (r) {
> > >           if (r->is_loop()) {
> > >                   ++loops;
> > > -                 if (has_non_wqm_push)
> > > -                         has_non_wqm_push_with_loops_on_stack = true;
> > >           } else {
> > >                   ++ifs;
> > >                   has_non_wqm_push = true;
> > > @@ -795,15 +792,26 @@ unsigned bc_finalizer::get_stack_depth(node *n, 
> > > unsigned &loops,
> > >   switch (ctx.hw_class) {
> > >   case HW_CLASS_R600:
> > >   case HW_CLASS_R700:
> > > +         // If any non-WQM push is invoked, 2 elements should be 
> > > reserved.
> > >           if (has_non_wqm_push)
> > >                   stack_elements += 2;
> > >           break;
> > >   case HW_CLASS_CAYMAN:
> > > +         // If any stack operation is invoked, 2 elements should be 
> > > reserved
> > >           if (stack_elements)
> > >                   stack_elements += 2;
> > >           break;
> > >   case HW_CLASS_EVERGREEN:
> > > -         if (has_non_wqm_push_with_loops_on_stack)
> > > +         // According to the docs we need to reserve 1 element for each 
> > > of the
> > > +         // following cases:
> > > +         //   1) non-WQM push is used with WQM/LOOP frames on stack
> > > +         //   2) ALU_ELSE_AFTER is used at the point of max stack usage
> > > +         // NOTE:
> > > +         // It was found that the conditions above are not sufficient, 
> > > there are
> > > +         // other cases where we also need to reserve stack space, 
> > > that's why
> > > +         // we always reserve 1 stack element if we have non-WQM push on 
> > > stack.
> > > +         // Condition 2 is ignored for now because we don't use this 
> > > instruction.
> > > +         if (has_non_wqm_push)
> > >                   ++stack_elements;
> > 
> > The kernel analyzer reports a stack size of 2 for compute shaders that
> > have 3 levels of ALU_PUSH_BEFORE.  This would suggest that you either need 
> > to
> > reserve 2 sub-entries (stack_elements in the sb code) when there is a
> > non-wqm push, or apply the CAYMAN rules to EVERGREEN.
> > 
> > It is possible, though, that the kernel analyzer is over-allocating and
> > this patch is correct, but I don't have any evidence for this yet.
> 
> Is there any test that fails with this patch? AFAIK this algorithm
> worked fine for about 8 months in both old and sb backends, so I'd
> rather prefer to have any evidence that this is not correct before
> increasing stack allocation and reducing performance.
>
> Vadim

I have not been able to find a failing test when reserving one extra
element for the first non-wqm push (this includes a full run of the
OpenCV OpenCL test-suite which has a lot of complex control flow),
so I think the way you've implemented it in this patch is safe.

>From analyzing dumps for the kernel analyzer, it looks like they count
two extra sub-entries for the first non-wqm push when calculating the
stack size, but they count only one extra sub-entry when determining
whether or not to apply the work-around for the HW bug.  This also makes
me think that allocating one extra sub-entry is probably correct.

-Tom

> 
> > 
> > -Tom
> > 
> > 
> > >           break;
> > >   }
> > > -- 
> > > 1.8.4.2
> > > 
> > > _______________________________________________
> > > mesa-dev mailing list
> > > mesa-dev@lists.freedesktop.org
> > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> 
> 
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to