unpack operations (v2)

Paul Berry Wed, 23 Jan 2013 12:50:16 -0800

On 23 January 2013 12:44, Chad Versace <chad.vers...@linux.intel.com> wrote:


> On 01/22/2013 09:19 PM, Paul Berry wrote:
> > On 21 January 2013 00:49, Chad Versace <chad.vers...@linux.intel.com>
> wrote:
> >
> >> Lower them to arithmetic and bit manipulation expressions.
> >>
> >> v2:
> >>   - Rewrite using ir_builder. [for idr]
> >>   - In lowering packHalf2x16, don't truncate subnormal float16 values to
> >> zero.
> >>     And round to even rather than to zero.  [for stereotype441]
> >>
> >> CC: Ian Romanick <i...@freedesktop.org>
> >> CC: Paul Berry <stereotype...@gmail.com>
> >> Signed-off-by: Chad Versace <chad.vers...@linux.intel.com>
> >> ---
> >>  src/glsl/Makefile.sources           |    1 +
> >>  src/glsl/ir_optimization.h          |   20 +
> >>  src/glsl/lower_packing_builtins.cpp | 1043
> >> +++++++++++++++++++++++++++++++++++
> >>  3 files changed, 1064 insertions(+)
> >>  create mode 100644 src/glsl/lower_packing_builtins.cpp
>
>
>
> >> +   void
> >> +   setup_factory(void *mem_ctx)
> >> +   {
> >> +      assert(factory.mem_ctx == NULL);
> >> +      factory.mem_ctx = mem_ctx;
> >> +
> >> +      /* Avoid making a new list for each call to handle_rvalue().
> Make a
> >> +       * single list and reuse it.
> >> +       */
> >> +      if (factory.instructions == NULL) {
> >> +         factory.instructions = new(NULL) exec_list();
> >> +      } else {
> >> +         assert(factory.instructions->is_empty());
> >> +      }
> >> +   }
> >>
> >
> > Do we need factory.instructions to be heap-allocated?  How about just
> > making a private exec_list inside lower_packing_builtins_visitor and
> > setting factory.instructions to point to it in the
> > lower_packing_builtins_visitor constructor?
> >
> > (snip)
>
> That seems reasonable. It saves a new/delete pair on each call to
> lower_packing_builtins(). I'll add that change.
>
> I assume that this change is minimal enough that I don't need to
> repost the patch.
>

Agreed.


>
> >> +         /* Case 3) f32 lies in the range
> >> +          *         [min_norm16, max_norm16 + max_step16).
> >> +          *
> >> +          *   The resultant float16 will be either normal or infinite.
> >> +          *
> >> +          *   Solving
> >> +          *
> >> +          *     f32 = max_norm16 + max_step16           (40)
> >> +          *         = 2^15 * (1 + 1023 / 2^10) + 2^5    (41)
> >> +          *         = 2^16                              (42)
> >> +          *   gives
> >> +          *
> >> +          *     e32 = 142 and m32 = 0                   (43)
> >>
> >
> > I calculate this to be 143, not 142.
> >
> >
> >> +          *
> >> +          *   We already solved the boundary condition f32 = min_norm16
> >> above
> >> +          *   in equation 31. Therefore this case occurs if and only if
> >> +          *
> >> +          *     113 <= e32 and e32 < 142
> >>
> >
> > So this should be e32 < 143.
>
> Fixed.
>
> >
> >
> >> +          */
> >> +
> >> +         /* } else if (e32 < 142) { */
> >> +         if_tree(lequal(e, constant(142u << 23u)),
> >>
> >
> > Fortunately, since you use "lequal" here, you get the correct effect.
>
> And fixed the code here to match the fixed comments.
>
>
>          /* } else if (e32 < 143) { */
>          if_tree(less(e, constant(143u << 23u)),
>
>
> >> +         /* } else if (e16 < 31)) { */
> >> +         if_tree(less(e, constant(31u << 10u)),
> >> +
> >> +              /* u32 = ((e << 13) + (112 << 23))
> >> +               *     | (m << 13);
> >> +               */
> >> +              assign(u32, bit_or(add(lshift(e, constant(13u)),
> >> +                                     constant(112u << 23u)),
> >> +                                 lshift(m, constant(13u)))),
> >>
> >
> > I believe you can save one operation by factoring out the "<< 13" to get:
> >
> > assign(u32, lshift(bit_or(add(e, constant(112u << 10u)), m),
> >                    constant(13u)))
>
> Fixed.
>
>
> >
> > Well done!  This is a tour de force, Chad.  The only comment that I
> > consider blocking is the 142 vs 143 mix-up I noted above, and even that
> is
> > only in the comments.  With that fixed, this patch is:
> >
> > Reviewed-by: Paul Berry <stereotype...@gmail.com>
>
> Thanks for the thorough review!
>
>

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH (gles3) 15/20] glsl: Add lowering pass for GLSL ES 3.00 pack/unpack operations (v2)

Reply via email to