On Wed, Apr 1, 2015 at 7:09 AM, Roland Scheidegger <srol...@vmware.com> wrote:
> On 01.04.2015 at 03:44, Rob Clark wrote:
>> On Tue, Mar 31, 2015 at 9:03 PM, Roland Scheidegger <srol...@vmware.com>
>> wrote:
>>> On 01.04.2015 at 00:57, Rob Clark wrote:
>>>> From: Rob Clark <robcl...@freedesktop.org>
>>>>
>>>> Based on the algo from NV50LegalizeSSA::handleDIV() and handleMOD().
>>>> See also trans_idiv() in freedreno/ir3/ir3_compiler.c (which was an
>>>> adaptation of the nv50 code from Ilia).
>>>>
>>>> Just sending as an rfc right now, since I'm not quite at the point to be
>>>> able to test it on actual hw.
>>>>
>>>> Signed-off-by: Rob Clark <robcl...@freedesktop.org>
>>>> ---
>>>>  src/glsl/Makefile.sources     |   1 +
>>>>  src/glsl/nir/nir.h            |   1 +
>>>>  src/glsl/nir/nir_lower_idiv.c | 212
>>>> ++++++++++++++++++++++++++++++++++++++++++
>>>>  3 files changed, 214 insertions(+)
>>>>  create mode 100644 src/glsl/nir/nir_lower_idiv.c
>>>>
>>>> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
>>>> index 18fff38..e426970 100644
>>>> --- a/src/glsl/Makefile.sources
>>>> +++ b/src/glsl/Makefile.sources
>>>> @@ -32,6 +32,7 @@ NIR_FILES = \
>>>>       nir/nir_lower_atomics.c \
>>>>       nir/nir_lower_global_vars_to_local.c \
>>>>       nir/nir_lower_locals_to_regs.c \
>>>> +     nir/nir_lower_idiv.c \
>>>>       nir/nir_lower_io.c \
>>>>       nir/nir_lower_phis_to_scalar.c \
>>>>       nir/nir_lower_samplers.cpp \
>>>> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
>>>> index cd03d6b..e002d6f 100644
>>>> --- a/src/glsl/nir/nir.h
>>>> +++ b/src/glsl/nir/nir.h
>>>> @@ -1605,6 +1605,7 @@ void nir_lower_samplers(nir_shader *shader,
>>>>
>>>>  void nir_lower_system_values(nir_shader *shader);
>>>>  void nir_lower_tex_projector(nir_shader *shader);
>>>> +void nir_lower_idiv(nir_shader *shader);
>>>>
>>>>  void nir_lower_atomics(nir_shader *shader);
>>>>  void nir_lower_to_source_mods(nir_shader *shader);
>>>> diff --git a/src/glsl/nir/nir_lower_idiv.c b/src/glsl/nir/nir_lower_idiv.c
>>>> new file mode 100644
>>>> index 0000000..e95c57e
>>>> --- /dev/null
>>>> +++ b/src/glsl/nir/nir_lower_idiv.c
>>>> @@ -0,0 +1,212 @@
>>>> +/*
>>>> + * Copyright © 2015 Red Hat
>>>> + *
>>>> + * Permission is hereby granted, free of charge, to any person obtaining a
>>>> + * copy of this software and associated documentation files (the
>>>> "Software"),
>>>> + * to deal in the Software without restriction, including without
>>>> limitation
>>>> + * the rights to use, copy, modify, merge, publish, distribute,
>>>> sublicense,
>>>> + * and/or sell copies of the Software, and to permit persons to whom the
>>>> + * Software is furnished to do so, subject to the following conditions:
>>>> + *
>>>> + * The above copyright notice and this permission notice (including the
>>>> next
>>>> + * paragraph) shall be included in all copies or substantial portions of
>>>> the
>>>> + * Software.
>>>> + *
>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>>>> EXPRESS OR
>>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>>>> MERCHANTABILITY,
>>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
>>>> SHALL
>>>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
>>>> OTHER
>>>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>>>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>>>> DEALINGS
>>>> + * IN THE SOFTWARE.
>>>> + *
>>>> + * Authors:
>>>> + *    Rob Clark <robcl...@freedesktop.org>
>>>> + */
>>>> +
>>>> +#include "nir.h"
>>>> +#include "nir_builder.h"
>>>> +
>>>> +/* Lowers idiv/udiv/umod
>>>> + * Based on NV50LegalizeSSA::handleDIV()
>>>> + *
>>>> + * Note that this is probably not enough precision for compute shaders.
>>>> + * Perhaps we want a second higher precision (looping) version of this?
>>>> + * Or perhaps we assume if you can do compute shaders you can also
>>>> + * branch out to a pre-optimized shader library routine..
>>>
>>> So if this is not enough precision, maybe should state how large the
>>> error can be?
>>>
>>
>> tbh, if I knew what the error for this approach was, I would have
>> included it.  I'm not the original author, but this is based on
>> nouveau codegen code (as mentioned in the comment).  I guess it is
>> better than converting to float and dividing and converting back, but
>> worse than an iterative (ie. looping, ie. divergent flow control)
>> approach.  It is apparently enough to keep piglit happy.
>>
>> The original algo in nv50 lowering code is from
>> 322bc7ed68ed92233c97168c036d0aa50c11a20e (ie. 'nv50/ir: import nv50
>> target') which doesn't really give more clue about the origin..
>>
>> if anyone knows, I'm all ears and will add relevant links/info to comment..
>
> Ah ok. Well it isn't even obvious to me if the results are not actually
> always exact.

Should be easy enough to take the algo, express it in terms of e.g.
numpy (or even, *gasp*, a C program), and then do a randomized search
over the 32bit x 32bit input space to see if there are any errors, and
what they are. (Since the full input space would take too long...)

Looks like I did just that when debugging the freedreno impl...
available at http://hastebin.com/ewimuvobin.py

  -ilia
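
For readers without access to the linked script, here is a minimal sketch of what such a randomized search could look like. It is not the script referenced above, and it does not reproduce the nv50/NIR lowering. The names `to_f32`, `naive_float_udiv`, `check_pair` and `random_search` are hypothetical, and the candidate under test is only the plain "convert to float, divide, convert back" approach mentioned earlier in the thread, standing in for whatever approximate divide one actually wants to check.

```python
import random
import struct

U32_MAX = 0xFFFFFFFF


def to_f32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def naive_float_udiv(a, b):
    """Hypothetical candidate: convert to float32, divide, truncate back.

    This is NOT the nv50/NIR lowering; it is an assumed stand-in so that
    the harness below has an approximate unsigned divide to test.
    """
    q = to_f32(to_f32(float(a)) / to_f32(float(b)))
    return int(q) & U32_MAX  # wrap to 32 bits, as a u32 register would


def check_pair(candidate, a, b):
    """Return (a, b, got, expected) on mismatch, or None if the result is exact."""
    expected = a // b
    got = candidate(a, b)
    return None if got == expected else (a, b, got, expected)


def random_search(candidate, trials=100_000, seed=0):
    """Compare candidate(a, b) against exact a // b on random 32-bit inputs."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        a = rng.randint(0, U32_MAX)
        b = rng.randint(1, U32_MAX)  # b == 0 is skipped: no exact reference
        bad = check_pair(candidate, a, b)
        if bad is not None:
            mismatches.append(bad)
    return mismatches


if __name__ == "__main__":
    found = random_search(naive_float_udiv, trials=10_000)
    print("random mismatches:", len(found))
    print("edge case:", check_pair(naive_float_udiv, U32_MAX, 1))
```

To test a different algorithm, pass any `f(a, b)` callable to `random_search`. One caveat of this sketch: uniformly random inputs may rarely land on the awkward cases, so it is probably worth feeding known boundary values through `check_pair` as well. For instance, 0xFFFFFFFF / 1 already fails for the naive float candidate, because 2^32 - 1 rounds up to 2^32 in single precision.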