Am 01.04.2015 um 15:50 schrieb Ilia Mirkin: > On Wed, Apr 1, 2015 at 7:09 AM, Roland Scheidegger <srol...@vmware.com> wrote: >> Am 01.04.2015 um 03:44 schrieb Rob Clark: >>> On Tue, Mar 31, 2015 at 9:03 PM, Roland Scheidegger <srol...@vmware.com> >>> wrote: >>>> Am 01.04.2015 um 00:57 schrieb Rob Clark: >>>>> From: Rob Clark <robcl...@freedesktop.org> >>>>> >>>>> Based on the algo from NV50LegalizeSSA::handleDIV() and handleMOD(). >>>>> See also trans_idiv() in freedreno/ir3/ir3_compiler.c (which was an >>>>> adaptation of the nv50 code from Ilia). >>>>> >>>>> Just sending as an rfc right now, since I'm not quite at the point to be >>>>> able to test it on actual hw. >>>>> >>>>> Signed-off-by: Rob Clark <robcl...@freedesktop.org> >>>>> --- >>>>> src/glsl/Makefile.sources | 1 + >>>>> src/glsl/nir/nir.h | 1 + >>>>> src/glsl/nir/nir_lower_idiv.c | 212 >>>>> ++++++++++++++++++++++++++++++++++++++++++ >>>>> 3 files changed, 214 insertions(+) >>>>> create mode 100644 src/glsl/nir/nir_lower_idiv.c >>>>> >>>>> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources >>>>> index 18fff38..e426970 100644 >>>>> --- a/src/glsl/Makefile.sources >>>>> +++ b/src/glsl/Makefile.sources >>>>> @@ -32,6 +32,7 @@ NIR_FILES = \ >>>>> nir/nir_lower_atomics.c \ >>>>> nir/nir_lower_global_vars_to_local.c \ >>>>> nir/nir_lower_locals_to_regs.c \ >>>>> + nir/nir_lower_idiv.c \ >>>>> nir/nir_lower_io.c \ >>>>> nir/nir_lower_phis_to_scalar.c \ >>>>> nir/nir_lower_samplers.cpp \ >>>>> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h >>>>> index cd03d6b..e002d6f 100644 >>>>> --- a/src/glsl/nir/nir.h >>>>> +++ b/src/glsl/nir/nir.h >>>>> @@ -1605,6 +1605,7 @@ void nir_lower_samplers(nir_shader *shader, >>>>> >>>>> void nir_lower_system_values(nir_shader *shader); >>>>> void nir_lower_tex_projector(nir_shader *shader); >>>>> +void nir_lower_idiv(nir_shader *shader); >>>>> >>>>> void nir_lower_atomics(nir_shader *shader); >>>>> void nir_lower_to_source_mods(nir_shader *shader); >>>>> diff --git a/src/glsl/nir/nir_lower_idiv.c b/src/glsl/nir/nir_lower_idiv.c >>>>> new file mode 100644 >>>>> index 0000000..e95c57e >>>>> --- /dev/null >>>>> +++ b/src/glsl/nir/nir_lower_idiv.c >>>>> @@ -0,0 +1,212 @@ >>>>> +/* >>>>> + * Copyright © 2015 Red Hat >>>>> + * >>>>> + * Permission is hereby granted, free of charge, to any person obtaining >>>>> a >>>>> + * copy of this software and associated documentation files (the >>>>> "Software"), >>>>> + * to deal in the Software without restriction, including without >>>>> limitation >>>>> + * the rights to use, copy, modify, merge, publish, distribute, >>>>> sublicense, >>>>> + * and/or sell copies of the Software, and to permit persons to whom the >>>>> + * Software is furnished to do so, subject to the following conditions: >>>>> + * >>>>> + * The above copyright notice and this permission notice (including the >>>>> next >>>>> + * paragraph) shall be included in all copies or substantial portions of >>>>> the >>>>> + * Software. >>>>> + * >>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, >>>>> EXPRESS OR >>>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF >>>>> MERCHANTABILITY, >>>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT >>>>> SHALL >>>>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR >>>>> OTHER >>>>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, >>>>> ARISING >>>>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER >>>>> DEALINGS >>>>> + * IN THE SOFTWARE. >>>>> + * >>>>> + * Authors: >>>>> + * Rob Clark <robcl...@freedesktop.org> >>>>> + */ >>>>> + >>>>> +#include "nir.h" >>>>> +#include "nir_builder.h" >>>>> + >>>>> +/* Lowers idiv/udiv/umod >>>>> + * Based on NV50LegalizeSSA::handleDIV() >>>>> + * >>>>> + * Note that this is probably not enough precision for compute shaders. >>>>> + * Perhaps we want a second higher precision (looping) version of this? >>>>> + * Or perhaps we assume if you can do compute shaders you can also >>>>> + * branch out to a pre-optimized shader library routine.. >>>> >>>> So if this is not enough precision, maybe should state how large the >>>> error can be? >>>> >>> >>> tbh, if I knew what the error for this approach was, I would have >>> included it. I'm not the original author, but this is based on >>> nouveau codegen code (as mentioned in the comment). I guess it is >>> better than converting to float and dividing and converting back, but >>> worse than an iterative (ie. looping, ie. divergent flow control) >>> approach. It is apparently enough to keep piglit happy. >>> >>> The original algo in nv50 lowering code is from >>> 322bc7ed68ed92233c97168c036d0aa50c11a20e (ie. 'nv50/ir: import nv50 >>> target') which doesn't really give more clue about the origin.. >>> >>> if anyone knows, I'm all ears and will add relevant links/info to comment.. >> >> Ah ok. Well it isn't even obvious to me if the results are not actually >> always exact. > > Should be easy enough to take the algo, express it in terms of e.g. > numpy (or even, *gasp*, a C program), and then do a randomized search > over the 32bit x 32bit input space to see if there are any errors, and > what they are. (Since the full input space would take too long...) > > Looks like I did just that when debugging the freedreno impl... > available at > https://urldefense.proofpoint.com/v2/url?u=http-3A__hastebin.com_ewimuvobin.py&d=AwIFaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_I&m=Pn5cwt36o82bdZbB71-Fk0rkg7gYy4gsfb7UdU_yJOw&s=lmUwJATGP5llzxaPiaPMTL49paZCc6o-aTCWyaulGMw&e= > > Though actually, it uses frcp, which is usually not exact on gpus (unless you translate that to a fdiv if your gpu has one). So, potentially this could rely on the exact implementation of frcp...
Roland _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev