On Wed, Apr 1, 2015 at 9:50 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
> On Wed, Apr 1, 2015 at 7:09 AM, Roland Scheidegger <srol...@vmware.com> wrote:
>> On 01.04.2015 at 03:44, Rob Clark wrote:
>>> On Tue, Mar 31, 2015 at 9:03 PM, Roland Scheidegger <srol...@vmware.com> wrote:
>>>> On 01.04.2015 at 00:57, Rob Clark wrote:
>>>>>
>>>>> +/* Lowers idiv/udiv/umod
>>>>> + * Based on NV50LegalizeSSA::handleDIV()
>>>>> + *
>>>>> + * Note that this is probably not enough precision for compute shaders.
>>>>> + * Perhaps we want a second higher precision (looping) version of this?
>>>>> + * Or perhaps we assume if you can do compute shaders you can also
>>>>> + * branch out to a pre-optimized shader library routine..
>>>>
>>>> So if this is not enough precision, maybe should state how large the
>>>> error can be?
>>>>
>>>
>>> tbh, if I knew what the error for this approach was, I would have
>>> included it. I'm not the original author, but this is based on
>>> nouveau codegen code (as mentioned in the comment). I guess it is
>>> better than converting to float and dividing and converting back, but
>>> worse than an iterative (ie. looping, ie. divergent flow control)
>>> approach. It is apparently enough to keep piglit happy.
>>>
>>> The original algo in nv50 lowering code is from
>>> 322bc7ed68ed92233c97168c036d0aa50c11a20e (ie. 'nv50/ir: import nv50
>>> target') which doesn't really give more clue about the origin..
>>>
>>> if anyone knows, I'm all ears and will add relevant links/info to comment..
>>
>> Ah ok. Well it isn't even obvious to me if the results are not actually
>> always exact.
>
> Should be easy enough to take the algo, express it in terms of e.g.
> numpy (or even, *gasp*, a C program), and then do a randomized search
> over the 32bit x 32bit input space to see if there are any errors, and
> what they are. (Since the full input space would take too long...)
>
> Looks like I did just that when debugging the freedreno impl...
> available at http://hastebin.com/ewimuvobin.py
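
For reference, a minimal sketch of the kind of randomized check described above. This is not the hastebin script and not the nv50 algorithm: `naive_udiv` is a hypothetical stand-in (the "convert to float, divide, convert back" approach mentioned earlier in the thread), and `to_f32`, `random_pairs` and `check` are names made up for this illustration. It uses only the Python standard library rather than numpy.

```python
import random
import struct


def to_f32(x):
    """Round a value to the nearest float32 (Python floats are float64)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def naive_udiv(a, b):
    """Stand-in lowering, NOT the nv50 algorithm: divide in float32,
    truncate, and clamp to the unsigned 32-bit range."""
    q = to_f32(to_f32(a) / to_f32(b))
    return min(int(q), 0xFFFFFFFF)


def random_pairs(n, seed=0):
    """Yield n random (a, b) pairs of 32-bit unsigned values with b != 0."""
    rng = random.Random(seed)
    for _ in range(n):
        yield rng.getrandbits(32), rng.getrandbits(32) or 1


def check(lowered, pairs):
    """Return (a, b, got, expected) for every pair where lowered() is wrong."""
    bad = []
    for a, b in pairs:
        got, expected = lowered(a, b), a // b
        if got != expected:
            bad.append((a, b, got, expected))
    return bad


if __name__ == "__main__":
    bad = check(naive_udiv, random_pairs(100_000))
    print(len(bad), "mismatches in random sample")
    # Hand-picked edge case: both operands round to 2**32 in float32,
    # so the stand-in returns 1 where the exact quotient is 0.
    print(check(naive_udiv, [(0xFFFFFFFE, 0xFFFFFFFF)]))
```

To test a real lowering, pass it to `check()` in place of `naive_udiv`. Uniform random sampling may well miss cases like the near-equal operands above, so it is probably worth feeding `check()` a few hand-picked edge cases alongside the random pairs.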

fwiw, it looks like you still had some broken hacks in that script,
probably leftovers from your earlier experiments. I fixed it up (or at
least it now seems to give the same results piglit expects for the same
inputs) and also added udiv vs idiv support. I guess I should add umod
support too and commit it alongside the idiv lowering (once that
actually works too).

I would appreciate a second set of eyes on this, since I'm pretty much
a python and numpy newbie:

http://hastebin.com/orogikadey.vhdl

Now to figure out what my idiv lowering is doing differently :-P

BR,
-R

> -ilia