On Fri, Nov 16, 2018 at 10:34 AM Dylan Baker <dy...@pnwbakers.com> wrote: > > Quoting Roland Scheidegger (2018-11-13 18:41:00) > > Am 14.11.18 um 03:21 schrieb Matt Turner: > > > On Tue, Nov 13, 2018 at 6:03 PM Roland Scheidegger <srol...@vmware.com> > > > wrote: > > >> > > >> Am 13.11.18 um 23:49 schrieb Dylan Baker: > > >>> Quoting Roland Scheidegger (2018-11-13 14:13:00) > > >>>> Am 13.11.18 um 18:00 schrieb Dylan Baker: > > >>>>> Quoting Erik Faye-Lund (2018-11-13 01:34:53) > > >>>>>> On Mon, 2018-11-12 at 09:22 -0800, Dylan Baker wrote: > > >>>>>>> Quoting Erik Faye-Lund (2018-11-12 04:51:47) > > >>>>>>>> On Fri, 2018-11-09 at 10:40 -0800, Dylan Baker wrote: > > >>>>>>>>> Which has the same behavior. > > >>>>>>>> > > >>>>>>>> Does it? I'm not so sure... IROUND_POS seems to round to nearest > > >>>>>>>> integer depending on the FPU rounding mode, _mesa_roundevenf rounds > > >>>>>>>> to > > >>>>>>>> the nearest *even* value regardless of the FPU rounding mode, no? > > >>>>>>>> > > >>>>>>>> I'm not sure if it matters or not, but *at least* point that out in > > >>>>>>>> the > > >>>>>>>> commit message. Unless I'm missing something, of course... > > >>>>>>> > > >>>>>>> I should put it in the commit message, but there is a comment in > > >>>>>>> rounding.h that > > >>>>>>> if you change the rounding mode you get to keep the pieces. > > >>>>>> > > >>>>>> Well, this might regress performance pretty badly. Especially in the > > >>>>>> swrast code, this could be bad... > > >>>>>> > > >>>>> > > >>>>> Why? we have the assumption that you don't change the rounding mode > > >>>>> already in > > >>>>> core mesa and many of the drivers. > > >>>>> > > >>>>> For performance, I measured a simple 1000 loops of rounding, and > > >>>>> found that the > > >>>>> only way the rounding.h function was slower is if you used the > > >>>>> __SSE4_1__ > > >>>>> path... (It was the same performance as the int cast +0.5 > > >>>>> implementation) > > >>>> FWIW I'm not entirely sure it's useful to have a sse41 implementation - > > >>>> since all sse2 capable cpus can natively do rintf. Although maybe it > > >>>> should be pointed out that the sse41 implementation will use a defined > > >>>> rounding mode, whereas rintf will use current rounding mode. But I > > >>>> don't > > >>>> think anyone ever cares for the results if a different rounding mode > > >>>> would be set. Although of course rint and its variant do not actually > > >>>> guarantee the even part of it (but well if it's a sse41 capable box we > > >>>> pretty much know it would do just that anyway)... (And technically > > >>>> nearbyintf would probably be an even better solution, since we never > > >>>> want to get involved with the clunky exceptions, otherwise it's > > >>>> identical. But there might be reasons why it isn't used.) > > >>>> > > >>>> Roland > > >>> > > >>> I'm not convinced we want it either, since it seems to be slower than > > >>> glibc's > > >>> rintf. I guess it probably does make sense to use the nearbyintf > > >>> instead. > > >>> > > >>> As an aside (since I know 0 about assembly), does > > >>> _MM_FROUND_CUR_DIRECTION not > > >>> check the rounding mode? > > >> Oh indeed, I didn't check the code too closely (I was just assuming > > >> _mm_round_ss() was used because it is possible to use round-to-nearest > > >> regardless the actual rounding mode, but that's not the case). > > >> > > >> But actually I misread this code: the point of mesa_roundevenf is to > > >> round to float WITHOUT conversion to int. In which case it makes more > > >> sense at least at first look... > > >> > > >> But if you want to round to nearest integer WITH conversion to int, you > > >> probably really want to use something else. nearbyint family doesn't > > >> have variants which give you ints. There's rint functions which give you > > >> ints directly, but they are likely a very bad idea (aside from exception > > > > > > Why? > > Not sure what the why refers to here? > > > > > > > > > >> handling, not quite sure if this really causes the compiler to do > > >> something different) because of giving you long (or long long) results - > > >> meaning that you can't use the simple cpu instructions giving you 32bit > > >> results (because conversion to 64bit long + trunc to 32bit will give you > > >> defined (although meaningless) results in some cases where direct > > >> conversion to 32bit int wouldn't). > > >> So ideally you'd pick a variant where the compiler is smart enough to > > >> recognize it can be done with a single instruction. I would guess > > >> nearbyintf + int cast should do just about everywhere, at least as long > > >> as x64 or x86 + sse2 is used, my suspicion is the old IROUND function > > >> was done in a time where x87 was still relevant. Or maybe rintf + int > > >> cast, no idea how the compiler really handles them differently (I tried > > >> to quickly look at it in gcc source, but no idea where those are > > >> buried). As a side note, I hate it when the assembly solution is obvious > > >> and you can't really figure out how the hell you should coax the > > >> compiler in giving you the right answer (I mean, high level languages > > >> are there to help, not get in your way...). > > > > > > Please read the commit message of > > > > > > commit dd0d3a2c0fb388745519c8a3be800720541eccfe > > > Author: Matt Turner <matts...@gmail.com> > > > Date: Tue Mar 10 17:55:21 2015 -0700 > > > > > > mesa: Replace _mesa_round_to_even() with _mesa_roundeven(). > > > > > > for a lot of the background. > > > > > > I expect IROUND_POS can be replaced with the _mesa_lroundevenf function. > > > > > > > Yes, I missed that function. _mesa_lroundevenf is exactly what we want > > (although if someone cares about other archs, lrintf might generate > > pretty bad code there too). > > > > The commit message though isn't quite right there, saying "rint() and > > nearbyint() implement the round-half-to-even behavior we want when the > > rounding mode is set to the default round-to-nearest. The only > > difference between them is that nearbyint() raises the inexact > > exception." > > And seemingly uses that to justify using rintf and not nearbyintf for > > the fallback. It's the opposite, nearbyint never raises inexact > > exception, whereas rint does (depending on fp environment). Or maybe not > > (I suppose nearbyint could be more work because the mxcsr bits might > > have to be set explicitly to NOT cause an exception). > > > > Roland > > > > So we're agreeing that I should use _mesa_lroundevenf instead of > _mesa_roundeven?
I suspect that will be fine. But I guess using lroundf() might be a more accurate replacement, since the truncf(x + 0.5) that IROUND_POS does is the behavior of the round() family (round ties away from zero). _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev