https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
--- Comment #19 from GCC Commits ---
The master branch has been updated by Vineet Gupta :
https://gcc.gnu.org/g:b755c151fde4ad736405bb2e13a7de0420161179
commit r15-6673-gb755c151fde4ad736405bb2e13a7de0420161179
Author: Vineet Gupta
Date: Tu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
--- Comment #18 from Robin Dapp ---
> But the point here really is that we don't need the widening semantics, let
> alone twice. The min+max+sub in loops with a final reducing sum should do the
> trick.
OK I guess it can be argued that
minus (max
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
--- Comment #17 from Li Pan ---
(In reply to Vineet Gupta from comment #14)
> (In reply to Li Pan from comment #7)
> > Created attachment 59661 [details]
> > with usad pattern
>
> Can you please post the patch, lest we duplicate your effort.
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
--- Comment #16 from Vineet Gupta ---
(In reply to Robin Dapp from comment #15)
> (In reply to Vineet Gupta from comment #14)
> > @Robin, it seems the current codegen generates 2 widening ops, which might
> > not be as efficient. We have done s
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
--- Comment #15 from Robin Dapp ---
(In reply to Vineet Gupta from comment #14)
> (In reply to Li Pan from comment #7)
> > Created attachment 59661 [details]
> > with usad pattern
>
> Can you please post the patch, lest we duplicate your effort
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
--- Comment #14 from Vineet Gupta ---
(In reply to Li Pan from comment #7)
> Created attachment 59661 [details]
> with usad pattern
Can you please post the patch, lest we duplicate your effort.
It would be nice to test it on real hardware.
@Ro
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
--- Comment #13 from Robin Dapp ---
I don't fully understand yet :)
So the full-register moves are undesirable, I agree. When accumulating with a
widening op they seem unavoidable, though. The only alternative would be to
split out the extens
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
--- Comment #12 from Li Pan ---
(In reply to Robin Dapp from comment #11)
> (In reply to Li Pan from comment #9)
> > Created attachment 59663 [details]
> > before_vs_after when outer loop is 128
>
> Ok, that's a different loop then. I'm seeing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
--- Comment #11 from Robin Dapp ---
(In reply to Li Pan from comment #9)
> Created attachment 59663 [details]
> before_vs_after when outer loop is 128
Ok, that's a different loop then. I'm seeing vmv1rs in the current version, is
that what you
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
Jeffrey A. Law changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org
--- Comment #10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
--- Comment #9 from Li Pan ---
Created attachment 59663
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59663&action=edit
before_vs_after when outer loop is 128
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
--- Comment #8 from Robin Dapp ---
So the difference is

(usad expansion)
  vmax
  vmin
  vsub
  vsext
  vadd

vs (right now)
  vwsub
  vneg
  vmax
  vwadd

Why is that preferable?
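
For readers following the comparison in comment #8, the two instruction sequences can be mirrored in scalar C roughly as below. This is only an illustrative sketch: the function names (`sad_minmax`, `sad_widen`, `check_same`) are hypothetical and do not come from the attachments or the patch, and the element types are an assumption (8-bit unsigned pixels, 16-bit intermediates for the widening variant).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the "usad expansion" shape:
   max, min, sub stay in 8 bits; only the accumulate is widened. */
static uint32_t sad_minmax(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t hi = a[i] > b[i] ? a[i] : b[i];   /* max */
        uint8_t lo = a[i] < b[i] ? a[i] : b[i];   /* min */
        uint8_t diff = (uint8_t)(hi - lo);        /* sub, always fits in 8 bits */
        sum += diff;                              /* extend + add */
    }
    return sum;
}

/* Hypothetical sketch of the "right now" shape:
   widening sub, neg, max (together an abs), then a widening add. */
static uint32_t sad_widen(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int16_t d  = (int16_t)(a[i] - b[i]);      /* widening sub, range -255..255 */
        int16_t nd = (int16_t)(-d);               /* neg */
        int16_t ad = d > nd ? d : nd;             /* max(d, -d) == |d| */
        sum += (uint32_t)ad;                      /* widening add */
    }
    return sum;
}

/* Both formulations should agree on any input. */
static void check_same(const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(sad_minmax(a, b, n) == sad_widen(a, b, n));
}
```

The two functions compute the same sum; the difference under discussion is only where the widening happens, which in vector code determines how many operations run at the wider element width.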
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
--- Comment #7 from Li Pan ---
Created attachment 59661
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59661&action=edit
with usad pattern
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
--- Comment #6 from Li Pan ---
Created attachment 59660
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59660&action=edit
upstream
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
--- Comment #5 from Robin Dapp ---
If it's better then OK. Can you show an example?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
--- Comment #4 from JuzheZhong ---
(In reply to Robin Dapp from comment #3)
> First, pixel_sad_4x4 is not very hot, 8x8 and 16x16 are.
>
> Second, we are vectorizing this, but with -mno-vector-strict-align.
>
> IMHO we don't need to synthesize
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
--- Comment #3 from Robin Dapp ---
First, pixel_sad_4x4 is not very hot, 8x8 and 16x16 are.
Second, we are vectorizing this, but with -mno-vector-strict-align.
IMHO we don't need to synthesize a usad pattern.
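
As context for comment #3, the kind of loop under discussion looks roughly like the sketch below. It is a minimal stand-in for a 4x4 sum-of-absolute-differences kernel, not the benchmark's actual source: the name `sad_4x4_sketch`, its signature, and the stride parameters are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 4x4 SAD kernel: sums |pix1 - pix2| over a 4x4 block.
   stride1/stride2 are the distances in bytes between consecutive rows. */
static int sad_4x4_sketch(const uint8_t *pix1, intptr_t stride1,
                          const uint8_t *pix2, intptr_t stride2)
{
    assert(pix1 && pix2);
    int sum = 0;
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++)
            sum += abs(pix1[x] - pix2[x]);   /* operands promote to int */
        pix1 += stride1;
        pix2 += stride2;
    }
    return sum;
}
```

The inner loop is the abs-of-difference reduction that a usad pattern would match directly; the 8x8 and 16x16 variants mentioned above would differ only in the loop bounds.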