https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109156

--- Comment #2 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> (In reply to Tamar Christina from comment #0)
> > 2. It looks like all targets that implement SAD do so with an instruction
> > that does ABD and then perform a reduction.  So it looks like no target has
> > the semantics for SAD.
> 
> x86 for example does SAD on 16 QImode data and 4 SImode accumulators which
> means it sums 4 QImode absolute differences each (but SAD_EXPR leaves
> unspecified which, so SAD_EXPR is only usable when you in the end sum
> the accumulator lanes as well).
> 

Oh I see, psadbw actually is SAD. Sorry, I missed the `s` in the instruction name!
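
To make the distinction concrete, here is a rough scalar sketch of how I understand it. It is only an illustration, not what the vectorizer or any target emits. The function names (abd_u8, sad_u8, reduce_lanes) are made up for this example, and the grouping of four consecutive bytes per accumulator is an assumption, since SAD_EXPR leaves the grouping unspecified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ABD: element-wise absolute difference, same width in and out,
   no reduction of any kind.  */
static void abd_u8(const uint8_t *a, const uint8_t *b, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
}

/* SAD modelled as ABD followed by a partial, widening reduction into
   4 accumulators per 16-byte chunk.  Which bytes land in which
   accumulator is an assumption made for this sketch.  */
static void sad_u8(const uint8_t *a, const uint8_t *b, size_t n,
                   uint32_t acc[4])
{
    assert(n % 16 == 0);
    for (size_t base = 0; base < n; base += 16) {
        uint8_t d[16];
        abd_u8(a + base, b + base, d, 16);
        for (size_t i = 0; i < 16; i++)
            acc[i / 4] += d[i];
    }
}

/* Final reduction across the accumulator lanes.  Only this total is
   independent of how the partial sums were grouped.  */
static uint32_t reduce_lanes(const uint32_t acc[4])
{
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Under this model the individual acc[] values depend on the grouping, but the result of reduce_lanes does not, which matches the point that SAD_EXPR is only usable when the accumulator lanes are summed at the end.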

> So I don't think 2. is true.
> 
> > So this brings up the question of why the detection wasn't done based on ABD
> > instead and leaving the reduction explicit in the vectorizer.
> > 
> > So the question is, should we create a completely new standalone pattern for
> > ABD, or should we make ABD the thing being detected and change SAD_EXPR to
> > recognize ABD + reduction.
> > 
> > Removing SAD completely in favor of ABD + reduction means that hand
> > optimized versions in targets need updating so I'm in favor of still
> > emitting SAD.
> 
> I'd do a separate internal function for ABD, possibly sharing part of the
> detection as you proposed.

Great, will do so, thanks!
