> > > >   _Bool iftmp.0_113;
> > > >   _Bool iftmp.0_114;
> > > >   iftmp.0_113 = .MASK_LOAD (_170, 8B, _169, _171(D));
> > > >   iftmp.0_114 = _47 | iftmp.0_113;

> >   _BoolD.2746 _47;
> >   iftmp.0_114 = _47 ? 1 : iftmp.0_113;
> > which is folded into
> >   iftmp.0_114 = _47 | iftmp.0_113;

>
> _47 was the .MASK_LOAD def, right?

_47 is the inverted load mask, iftmp.0_113 is the MASK_LOAD result.
Its mask is _169 where _169 = ~_47;

> It's not exactly obvious what goes wrong - the transform above
> is correct - it's only "unexpected" for the lanes that were
> masked.  So the actual bug must be downstream from iftmp.0_144.
>
> I think one can try to reason on the ifcvt (scalar) code by
> assuming the .MASK_LOAD def would be undefined.  Then we'd
> have _47(D) ? 1 : iftmp.0_133 -> _47(D) | iftmp.0_133, I think
> that's at most phishy as the COND_EXPR has a well-defined
> value while the IOR might spill "undefined" elsewhere causing
> divergence.  Is that what is actually happening?

After vectorization we recognize the mask (_47) as degenerate,
i.e. all ones and, conversely, the masked load mask (_169) is all zeros.
So we shouldn't really load anything.

Optimized we have

  vect_patt_384.36_436 = .MASK_LEN_GATHER_LOAD (_435, vect__47.35_432, 1, { 0, 
... }, { 0, ... }, _471, 0);
  vect_iftmp.37_439 = vect_patt_384.36_436 | { 1, ... };

We then re-use a non-zero vector register as masked load result.  Its
stale values cause the wrong result (which should be 1 everywhere).

-- 
Regards
 Robin

Reply via email to