> > > > _Bool iftmp.0_113;
> > > > _Bool iftmp.0_114;
> > > > iftmp.0_113 = .MASK_LOAD (_170, 8B, _169, _171(D));
> > > > iftmp.0_114 = _47 | iftmp.0_113;
> > _BoolD.2746 _47;
> > iftmp.0_114 = _47 ? 1 : iftmp.0_113;
> > which is folded into
> > iftmp.0_114 = _47 | iftmp.0_113;
>
> _47 was the .MASK_LOAD def, right?

_47 is the inverted load mask, iftmp.0_113 is the MASK_LOAD result.
Its mask is _169, where _169 = ~_47.

> It's not exactly obvious what goes wrong - the transform above
> is correct - it's only "unexpected" for the lanes that were
> masked.  So the actual bug must be downstream from iftmp.0_114.
>
> I think one can try to reason on the ifcvt (scalar) code by
> assuming the .MASK_LOAD def would be undefined.  Then we'd
> have _47(D) ? 1 : iftmp.0_113 -> _47(D) | iftmp.0_113, I think
> that's at most fishy as the COND_EXPR has a well-defined
> value while the IOR might spill "undefined" elsewhere causing
> divergence.  Is that what is actually happening?

After vectorization we recognize the mask (_47) as degenerate, i.e.
all ones, and, conversely, the masked load mask (_169) as all zeros.
So we shouldn't really load anything.  In the optimized dump we have

  vect_patt_384.36_436 = .MASK_LEN_GATHER_LOAD (_435, vect__47.35_432, 1, { 0, ... }, { 0, ... }, _471, 0);
  vect_iftmp.37_439 = vect_patt_384.36_436 | { 1, ... };

We then re-use a non-zero vector register as the masked load result.
Its stale values cause the wrong result (which should be 1 everywhere).

-- 
Regards
 Robin
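
For illustration, the divergence described above can be modelled in plain C. This is only a sketch under stated assumptions, not GCC code: the helper names (masked_load, select_form, ior_form, forms_diverge, check_example), the four 8-bit lanes and the stale value 0xFE are all made up for the example, and the model assumes that inactive lanes of a masked load simply keep whatever the destination register held before.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 4  /* hypothetical vector length, for illustration only */

/* Model of a masked load: active lanes read memory, inactive lanes keep
   whatever the destination register already held (an assumption).  */
static void masked_load(uint8_t *dst, const uint8_t *mem, const uint8_t *mask)
{
    for (int i = 0; i < LANES; i++)
        if (mask[i])
            dst[i] = mem[i];
}

/* The original form: cond ? 1 : loaded.  */
static void select_form(uint8_t *out, const uint8_t *cond, const uint8_t *loaded)
{
    for (int i = 0; i < LANES; i++)
        out[i] = cond[i] ? 1 : loaded[i];
}

/* The folded form: cond | loaded.  */
static void ior_form(uint8_t *out, const uint8_t *cond, const uint8_t *loaded)
{
    for (int i = 0; i < LANES; i++)
        out[i] = cond[i] | loaded[i];
}

/* Returns 1 if the two forms disagree when the register that receives the
   masked load result initially holds STALE in every lane.  */
int forms_diverge(uint8_t stale)
{
    uint8_t cond[LANES], load_mask[LANES], mem[LANES], reg[LANES];
    uint8_t sel[LANES], ior[LANES];

    for (int i = 0; i < LANES; i++) {
        cond[i] = 1;              /* plays the role of _47: all ones */
        load_mask[i] = !cond[i];  /* plays the role of _169 = ~_47: all zeros */
        mem[i] = 0;               /* never read, the load mask is all zeros */
        reg[i] = stale;           /* re-used register with stale contents */
    }

    masked_load(reg, mem, load_mask);  /* loads nothing, reg stays stale */
    select_form(sel, cond, reg);       /* 1 in every lane */
    ior_form(ior, cond, reg);          /* (1 | stale) in every lane */

    return memcmp(sel, ior, LANES) != 0;
}

/* Small self-check of the claims made in the surrounding text.  */
void check_example(void)
{
    assert(forms_diverge(0xFE) == 1);  /* 0xFE | 1 == 0xFF, not 1 */
    assert(forms_diverge(0x00) == 0);  /* a zeroed register hides the issue */
}
```

In this model the select form never looks at the stale lanes, while the IOR form lets every stale bit through. The two only agree when the re-used register happens to hold 0 or 1 in each lane.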