https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103797
--- Comment #4 from hubicka at kam dot mff.cuni.cz ---
> -E and remove not needed code.
>
> > The
> > declarations are quite convoluted, but the function is well isolated and
> > easy to inspect from full one...
>
> Do we speak about:
> https://github.com/mozilla/gecko-dev/blob/bd25b1ca76dd5d323ffc69557f6cf759ba76ba23/gfx/2d/FilterNodeSoftware.cpp#L3670-L3691
> ?

Yes.

> It should be possible creating a synthetical test that does the same (and
> lives
> in a loop, right?).

Well, I tried that for a while and got a bit lost (the code either got
vectorized by both gcc and clang or by neither). There are more issues where
we have regressions of over 50% relative to the clang build in gfx code, so I
think I will first try to reproduce those locally and perf them to see if
there is more of a pattern here.

The relevant code is:

uint32_t mozilla::gfx::{anonymous}::SpecularLightingSoftware::LightPixel (struct SpecularLightingSoftware * const this, const struct Point3D & aNormal, const struct Point3D & aVectorToLight, uint32_t aColor)
{
  <bb 2> [local count: 118111600]:
  _48 = MEM[(const struct BasePoint3D *)aVectorToLight_25(D)].D.75826.D.75829.z;
  _49 = _48 + 1.0e+0;
  _50 = MEM[(const struct BasePoint3D *)aVectorToLight_25(D)].D.75826.D.75829.y;
  _51 = _50 + 0.0;
  _52 = MEM[(const struct BasePoint3D *)aVectorToLight_25(D)].D.75826.D.75829.x;
  _53 = _52 + 0.0;
  _80 = _53 * _53;
  _82 = _51 * _51;
  _83 = _80 + _82;
  _85 = _49 * _49;
  _86 = _83 + _85;
  if (_86 u>= 0.0)
    goto <bb 3>; [99.95%]
  else
    goto <bb 4>; [0.05%]

  <bb 3> [local count: 118052545]:
  _87 = .SQRT (_86);
  goto <bb 5>; [100.00%]

  <bb 4> [local count: 59055]:
  _29 = __builtin_sqrtf (_86);

  <bb 5> [local count: 118111600]:
  # _30 = PHI <_29(4), _87(3)>
  _88 = _53 / _30;
  _89 = _51 / _30;
  _90 = _49 / _30;
  _41 = MEM[(const struct BasePoint3D *)aNormal_26(D)].D.75826.D.75829.x;
  _39 = _41 * _88;
  _37 = MEM[(const struct BasePoint3D *)aNormal_26(D)].D.75826.D.75829.y;
  _33 = _37 * _89;
  _27 = _33 + _39;
  _45 = MEM[(const struct BasePoint3D *)aNormal_26(D)].D.75826.D.75829.z;
  _46 = _45 * _90;
  _47 = _27 + _46;
  if (_47 >= 0.0)
    goto <bb 12>; [59.00%]
  else
    goto <bb 6>; [41.00%]

With -Ofast it gets a bit more streamlined:

  <bb 2> [local count: 118111600]:
  _48 = MEM[(const struct BasePoint3D *)aVectorToLight_25(D)].D.75826.D.75829.z;
  _49 = _48 + 1.0e+0;
  _50 = MEM[(const struct BasePoint3D *)aVectorToLight_25(D)].D.75826.D.75829.y;
  _51 = MEM[(const struct BasePoint3D *)aVectorToLight_25(D)].D.75826.D.75829.x;
  powmult_78 = _51 * _51;
  powmult_80 = _50 * _50;
  _81 = powmult_78 + powmult_80;
  powmult_83 = _49 * _49;
  _84 = _81 + powmult_83;
  _85 = __builtin_sqrtf (_84);
  _86 = _51 / _85;
  _87 = _50 / _85;
  _88 = _49 / _85;
  _41 = MEM[(const struct BasePoint3D *)aNormal_26(D)].D.75826.D.75829.x;
  _39 = _41 * _86;
  _37 = MEM[(const struct BasePoint3D *)aNormal_26(D)].D.75826.D.75829.y;
  _33 = _37 * _87;
  _27 = _33 + _39;
  _45 = MEM[(const struct BasePoint3D *)aNormal_26(D)].D.75826.D.75829.z;
  _46 = _45 * _88;
  _47 = _27 + _46;
  if (_47 >= 0.0)
    goto <bb 3>; [59.00%]
  else
    goto <bb 9>; [41.00%]

But I do not quite see in the SLP dump why this is not considered for
vectorization. I attach the dump.

Honza
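
For readers who want to experiment, the arithmetic in the dumps above reads like a halfway-vector computation: add (0, 0, 1) to aVectorToLight, normalize, and take the dot product with aNormal. Below is a minimal C++ sketch of just that part, reconstructed from the GIMPLE rather than taken from the Gecko sources. The `Point3D` struct, `NormalDotHalfway` and the `SumLighting` driver loop are hypothetical names introduced only for illustration. Whether GCC or clang SLP-vectorizes this reduced form has not been checked here, and it may well fall into the "both or neither" case described above.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for mozilla::gfx::Point3D; only the three float
// members loaded in the dump are modelled.
struct Point3D {
  float x, y, z;
};

// Follows the arithmetic visible in the dump:
//   H = aVectorToLight + (0, 0, 1); H /= |H|; result = dot(aNormal, H).
float NormalDotHalfway(const Point3D& aNormal, const Point3D& aVectorToLight) {
  float hx = aVectorToLight.x + 0.0f;
  float hy = aVectorToLight.y + 0.0f;
  float hz = aVectorToLight.z + 1.0f;
  float len = std::sqrt(hx * hx + hy * hy + hz * hz);
  hx /= len;
  hy /= len;
  hz /= len;
  return aNormal.x * hx + aNormal.y * hy + aNormal.z * hz;
}

// Hypothetical driver so the computation lives in a loop, as suggested in
// the quoted question. The branch mirrors the final `>= 0.0` test.
float SumLighting(const Point3D* aNormals, const Point3D* aLights,
                  std::size_t aCount) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < aCount; ++i) {
    float d = NormalDotHalfway(aNormals[i], aLights[i]);
    if (d >= 0.0f) {
      sum += d;
    }
  }
  return sum;
}
```

Compiling this with -O2 and with -Ofast and comparing the SLP dumps against the ones from the full file would be one way to check whether the reduced form still shows the same behaviour.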