https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121908
Bug ID: 121908
Summary: Hot loop in xz not vectorized
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rdapp at gcc dot gnu.org
CC: jeffreyalaw at gmail dot com, rguenth at gcc dot gnu.org, tamar.christina at arm dot com
Target Milestone: ---

The following is a simplified example of bt_skip_func in 557.xz that can be vectorized. It's a search loop that looks for (dis)similarities in an array and, depending on the input, we're seeing double-digit improvements when it is vectorized.

    #define uint8_t unsigned char
    #define uint32_t unsigned int

    int
    foo (const uint8_t *const cur, uint32_t n)
    {
      uint32_t i = 15;
      while (i++ != n)
        if (cur[i] != cur[i - 15])
          break;
      return i;
    }

We give up analyzing the data references (DRs) because n may be < 15. In that case the 32-bit index i runs past UINT32_MAX and wraps around to 0 before it reaches n, so the address evolution is not affine:

    Creating dr for *_2
    analyze_innermost: bla2.c:15:12: missed: failed: evolution of base is not affine.
            base_address:
            offset from base address:
            constant offset from base address:
            step:
            base alignment: 0
            base misalignment: 0
            offset alignment: 0
            step alignment: 0
            base_object: *_2
    Creating dr for *_6
    analyze_innermost: bla2.c:15:22: missed: failed: evolution of base is not affine.
            base_address:
            offset from base address:
            constant offset from base address:
            step:
            base alignment: 0
            base misalignment: 0
            offset alignment: 0
            step alignment: 0
            base_object: *_6

A more complex example, closer to the real loop, is:

    #define uint32_t unsigned int
    #define uint8_t unsigned char

    int
    foo (const uint8_t *const cur, uint32_t len, uint32_t len_limit,
         uint32_t pos, uint32_t cur_match)
    {
      const uint32_t delta = pos - cur_match;
      const uint8_t *pb = cur - delta;
      while (++len != len_limit)
        if (pb[len] != cur[len])
          break;
      return len;
    }

My idea was to "version"/partition the loop (or the vectorized code) along len < len_limit, where ++len reaches len_limit without wrapping, and len >= len_limit, where it wraps around. Because len is pre-incremented, the case len == len_limit already wraps.