Hi,

I was looking at why, in the vectorized DCT kernel of FFmpeg, GCC's insn selection fails to produce the XOP fused-multiply-add vector insns: DOM detects a redundant expression and optimizes it away, and that makes it impossible for combine to detect the higher-level insns.

The DCT kernel looks like this:

    static void
    dct_unquantize_h263_inter_c (DCTELEM * block, int qscale, int nCoeffs)
    {
      int i, level, qmul, qadd;

      qadd = (qscale - 1) | 1;
      qmul = qscale << 1;

      for (i = 0; i <= nCoeffs; i++)
        {
          level = block[i];
          if (level < 0)
            level = level * qmul + qadd;
          else
            level = level * qmul - qadd;
          block[i] = level;
        }
    }

The expression "level * qmul" is redundant and is hoisted out of the condition:

    level = level * qmul;
    if (level < 0)
      level += qadd;
    else
      level -= qadd;

On this code GCC fails to combine the + and the - with the *, as they both depend on the same computation.

However, if I modify the DCT kernel to artificially remove the redundancy:

    if (level < 0)
      level = level * qmul + qadd;
    else
      level = level * qadd - qmul;

the kernel is vectorized with the expected insns:

    vpmacsdd %xmm1, %xmm6, %xmm0, %xmm3
    vpmacsdd %xmm5, %xmm1, %xmm0, %xmm2
    vpcomltd %xmm4, %xmm0, %xmm0
    vpcmov   %xmm0, %xmm2, %xmm3, %xmm0

Here is the slower and larger code generated for the original DCT, with one * and two +:

    vpmulld  %xmm6, %xmm0, %xmm1
    vpcomltd %xmm3, %xmm0, %xmm0
    vpaddd   %xmm5, %xmm1, %xmm2
    vpaddd   %xmm4, %xmm1, %xmm1
    vpcmov   %xmm0, %xmm1, %xmm2, %xmm0

Is there a simple way to teach combine how to introduce redundancy in order to generate higher-level insns?

Thanks,
Sebastian Pop
--
AMD / Open Source Compiler Engineering / GNU Tools
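
For anyone who wants to experiment, here is a minimal scalar sketch of the two equivalent forms of the loop body: the one written in the source and the one left after the common multiply has been factored out. It only illustrates the shape of the problem and is not taken from FFmpeg or GCC. The function names, the helper count_mismatches, and the tested ranges (qscale in 1..31, level in -2048..2047) are assumptions made for this example. The factored form compares the original level against zero, as the vpcomltd in the generated code does.

```c
/* Loop body as written in the source: the multiply appears in both
   arms, so each arm on its own is a multiply-accumulate candidate.  */
static int
unquant_orig (int level, int qmul, int qadd)
{
  if (level < 0)
    return level * qmul + qadd;
  else
    return level * qmul - qadd;
}

/* Shape after the redundant multiply has been factored out: one
   product feeds both the add and the subtract, so neither can absorb
   the multiply without duplicating it.  */
static int
unquant_hoisted (int level, int qmul, int qadd)
{
  int prod = level * qmul;

  if (level < 0)
    return prod + qadd;
  else
    return prod - qadd;
}

/* Hypothetical helper: count the inputs on which the two forms
   disagree.  The qscale and level ranges are assumptions chosen for
   illustration only.  */
static int
count_mismatches (void)
{
  int mismatches = 0;
  int qscale, level;

  for (qscale = 1; qscale <= 31; qscale++)
    {
      int qadd = (qscale - 1) | 1;
      int qmul = qscale << 1;

      for (level = -2048; level <= 2047; level++)
        if (unquant_orig (level, qmul, qadd)
            != unquant_hoisted (level, qmul, qadd))
          mismatches++;
    }
  return mismatches;
}
```

Calling count_mismatches () from a small main should return 0, since the two forms compute the same value; the difference is only in which insns combine gets a chance to form.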