On 6/27/23 22:15, Juzhe-Zhong wrote:
Consider the following complicated case:
#define TEST_TYPE(TYPE1, TYPE2)                                                \
  __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (                         \
    TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3,     \
    TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,          \
    TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n)                         \
  {                                                                            \
    for (int i = 0; i < n; i++)                                                \
      {                                                                        \
        dst[i] = (TYPE1) a[i] * (TYPE1) b[i];                                  \
        dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];                                \
        dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];                                \
        dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];                                \
      }                                                                        \
  }

TEST_TYPE (double, float)
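
For readers who want to see what that instantiation computes, here is a rough hand expansion for (double, float). It is only a sketch: the name widen_mul4 is hypothetical, the noipa attribute is dropped, and const is added on the inputs for clarity.

```c
/* Hypothetical hand expansion of TEST_TYPE (double, float).
   The function name is illustrative and noipa is omitted.  */
void
widen_mul4 (double *restrict dst, double *restrict dst2,
            double *restrict dst3, double *restrict dst4,
            const float *restrict a, const float *restrict b,
            const float *restrict a2, const float *restrict b2, int n)
{
  for (int i = 0; i < n; i++)
    {
      /* Every product extends both float operands to double before
         multiplying, and the extended inputs are shared between
         several of the four products.  */
      dst[i] = (double) a[i] * (double) b[i];
      dst2[i] = (double) a2[i] * (double) b[i];
      dst3[i] = (double) a2[i] * (double) a[i];
      dst4[i] = (double) a[i] * (double) b2[i];
    }
}
```

Note that the extensions of a[i], b[i] and a2[i] each feed more than one multiplication.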

In such a complicated situation, the combine pass cannot combine the
extensions of both operands in one step.
So the combine pass will first try to combine one of the extensions, and
then combine the other. The combine flow is as follows:

Original IR:
(set (reg 0) (float_extend: (reg 1)))
(set (reg 3) (float_extend: (reg 2)))
(set (reg 4) (mult: (reg 0) (reg 3)))

First step of combine:
(set (reg 3) (float_extend: (reg 2)))
(set (reg 4) (mult: (float_extend: (reg 1)) (reg 3)))

Second step of combine:
(set (reg 4) (mult: (float_extend: (reg 1)) (float_extend: (reg 2))))
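
As a scalar analogy only (this is not how combine is implemented), the three RTL snapshots above can be read as the following C functions. The names stage0, stage1 and stage2 are hypothetical; r1 and r2 stand in for (reg 1) and (reg 2).

```c
/* Original IR: two separate extensions feeding a multiply.  */
double
stage0 (float r1, float r2)
{
  double r0 = (double) r1;   /* (set (reg 0) (float_extend (reg 1))) */
  double r3 = (double) r2;   /* (set (reg 3) (float_extend (reg 2))) */
  return r0 * r3;            /* (set (reg 4) (mult (reg 0) (reg 3))) */
}

/* First step: only the extension of r1 is folded into the multiply.  */
double
stage1 (float r1, float r2)
{
  double r3 = (double) r2;
  return (double) r1 * r3;
}

/* Second step: both extensions are folded into the multiply.  */
double
stage2 (float r1, float r2)
{
  return (double) r1 * (double) r2;
}
```

All three compute the same value; what changes is how many separate instructions carry the computation, which is what decides whether a single widening-multiply pattern can match.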

So, to enhance the combine optimization, we add a "pseudo vwfmul.wv" RTL
pattern in autovec-opt.md,
which is (set (reg 0) (mult (float_extend (reg 1)) (reg 2))).

Hmm, something doesn't make sense here. Combine knows how to do a 3->1 combination. I would expect to see the first step fail (substituting just one operand), then a later step try to combine all three instructions, substituting the extension for both input operands.

Can you pass along the .combine dump from the failing case?

Jeff
