Hi Richard,

Thanks for all the help so far.

> > so I'd need 5 parameters and then I'm guessing the other expressions
> > would be removed by DCE at some point?
> 
> Are you planning to make the FCMLA behaviour directly available as an
> internal function or provide a higher-level one that does a full complex
> multiply, with the target lowering that into individual instructions where
> necessary?

I was planning on doing it as one internal function and leaving it up to the
target to expand it however it needs to.

> 
> Either way, each individual FCMLA should only need three scalar inputs.
> Like with FCADD, it doesn't matter whether the operands to the individual
> scalar FCMLAs are the ones (or the only ones) that determine the associated
> FCMLA scalar result.  All the node needs to do is describe something that
> would work when vectorised.

Ah yes that makes sense. I see what you mean.
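
To check my understanding, here is a minimal scalar sketch of the two FCMLA
rotations that together make up a full complex multiply-accumulate. The type
and function names (cplx, fcmla0, fcmla90, cmla) are made up for illustration
and are not existing GCC internal functions. Each lane result is computed from
just three scalar inputs (the accumulator lane and one lane each of a and b):

```c
/* Hypothetical scalar model; the names are illustrative only.  */
typedef struct { float re, im; } cplx;

/* Rotation 0: accumulate the products that use the real part of a.  */
static cplx
fcmla0 (cplx acc, cplx a, cplx b)
{
  cplx r;
  r.re = acc.re + a.re * b.re;
  r.im = acc.im + a.re * b.im;
  return r;
}

/* Rotation 90: accumulate the products that use the imaginary part of a.  */
static cplx
fcmla90 (cplx acc, cplx a, cplx b)
{
  cplx r;
  r.re = acc.re - a.im * b.im;
  r.im = acc.im + a.im * b.re;
  return r;
}

/* Chaining both rotations gives acc + a * b.  */
static cplx
cmla (cplx acc, cplx a, cplx b)
{
  return fcmla90 (fcmla0 (acc, a, b), a, b);
}
```

The intermediate value returned by fcmla0 is only consumed by fcmla90, which
is the kind of intermediate result DCE would have to clean up if it ends up
unused.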

> 
> What to do with the intermediate results you don't need is an interesting
> question :-).  Like you say, I was hoping DCE would get rid of them later.
> Does that not work?

I haven't tried it yet 😊 But I assume it'll work too.
I have complex add almost working; it generates the right code for the
vectorized loop. The loads are also corrected, the permute is gone, and I
update all the data references for the two statements I replaced.
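
For reference, a sketch of the kind of loop I mean, assuming the rotate-by-90
form of complex add (c = a + i * b) on interleaved real/imaginary floats. The
function and array names are hypothetical. The two statements in the loop body
are the pair that gets replaced, and the swapped indices on b are what
otherwise shows up as the load permute:

```c
#include <stddef.h>

/* Hypothetical example: c = a + i * b over n interleaved complex floats.  */
void
cadd_rot90 (float *restrict c, const float *restrict a,
            const float *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      c[2 * i]     = a[2 * i]     - b[2 * i + 1];  /* real part */
      c[2 * i + 1] = a[2 * i + 1] + b[2 * i];      /* imaginary part */
    }
}
```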

However, for the scalar tail loop I have a problem: I only have vector
versions of the instructions, and the scalar loop is created from the same
SLP tree. So I end up with the builtins in the tail loop with nothing to
expand them to, and with no way to differentiate between the two calls to the
internal fn.

I would need to somehow undo this for the scalar part.

Kind Regards,
Tamar

> 
> Thanks,
> Richard
