Hi Richard,

Thanks for all the help so far.

> > so I'd need 5 parameters and then I'm guessing the other expressions
> > would be removed by DCE at some point?
>
> Are you planning to make the FCMLA behaviour directly available as an
> internal function or provide a higher-level one that does a full complex
> multiply, with the target lowering that into individual instructions where
> necessary?

I was planning on doing it as one internal function and leaving it up to
the target to expand it however it needs to.

>
> Either way, each individual FCMLA should only need three scalar inputs.
> Like with FCADD, it doesn't matter whether the operands to the individual
> scalar FCMLAs are the ones (or the only ones) that determine the associated
> FCMLA scalar result. All the node needs to do is describe something that
> would work when vectorised.

Ah yes, that makes sense. I see what you mean.

>
> What to do with the intermediate results you don't need is an interesting
> question :-). Like you say, I was hoping DCE would get rid of them later.
> Does that not work?

I haven't tried it yet 😊 But I assume it'll work too.

I have complex add almost working: it generates the right code for the
vectorized loop. The loads are also corrected, the permute is gone, and I
update all the data references for the two statements I replaced.

However, for the scalar tail loop I have a problem, since I only have
vector versions of the instructions and the scalar loop is created from
the same SLP tree. So I end up with the builtins in the tail loop with
nothing to expand them to, and with no way to differentiate between the
two calls to the internal fn. I would need to somehow undo this for the
scalar part.

Kind Regards,
Tamar

>
> Thanks,
> Richard