I have 16 vector registers in the machine, R16-R31, each with 128
cells of 16 bits. These support ALU operations and loads/stores just
like normal registers, but operate on all cells in one clock. So an

add R16 R17 R18

will add the whole R17 array to R18 (cell by cell) and place the
result in R16. The 'where' instruction places a mask on the array so that
the operation is done only where a certain condition is met: in the example
in the previous e-mail, where `a` is less than `b`. I've read the description
of doloop and I don't think I can use it in this case. I'll have to dig
more or settle for -O0 and cry.
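
To make the semantics concrete, here is a rough scalar model in C of what
I mean by a masked add. It is only a sketch: the names (vreg_t, vmask_t,
where_lt, vadd_masked) are made up for illustration, and I am assuming
unsigned 16-bit cells that wrap on overflow and that masked-off cells of
the destination keep their old value.

```c
#include <stddef.h>
#include <stdint.h>

#define VLEN 128 /* cells per vector register, as described above */

/* Hypothetical types modelling one vector register and one 'where' mask. */
typedef struct { uint16_t cell[VLEN]; } vreg_t;
typedef struct { uint8_t on[VLEN]; } vmask_t;

/* Model of 'where a < b': enable exactly the cells where a[i] < b[i]. */
static void where_lt(vmask_t *m, const vreg_t *a, const vreg_t *b)
{
    for (size_t i = 0; i < VLEN; i++)
        m->on[i] = (uint8_t)(a->cell[i] < b->cell[i]);
}

/* Model of 'add Rd Ra Rb' under a mask: corresponding cells are added,
 * but only where the mask is set. Assumption: disabled cells of the
 * destination are left untouched, and sums wrap modulo 2^16. */
static void vadd_masked(vreg_t *d, const vreg_t *a, const vreg_t *b,
                        const vmask_t *m)
{
    for (size_t i = 0; i < VLEN; i++)
        if (m->on[i])
            d->cell[i] = (uint16_t)(a->cell[i] + b->cell[i]);
}
```

On the real machine the loop over the cells does not exist, of course; the
whole thing is a single instruction executed under the current mask.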

Is it possible to abstract such pieces of code in the input program out into
an independent function whose prologue and epilogue do the necessary setup?

Just curious.

Uday.
