On Sep 14, 2005, at 9:50 PM, Andrew Pinski wrote:
On Sep 14, 2005, at 9:21 PM, Dale Johannesen wrote:
Consider the following SSE code
(-march=pentium4 -mtune=prescott -O2 -mfpmath=sse -msse2)
<4256776a.c>
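The attachment 4256776a.c is not reproduced here. The following is a hypothetical sketch of the kind of code being discussed, not the original test case. Two loops accumulate into an __m128i, one with 64-bit lanes (paddq) and one with 16-bit lanes (paddw). The function names sum_epi64 and sum_epi16 are invented for illustration; the intrinsics are the standard SSE2 ones from <emmintrin.h>.

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_add_epi64, _mm_add_epi16 */
#include <stddef.h>

/* Hypothetical sketch, not the original attachment.
   64-bit lane accumulation; the add maps to paddq. */
__m128i sum_epi64(const __m128i *v, size_t n)
{
    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < n; i++)
        acc = _mm_add_epi64(acc, v[i]);   /* paddq */
    return acc;
}

/* 16-bit lane accumulation; the add maps to paddw. The thread reports
   that, for a loop of this shape, the compiler of that era emitted extra
   movdqa copies around paddw instead of a single paddw. */
__m128i sum_epi16(const __m128i *v, size_t n)
{
    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < n; i++)
        acc = _mm_add_epi16(acc, v[i]);   /* paddw */
    return acc;
}
```

Compiling with the flags above and inspecting the assembly (gcc -S) shows which instructions each inner loop produces. Both functions are arithmetically identical apart from lane width, which is what makes the difference in generated code notable.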

The first inner loop compiles to

        paddq   %xmm0, %xmm1

Good.  The second compiles to

        movdqa  %xmm2, %xmm0
        paddw   %xmm1, %xmm0
        movdqa  %xmm0, %xmm1

when it could be using a single paddw.  The basic problem is that
our approach defines __m128i to be V2DI even though all the operations
on the object are V4SI, so there are a lot of subregs that should not need
to generate any code. I'd like to fix this, but am not sure how to go about it.

From the looks of this, it is more like a register allocation issue and has
nothing to do with subregs at all, except that the subregs are there.

That's kind of an overstatement; obviously getting rid of the subregs would solve the problem, as you can see from the first function. I think you're right that

If we allocated 64 and 63 as the same register, it would have worked correctly.

(you mean 64 and 66) would fix this example; I'll look at that. Having a more uniform representation for operations on __m128i objects would simplify things
all over the place, though.
