Richard Guenther <[email protected]> writes:
> On Thu, Mar 24, 2011 at 11:57 AM, Richard Sandiford
> <[email protected]> wrote:
>> Chung-Lin Tang <[email protected]> writes:
>>> PR48183 is a case where ARM NEON intrinsics, under -O -g, produce debug
>>> insns that try to expand OImode (32-byte integer) zero constants, much
>>> too large to represent as two HOST_WIDE_INTs; as the internals manual
>>> indicates, such large constants are not supported in general, and ICEs
>>> on the GET_MODE_BITSIZE(mode) == 2*HOST_BITS_PER_WIDE_INT assertion.
>>>
>>> This patch allows the cases where the large integer constant is still
>>> representable using a single CONST_INT, such as zero(0). Bootstrapped
>>> and tested on i686 and x86_64, cross-tested on ARM, all without
>>> regressions. Okay for trunk?
>>>
>>> Thanks,
>>> Chung-Lin
>>>
>>> 2011-03-20 Chung-Lin Tang <[email protected]>
>>>
>>> * emit-rtl.c (immed_double_const): Allow wider than
>>> 2*HOST_BITS_PER_WIDE_INT mode constants when they are
>>> representable as a single const_int RTX.
>>
>> I realise this might be seen as a good expedient fix, but it makes
>> me a bit uneasy. Not a very constructive rationale, sorry.
>>
>> For this particular case, the problem is that vst2q_s32 and the
>> like initialise a union directly:
>>
>> union { int32x4x2_t __i; __builtin_neon_oi __o; } __bu = { __b };
>>
>> and this gets translated into a zeroing of the whole union followed
>> by an assignment to __i:
>>
>> __bu = {};
>> __bu.__i = __b;
>
> Btw, this looks like a missed optimization in gimplification. Worth
> a bug report (or even a fix). Might be a target bug as well, depending
> on what __builtin_neon_oi looks like. Do you have a complete testcase
> that reproduces the above with a cross?

Yeah, build cc1 for arm-linux-gnueabi and compile the attached
testcase (from Chung-Lin) using:

  -O2 -g -mfpu=neon -mfloat-abi=softfp

Richard

/* { dg-do compile } */
/* { dg-require-effective-target arm_neon_ok } */
/* { dg-options "-O -g" } */
/* { dg-add-options arm_neon } */

#include <arm_neon.h>

void move_16bit_to_32bit (int32_t *dst, const short *src, unsigned n)
{
  unsigned i;
  int16x4x2_t input;
  int32x4x2_t mid;
  int32x4x2_t output;

  for (i = 0; i < n/2; i += 8) {
    input = vld2_s16(src + i);
    mid.val[0] = vmovl_s16(input.val[0]);
    mid.val[1] = vmovl_s16(input.val[1]);
    output.val[0] = vshlq_n_s32(mid.val[0], 8);
    output.val[1] = vshlq_n_s32(mid.val[1], 8);
    vst2q_s32((int32_t *)dst + i, output);
  }
}
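
For anyone without an ARM cross compiler to hand, the union pattern discussed in the thread can be sketched in portable C. This is only an illustration under stated assumptions: vec_pair_t and opaque_oi_t are hypothetical stand-ins for int32x4x2_t and __builtin_neon_oi (which need NEON support), and the two functions mirror the initializer form used by the intrinsic and the "zero the whole union, then assign __i" form it is reported to be lowered to. It does not reproduce the ICE, which depends on the OImode debug insns.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for int32x4x2_t: two vectors of four int32_t. */
typedef struct { int32_t val[2][4]; } vec_pair_t;

/* Hypothetical stand-in for __builtin_neon_oi: an opaque 32-byte value. */
typedef struct { unsigned char bytes[32]; } opaque_oi_t;

/* Same shape as the union inside vst2q_s32. */
typedef union { vec_pair_t i; opaque_oi_t o; } pun_t;

/* Form written in the intrinsic: initialise the union via its first member. */
static opaque_oi_t pun_by_initializer(vec_pair_t b)
{
    pun_t u = { b };
    return u.o;
}

/* Form described in the thread: zero the whole union, then assign the member.
   The zeroing is redundant here because both members have the same size. */
static opaque_oi_t pun_by_zero_then_assign(vec_pair_t b)
{
    pun_t u;
    memset(&u, 0, sizeof u);   /* corresponds to "__bu = {};" */
    u.i = b;                   /* corresponds to "__bu.__i = __b;" */
    return u.o;
}
```

Both functions return the same 32 bytes, which is why the initial zeroing looks like a candidate for removal during gimplification.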