On Mon, Aug 07, 2017 at 02:56:09PM +0100, Sudi Das wrote:
> 
> Hi Richard
> 
> I have updated the patch according to your comments. Thanks for pointing it
> out and sorry for the delay!

Hi Sudi,

I've taken a look at your patch - at a high level, I think that adding
aarch64_output_simd_general_immediate seems like an over-complication, and
I'm not sure I follow why we need some of the logic you add. I think we
might be able to really simplify this patch, by adding a new character to
aarch64_print_operand for an (optionally) shifted immmediate to bic/orr.

> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index beff28e..5ae73c7 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -308,6 +308,15 @@ enum aarch64_parse_opt_result
>    AARCH64_PARSE_INVALID_ARG          /* Invalid arch, tune, cpu arg.  */
>  };
>  
> +/* Enum to distinguish which type of check is to be done in
> +   aarch64_simd_valid_immediate.  This is used as a bitmask where CHECK_ALL
> +   has both bits set.  Adding new types would require changes accordingly.  
> */
> +enum simd_immediate_check {
> +  CHECK_I   = 1,     /* Perform only non-inverted immediate checks (ORR).  */
> +  CHECK_NI  = 2,     /* Perform only inverted immediate checks (BIC).  */

Are these comments the right way round? If so, why is I non-inverted and
NI inverted - those names seem back to front to me.

These should probably all be AARCH64_CHECK_* rather than just CHECK_*

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 055ebaf..2cce5af 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -13029,6 +13041,47 @@ aarch64_output_simd_mov_immediate (rtx const_vector,
>    return templ;
>  }
>  
> +/* This function is similar to aarch64_output_simd_mov_immediate, used for
> +   immediate versions of 'bic' or 'orr'.  */
> +char*
> +aarch64_output_simd_general_immediate (rtx const_vector,
> +                                    machine_mode mode,
> +                                    unsigned width,

You can drop this parameter - it is always GET_MODE_BITSIZE (mode) so just
calculate that in the function body.

> +                                    const char *mnemonic)
> +{
> +  bool is_valid;
> +  static char templ[40];
> +  unsigned int lane_count = 0;
> +  char element_char;
> +
> +  struct simd_immediate_info info = { NULL_RTX, 0, 0, false, false };
> +
> +  if (strcmp (mnemonic, "orr") == 0)

For all the difference it makes, I'd just pass a bool and save save
yourself the short string comparison.

> +    is_valid = aarch64_simd_valid_immediate (const_vector, mode, false,
> +                                          &info, CHECK_I);
> +  else
> +    is_valid = aarch64_simd_valid_immediate (const_vector, mode, false,
> +                                          &info, CHECK_NI);
> +
> +  gcc_assert (is_valid);
> +  gcc_assert (CONST_INT_P (info.value));
> +  element_char = sizetochar (info.element_width);
> +  lane_count = width / info.element_width;

Width is always GET_MODE_BITSIZE (mode), and mode is one of
V8QI V16QI V4HI V8HI V2SI V4SI V2DI - so width is either 64 or 128.
in the CHECK_I and CHECK_NI cases of aarch64_simd_valid_immediate the elsize
(used to set info.element_width) is only ever 16 or 32. That means lane count
is either going to be 2, 4 or 8...

> +  if (lane_count == 1)
> +    sprintf (templ, "%s\t%%d0, #" HOST_WIDE_INT_PRINT_DEC,
> +          mnemonic, UINTVAL (info.value));

Which means this case never fires -- Which is a good thing, as my copy of
the Arm Architecture Reference Manual doesn't have scalar forms of these
instructions. Simplifying this down to two cases is what makes me think we
might be able to get away with doing this in aarch64_print_operand.

> +  else if (info.shift)
> +    sprintf (templ, "%s\t%%0.%d%c, #" HOST_WIDE_INT_PRINT_DEC
> +          ", %s #%d", mnemonic, lane_count, element_char,
> +          UINTVAL (info.value), "lsl", info.shift);
> +  else
> +    sprintf (templ, "%s\t%%0.%d%c, #" HOST_WIDE_INT_PRINT_DEC,
> +          mnemonic, lane_count, element_char, UINTVAL (info.value));

Here we need to know if we're creating a shifted immediate or not, that might
be the part that makes doing it in aarch64_print_operand hard.

Could you try it out, and see how the code ends up looking?

> diff --git a/gcc/testsuite/gcc.target/aarch64/bic_imm_1.c 
> b/gcc/testsuite/gcc.target/aarch64/bic_imm_1.c
> new file mode 100644
> index 0000000..d94dd90
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/bic_imm_1.c
> @@ -0,0 +1,26 @@
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +void
> +bic_s (short *a)
> +{
> +  for (int i = 0; i < 1024; i++)
> +    a[i] &= ~(0xff);
> +}
> +
> +void
> +bic_ss (short *a)
> +{
> +  for (int i = 0; i < 1024; i++)
> +    a[i] &= ~(0xff00);
> +}
> +
> +void
> +bic_int (int *a)
> +{
> +  for (int i = 0; i < 1024; i++)
> +    a[i] &= ~(0xff);
> +}

I'd like to see some more test cases, testing more of the shifted immediate
forms you support.

Additionally, I'd like these to be assemble tests, so we check we're
creating valid instructions.

Thanks,
James

> +/* { dg-final { scan-assembler "bic\\tv\[0-9\]+.8h, #255" } } */
> +/* { dg-final { scan-assembler "bic\\tv\[0-9\]+.8h, #255, lsl #8" } } */
> +/* { dg-final { scan-assembler "bic\\tv\[0-9\]+.4s, #255" } } */

Reply via email to