Re: [Aarch64][PATCH] Improve Logical And Immediate Expressions

Christophe Lyon Wed, 23 Nov 2016 10:44:30 -0800

On 23 November 2016 at 17:30, Michael Collison <michael.colli...@arm.com> wrote:
> Hi Christophe,
>
> This is not a regression per se; the patch causes the test case to generate 
> one less instruction overall, but one additional 'and'. Trunk before the 
> patch (-O2):
>
> foo:
>         and     w0, w0, 255
>         lsl     w1, w0, 20
>         orr     w0, w1, w0, lsl 8
>         mov     w1, 65024
>         movk    w1, 0xfe0, lsl 16
>         and     w0, w0, w1
>         ret
>
> The new trunk aarch64 compiler with the patch generates fewer instructions 
> but one additional 'and'
>
> foo:
>         and     w0, w0, 255
>         lsl     w1, w0, 20
>         orr     w0, w1, w0, lsl 8
>         and     x0, x0, 268434944
>         and     x0, x0, -2031617
>         ret
>


OK,
by "regression" I meant that a test used to pass and now fails.
It looks like you have to update it to take into account the new, better code.

Christophe

> -----Original Message-----
> From: Christophe Lyon [mailto:christophe.l...@linaro.org]
> Sent: Wednesday, November 23, 2016 6:42 AM
> To: James Greenhalgh
> Cc: Michael Collison; gcc-patches@gcc.gnu.org; nd
> Subject: Re: [Aarch64][PATCH] Improve Logical And Immediate Expressions
>
> Hi Michael,
>
>
> On 21 November 2016 at 10:52, James Greenhalgh <james.greenha...@arm.com> 
> wrote:
>> On Fri, Nov 18, 2016 at 07:41:58AM +0000, Michael Collison wrote:
>>> James,
>>>
>>> I incorporated all your suggestions, and successfully bootstrapped
>>> and re-ran the testsuite.
>>>
>>> Okay for trunk?
>>>
>>> 2016-11-18  Michael Collison  <michael.colli...@arm.com>
>>>
>>>       * config/aarch64/aarch64-protos.h
>>>       (aarch64_and_split_imm1, aarch64_and_split_imm2)
>>>       (aarch64_and_bitmask_imm): New prototypes
>>>       * config/aarch64/aarch64.c (aarch64_and_split_imm1):
>>>       New overloaded function to create bit mask covering the
>>>       lowest to highest bits set.
>>>       (aarch64_and_split_imm2): New overloaded functions to create bit
>>>       mask of zeros between first and last bit set.
>>>       (aarch64_and_bitmask_imm): New function to determine if a integer
>>>       is a valid two instruction "and" operation.
>>>       * config/aarch64/aarch64.md:(and<mode>3): New define_insn and _split
>>>       allowing wider range of constants with "and" operations.
>>>       * (ior<mode>3, xor<mode>3): Use new LOGICAL2 iterator to prevent
>>>       "and" operator from matching restricted constant range used for
>>>       ior and xor operators.
>>>       * config/aarch64/constraints.md (UsO constraint): New SImode 
>>> constraint
>>>       for constants in "and" operantions.
>>>       (UsP constraint): New DImode constraint for constants in "and" 
>>> operations.
>>>       * config/aarch64/iterators.md (lconst2): New mode iterator.
>>>       (LOGICAL2): New code iterator.
>>>       * config/aarch64/predicates.md (aarch64_logical_and_immediate): New
>>>       predicate
>>>       (aarch64_logical_and_operand): New predicate allowing extended 
>>> constants
>>>       for "and" operations.
>>>       * testsuite/gcc.target/aarch64/and_const.c: New test to verify
>>>       additional constants are recognized and fewer instructions generated.
>>>       * testsuite/gcc.target/aarch64/and_const2.c: New test to verify
>>>       additional constants are recognized and fewer instructions generated.
>>>
>>
>>> diff --git a/gcc/config/aarch64/aarch64-protos.h
>>> b/gcc/config/aarch64/aarch64-protos.h
>>> index 3cdd69b..7ef8cdf 100644
>>> --- a/gcc/config/aarch64/aarch64-protos.h
>>> +++ b/gcc/config/aarch64/aarch64-protos.h
>>> @@ -300,6 +300,9 @@ extern struct tune_params aarch64_tune_params;
>>> HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned,
>>> unsigned);  int aarch64_get_condition_code (rtx);  bool
>>> aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
>>> +unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT
>>> +val_in); unsigned HOST_WIDE_INT aarch64_and_split_imm2
>>> +(HOST_WIDE_INT val_in); bool aarch64_and_bitmask_imm (unsigned
>>> +HOST_WIDE_INT val_in, machine_mode mode);
>>>  int aarch64_branch_cost (bool, bool);  enum aarch64_symbol_type
>>> aarch64_classify_symbolic_expression (rtx);  bool
>>> aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT); diff --git
>>> a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index
>>> 3e663eb..8e33c40 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -3600,6 +3600,43 @@ aarch64_bitmask_imm (HOST_WIDE_INT val_in, 
>>> machine_mode mode)
>>>    return val == mask * bitmask_imm_mul[__builtin_clz (bits) - 26];
>>> }
>>>
>>> +/* Create mask of ones, covering the lowest to highest bits set in
>>> +VAL_IN.  */
>>> +
>>> +unsigned HOST_WIDE_INT
>>> +aarch64_and_split_imm1 (HOST_WIDE_INT val_in) {
>>> +  int lowest_bit_set = ctz_hwi (val_in);
>>> +  int highest_bit_set = floor_log2 (val_in);
>>> +  gcc_assert (val_in != 0);
>>
>> The comment above the function should make this precondition clear. Or
>> you could pick a well defined behaviour for when val_in == 0 (return
>> 0?), if that fits the rest of the algorithm.
>>
>> Otherwise, this patch looks OK to me.
>>
>> Thanks,
>> James
>>
>
> This patch (r242739) causes a regression on aarch64:
>
>   gcc.dg/zero_bits_compound-2.c scan-assembler-times \\(and: 2 now fails.
>
> Christophe.

Re: [Aarch64][PATCH] Improve Logical And Immediate Expressions

Reply via email to