On Fri, Dec 18, 2015 at 1:55 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
> On Thu, Dec 17, 2015 at 1:59 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>> On Thu, Dec 17, 2015 at 1:21 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
>>> On Thu, Dec 17, 2015 at 7:09 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>>> On Thu, Dec 17, 2015 at 8:11 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>>>> On Thu, Dec 17, 2015 at 7:50 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>>>>> On Thu, Dec 17, 2015 at 5:42 AM, Uros Bizjak <ubiz...@gmail.com> wrote:
>>>>>>> On Thu, Dec 17, 2015 at 2:00 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>>>>>>> On Thu, Dec 17, 2015 at 2:04 AM, Uros Bizjak <ubiz...@gmail.com> wrote:
>>>>>>>>> On Thu, Dec 17, 2015 at 12:29 AM, H.J. Lu <hongjiu...@intel.com> 
>>>>>>>>> wrote:
>>>>>>>>>> Since sibcall never returns, we can only use call-clobbered register
>>>>>>>>>> as GOT base.  Otherwise, callee-saved register used as GOT base won't
>>>>>>>>>> be properly restored.
>>>>>>>>>>
>>>>>>>>>> Tested on x86-64 with -m32.  OK for trunk?
>>>>>>>>>
>>>>>>>>> You don't have to add explicit clobber for members of "CLOBBERED_REGS"
>>>>>>>>> class, and register_no_elim_operand predicate should be used with "U"
>>>>>>>>> constraint. Also, please introduce new predicate, similar to how
>>>>>>>>> GOT_memory_operand is defined and handled.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Here is the updated patch.  There is a predicate already,
>>>>>>>> sibcall_memory_operand.  It allows any registers to
>>>>>>>> be as GOT base, which is the root of our problem.
>>>>>>>> This patch removes GOT slot from it and handles
>>>>>>>> sibcall over GOT slot with *sibcall_GOT_32 and
>>>>>>>> *sibcall_value_GOT_32 patterns.  Since I need to
>>>>>>>> expose constraints on GOT base register to RA,
>>>>>>>> I have to use 2 operands, GOT base and function
>>>>>>>> symbol, to describe sibcall over 32-bit GOT slot.
>>>>>>>
>>>>>>> Please use
>>>>>>>
>>>>>>>        (mem:SI (plus:SI
>>>>>>>              (match_operand:SI 0 "register_no_elim_operand" "U")
>>>>>>>              (match_operand:SI 1 "GOT32_symbol_operand")))
>>>>>>> ...
>>>>>>>
>>>>>>> to avoid manual rebuild of the operand.
>>>>>>>
>>>>>>
>>>>>> Is this OK?
>>>>>>
>>>>>
>>>>> An updated patch to allow sibcall_memory_operand for RTL
>>>>> expansion.  OK for trunk if there is no regression?
>>>>>
>>>>
>>>> There is no regressions on x86-64 with -m32.  OK for trunk?
>>>
>>> OK for mainline, with a following change:
>>>
>>> @@ -597,11 +597,17 @@
>>>          (match_operand 0 "memory_operand"))))
>>>
>>>  ;; Return true if OP is a memory operands that can be used in sibcalls.
>>> +;; Since sibcall never returns, we can only use call-clobbered register
>>> +;; as GOT base.  Allow GOT slot here only with pseudo register as GOT
>>> +;; base.  Properly handle sibcall over GOT slot with *sibcall_GOT_32
>>> +;; and *sibcall_value_GOT_32 patterns.
>>>  (define_predicate "sibcall_memory_operand"
>>>    (and (match_operand 0 "memory_operand")
>>>         (match_test "CONSTANT_P (XEXP (op, 0))
>>>              || (GET_CODE (XEXP (op, 0)) == PLUS
>>>              && REG_P (XEXP (XEXP (op, 0), 0))
>>> +            && (REGNO (XEXP (XEXP (op, 0), 0))
>>> +                >= FIRST_PSEUDO_REGISTER)
>>>              && GET_CODE (XEXP (XEXP (op, 0), 1)) == CONST
>>>              && GET_CODE (XEXP (XEXP (XEXP (op, 0), 1), 0)) == UNSPEC
>>>              && XINT (XEXP (XEXP (XEXP (op, 0), 1), 0), 1) == 
>>> UNSPEC_GOT)")))
>>>
>>> You can use (!HARD_REGISTER_NUM_P (...) || call_used_regs[...]) here.
>>> Call-used hard regs are still allowed here.
>>>
>>> Can you please also rewrite this horrible match_test as a block of C
>>> code using GOT32_symbol_operand predicate?
>>>
>>
>> I am retesting the patch with
>>
>> ;; Return true if OP is a memory operands that can be used in sibcalls.
>> ;; Since sibcall never returns, we can only use call-clobbered register
>> ;; as GOT base.  Allow GOT slot here only with pseudo register as GOT
>> ;; base.  Properly handle sibcall over GOT slot with *sibcall_GOT_32
>> ;; and *sibcall_value_GOT_32 patterns.
>> (define_predicate "sibcall_memory_operand"
>>   (match_operand 0 "memory_operand")
>> {
>>   op = XEXP (op, 0);
>>   if (CONSTANT_P (op))
>>     return true;
>>   if (GET_CODE (op) == PLUS && REG_P (XEXP (op, 0)))
>>     {
>>       int regno = REGNO (XEXP (op, 0));
>>       if (!HARD_REGISTER_NUM_P (regno) || call_used_regs[regno])
>>         {
>>           op = XEXP (op, 1);
>>           if (GOT32_symbol_operand (op, VOIDmode))
>>             return true;
>>         }
>>     }
>>   return false;
>> })
>>
>>
>> I will check it in if there is no regression.
>>
>
> There is no regression.  But I missed sibcall to local function
> with -O2 -fPIC -m32 -fno-plt -mregparm=3:
>
> extern void bar (int, int, int) __attribute__((visibility("hidden")));
>
> void
> foo (int a, int b, int c)
> {
>   bar (a, b, c);
>   bar (a, b, c);
> }
>
> It doesn't need GOT.  This patch fixes it.
>
> iff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 0e2bec3..691915f9 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -6657,6 +6657,7 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
>  {
>    tree type, decl_or_type;
>    rtx a, b;
> +  bool bind_global = decl && !targetm.binds_local_p (decl);
>
>    /* If we are generating position-independent code, we cannot sibcall
>       optimize direct calls to global functions, as the PLT requires
> @@ -6665,7 +6666,7 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
>        && !TARGET_64BIT
>        && flag_pic
>        && flag_plt
> -      && decl && !targetm.binds_local_p (decl))
> +      && bind_global)
>      return false;
>
>    /* If we need to align the outgoing stack, then sibcalling would
> @@ -6726,7 +6727,7 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
>   parameters.  Note that DLLIMPORT functions and call via GOT
>   slot are indirect.  */
>        if (!decl
> -  || (flag_pic && !flag_plt)
> +  || (bind_global && flag_pic && !flag_plt)
>    || (TARGET_DLLIMPORT_DECL_ATTRIBUTES && DECL_DLLIMPORT_P (decl)))
>   {
>    /* Check if regparm >= 3 since arg_reg_available is set to
>
> Here is the complete patch with a testcase.  OK for trunk?

LGTM, but please allow Jakub (CC'd) a couple of days for his eventual objection.

Otherwise, OK for mainline after this period.

Uros.

Reply via email to