> On Feb 6, 2025, at 09:25, Richard Biener <richard.guent...@gmail.com> wrote:
> 
> On Thu, Feb 6, 2025 at 2:57 PM Qing Zhao <qing.z...@oracle.com> wrote:
>> 
>> 
>> 
>>> On Feb 6, 2025, at 04:36, Richard Biener <richard.guent...@gmail.com> wrote:
>>> 
>>> On Wed, Feb 5, 2025 at 4:40 PM Qing Zhao <qing.z...@oracle.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Feb 5, 2025, at 07:46, Richard Biener <richard.guent...@gmail.com> 
>>>>> wrote:
>>>>> 
>>>>> On Tue, Feb 4, 2025 at 10:12 PM Qing Zhao <qing.z...@oracle.com> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> One of our big application segv in libgcc code while unwinding the stack.
>>>>>> 
>>>>>> This is a random crash while the application throws a c++ exception and
>>>>>> unwinds the stack. Incidents are random and never can be reproduced by 
>>>>>> any
>>>>>> test case.
>>>>>> 
>>>>>> The libgcc that is used is based on GCC 8.
>>>>>> 
>>>>>> Fortunately, we got some information through the stack trace, and 
>>>>>> narrowed
>>>>>> down the crash is in the routine "init_object" of 
>>>>>> libgcc/unwind-dw2-pde.c:
>>>>>> 
>>>>>> 777           count = classify_object_over_fdes (ob, ob->u.single);
>>>>>> 
>>>>>> when loading the 2nd parameter of the call to 
>>>>>> "classify_object_over_fdes".
>>>>>> i.e, the address that is pointed by "ob->u.single" is invalid.
>>>>>> 
>>>>>> And the outer caller we can track back is the following line 1066 of the 
>>>>>> routine
>>>>>> "_Unwind_Find_FDE":
>>>>>> 
>>>>>> 1060   /* Classify and search the objects we've not yet processed.  */
>>>>>> 1061   while ((ob = unseen_objects))
>>>>>> 1062     {
>>>>>> 1063       struct object **p;
>>>>>> 1064
>>>>>> 1065       unseen_objects = ob->next;
>>>>>> 1066       f = search_object (ob, pc);
>>>>>> 
>>>>>> Then we suspected that the construction of the static variable 
>>>>>> "unseen_objects"
>>>>>> might have some bug.
>>>>>> 
>>>>>> Then after studying the source code that constructs "unseen_objects" in
>>>>>> libgcc/unwind-dw2-pde.c, I found an issue in the routines 
>>>>>> "__register_frame"
>>>>>> and "__register_frame_table" as following:
>>>>>> 
>>>>>> 127 void
>>>>>> 128 __register_frame (void *begin)
>>>>>> 129 {
>>>>>> 130   struct object *ob;
>>>>>> 131
>>>>>> 132   /* If .eh_frame is empty, don't register at all.  */
>>>>>> 133   if (*(uword *) begin == 0)
>>>>>> 134     return;
>>>>>> 135
>>>>>> 136   ob = malloc (sizeof (struct object));
>>>>>> 137   __register_frame_info (begin, ob);
>>>>>> 138 }
>>>>>> 
>>>>>> 181 void
>>>>>> 182 __register_frame_table (void *begin)
>>>>>> 183 {
>>>>>> 184   struct object *ob = malloc (sizeof (struct object));
>>>>>> 185   __register_frame_info_table (begin, ob);
>>>>>> 186 }
>>>>>> 187
>>>>>> 
>>>>>> In the above, one obvious issue in the source code is at line 136, line 
>>>>>> 137,
>>>>>> and line 184 and line 185: the return value of call to "malloc" is not 
>>>>>> checked
>>>>>> against NULL before it was passed as the second parameter of the 
>>>>>> follwoing call.
>>>>>> 
>>>>>> This might cause unpredicted random behavior.
>>>>>> 
>>>>>> I checked the latest trunk GCC libgcc, the same issue is there too.
>>>>>> 
>>>>>> This patch is for the latest trunk GCC.
>>>>>> 
>>>>>> Please let me know any comments on this?
>>>>> 
>>>>> I think I've seen this elsewhere —
>>>> 
>>>> Do you mean that you saw this problem (malloc return NULL during 
>>>> register_frame) previously?
>>> 
>>> It was probably reported to us (SUSE) from a customer/partner, also
>>> from a (llvm?) JIT context.
>> 
>> Okay.
>> So, that bug has not been resolved yet?
> 
> It was resolved from our side IIRC as being a problem in the
> application with the JIT and
> otherwise a feature request (better unwind API for this use case,
> allowing a failure mode).

Okay, I see.

Thanks.

Qing
> 
> Richard.
> 
>> Qing
>>> 
>>>>> the issue is the unwind register API does
>>>>> not allow for failures but I also think calling abort() is bad.
>>>> 
>>>> Okay, I see.
>>>> 
>>>>> 
>>>>> Are you calling this from a JIT context or so?
>>>> 
>>>> I will ask the bug submitter on this to make sure.
>>>> 
>>>>> We're assuming that at program
>>>>> start malloc() will not fail.
>>>> 
>>>> So, the routine __register_frame is only called at the _start_ of a 
>>>> program?
>>>>> 
>>>>> The proper solution is probably to add an alternate ABI which has a way 
>>>>> to fail
>>>>> during registration.
>>>> 
>>>> Any suggestion on this (or any other routine I can refer to?)
>>>> 
>>>> 
>>>> Thanks a lot for the help.
>>>> 
>>>> Qing
>>>>> 
>>>>> Richard.
>>>>> 
>>>>>> thanks.
>>>>>> 
>>>>>> Qing
>>>>>> 
>>>>>> 
>>>>>> =======================
>>>>>> 
>>>>>> Fix two issues in libgcc/unwind-dw2-fde.c:
>>>>>> 
>>>>>> A. The returned value of call to malloc is not checked against NULL.
>>>>>> This might cause unpredicted behavior during stack unwinding.
>>>>>> 
>>>>>> B. Check for null begin parameter (as well as pointer to null) in
>>>>>> __register_frame_info_table_bases and __register_frame_table.
>>>>>> 
>>>>>> libgcc/ChangeLog:
>>>>>> 
>>>>>>      * unwind-dw2-fde.c (__register_frame): Check the return value of 
>>>>>> call
>>>>>>      to malloc.
>>>>>>      (__register_frame_info_table_bases): Check for null begin parameter.
>>>>>>      (__register_frame_table): Check the return value of call to malloc,
>>>>>>      Check for null begin parameter.
>>>>>> ---
>>>>>> libgcc/unwind-dw2-fde.c | 12 ++++++++++++
>>>>>> 1 file changed, 12 insertions(+)
>>>>>> 
>>>>>> diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
>>>>>> index cdfd3974c99..f0bc29d682a 100644
>>>>>> --- a/libgcc/unwind-dw2-fde.c
>>>>>> +++ b/libgcc/unwind-dw2-fde.c
>>>>>> @@ -169,6 +169,8 @@ __register_frame (void *begin)
>>>>>>   return;
>>>>>> 
>>>>>> ob = malloc (sizeof (struct object));
>>>>>> +  if (ob == NULL)
>>>>>> +    abort ();
>>>>>> __register_frame_info (begin, ob);
>>>>>> }
>>>>>> 
>>>>>> @@ -180,6 +182,10 @@ void
>>>>>> __register_frame_info_table_bases (void *begin, struct object *ob,
>>>>>>                                 void *tbase, void *dbase)
>>>>>> {
>>>>>> +  /* If .eh_frame is empty, don't register at all.  */
>>>>>> +  if ((const uword *) begin == 0 || *(const uword *) begin == 0)
>>>>>> +    return;
>>>>>> +
>>>>>> ob->pc_begin = (void *)-1;
>>>>>> ob->tbase = tbase;
>>>>>> ob->dbase = dbase;
>>>>>> @@ -210,7 +216,13 @@ __register_frame_info_table (void *begin, struct 
>>>>>> object *ob)
>>>>>> void
>>>>>> __register_frame_table (void *begin)
>>>>>> {
>>>>>> +    /* If .eh_frame is empty, don't register at all.  */
>>>>>> +  if (*(uword *) begin == 0)
>>>>>> +    return;
>>>>>> +
>>>>>> struct object *ob = malloc (sizeof (struct object));
>>>>>> +  if (ob == NULL)
>>>>>> +    abort ();
>>>>>> __register_frame_info_table (begin, ob);
>>>>>> }
>>>>>> 
>>>>>> --
>>>>>> 2.31.1


Reply via email to