> On Feb 6, 2025, at 09:25, Richard Biener <richard.guent...@gmail.com> wrote: > > On Thu, Feb 6, 2025 at 2:57 PM Qing Zhao <qing.z...@oracle.com> wrote: >> >> >> >>> On Feb 6, 2025, at 04:36, Richard Biener <richard.guent...@gmail.com> wrote: >>> >>> On Wed, Feb 5, 2025 at 4:40 PM Qing Zhao <qing.z...@oracle.com> wrote: >>>> >>>> >>>> >>>>> On Feb 5, 2025, at 07:46, Richard Biener <richard.guent...@gmail.com> >>>>> wrote: >>>>> >>>>> On Tue, Feb 4, 2025 at 10:12 PM Qing Zhao <qing.z...@oracle.com> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> One of our big application segv in libgcc code while unwinding the stack. >>>>>> >>>>>> This is a random crash while the application throws a c++ exception and >>>>>> unwinds the stack. Incidents are random and never can be reproduced by >>>>>> any >>>>>> test case. >>>>>> >>>>>> The libgcc that is used is based on GCC 8. >>>>>> >>>>>> Fortunately, we got some information through the stack trace, and >>>>>> narrowed >>>>>> down the crash is in the routine "init_object" of >>>>>> libgcc/unwind-dw2-pde.c: >>>>>> >>>>>> 777 count = classify_object_over_fdes (ob, ob->u.single); >>>>>> >>>>>> when loading the 2nd parameter of the call to >>>>>> "classify_object_over_fdes". >>>>>> i.e, the address that is pointed by "ob->u.single" is invalid. >>>>>> >>>>>> And the outer caller we can track back is the following line 1066 of the >>>>>> routine >>>>>> "_Unwind_Find_FDE": >>>>>> >>>>>> 1060 /* Classify and search the objects we've not yet processed. */ >>>>>> 1061 while ((ob = unseen_objects)) >>>>>> 1062 { >>>>>> 1063 struct object **p; >>>>>> 1064 >>>>>> 1065 unseen_objects = ob->next; >>>>>> 1066 f = search_object (ob, pc); >>>>>> >>>>>> Then we suspected that the construction of the static variable >>>>>> "unseen_objects" >>>>>> might have some bug. >>>>>> >>>>>> Then after studying the source code that constructs "unseen_objects" in >>>>>> libgcc/unwind-dw2-pde.c, I found an issue in the routines >>>>>> "__register_frame" >>>>>> and "__register_frame_table" as following: >>>>>> >>>>>> 127 void >>>>>> 128 __register_frame (void *begin) >>>>>> 129 { >>>>>> 130 struct object *ob; >>>>>> 131 >>>>>> 132 /* If .eh_frame is empty, don't register at all. */ >>>>>> 133 if (*(uword *) begin == 0) >>>>>> 134 return; >>>>>> 135 >>>>>> 136 ob = malloc (sizeof (struct object)); >>>>>> 137 __register_frame_info (begin, ob); >>>>>> 138 } >>>>>> >>>>>> 181 void >>>>>> 182 __register_frame_table (void *begin) >>>>>> 183 { >>>>>> 184 struct object *ob = malloc (sizeof (struct object)); >>>>>> 185 __register_frame_info_table (begin, ob); >>>>>> 186 } >>>>>> 187 >>>>>> >>>>>> In the above, one obvious issue in the source code is at line 136, line >>>>>> 137, >>>>>> and line 184 and line 185: the return value of call to "malloc" is not >>>>>> checked >>>>>> against NULL before it was passed as the second parameter of the >>>>>> follwoing call. >>>>>> >>>>>> This might cause unpredicted random behavior. >>>>>> >>>>>> I checked the latest trunk GCC libgcc, the same issue is there too. >>>>>> >>>>>> This patch is for the latest trunk GCC. >>>>>> >>>>>> Please let me know any comments on this? >>>>> >>>>> I think I've seen this elsewhere — >>>> >>>> Do you mean that you saw this problem (malloc return NULL during >>>> register_frame) previously? >>> >>> It was probably reported to us (SUSE) from a customer/partner, also >>> from a (llvm?) JIT context. >> >> Okay. >> So, that bug has not been resolved yet? > > It was resolved from our side IIRC as being a problem in the > application with the JIT and > otherwise a feature request (better unwind API for this use case, > allowing a failure mode).
Okay, I see. Thanks. Qing > > Richard. > >> Qing >>> >>>>> the issue is the unwind register API does >>>>> not allow for failures but I also think calling abort() is bad. >>>> >>>> Okay, I see. >>>> >>>>> >>>>> Are you calling this from a JIT context or so? >>>> >>>> I will ask the bug submitter on this to make sure. >>>> >>>>> We're assuming that at program >>>>> start malloc() will not fail. >>>> >>>> So, the routine __register_frame is only called at the _start_ of a >>>> program? >>>>> >>>>> The proper solution is probably to add an alternate ABI which has a way >>>>> to fail >>>>> during registration. >>>> >>>> Any suggestion on this (or any other routine I can refer to?) >>>> >>>> >>>> Thanks a lot for the help. >>>> >>>> Qing >>>>> >>>>> Richard. >>>>> >>>>>> thanks. >>>>>> >>>>>> Qing >>>>>> >>>>>> >>>>>> ======================= >>>>>> >>>>>> Fix two issues in libgcc/unwind-dw2-fde.c: >>>>>> >>>>>> A. The returned value of call to malloc is not checked against NULL. >>>>>> This might cause unpredicted behavior during stack unwinding. >>>>>> >>>>>> B. Check for null begin parameter (as well as pointer to null) in >>>>>> __register_frame_info_table_bases and __register_frame_table. >>>>>> >>>>>> libgcc/ChangeLog: >>>>>> >>>>>> * unwind-dw2-fde.c (__register_frame): Check the return value of >>>>>> call >>>>>> to malloc. >>>>>> (__register_frame_info_table_bases): Check for null begin parameter. >>>>>> (__register_frame_table): Check the return value of call to malloc, >>>>>> Check for null begin parameter. >>>>>> --- >>>>>> libgcc/unwind-dw2-fde.c | 12 ++++++++++++ >>>>>> 1 file changed, 12 insertions(+) >>>>>> >>>>>> diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c >>>>>> index cdfd3974c99..f0bc29d682a 100644 >>>>>> --- a/libgcc/unwind-dw2-fde.c >>>>>> +++ b/libgcc/unwind-dw2-fde.c >>>>>> @@ -169,6 +169,8 @@ __register_frame (void *begin) >>>>>> return; >>>>>> >>>>>> ob = malloc (sizeof (struct object)); >>>>>> + if (ob == NULL) >>>>>> + abort (); >>>>>> __register_frame_info (begin, ob); >>>>>> } >>>>>> >>>>>> @@ -180,6 +182,10 @@ void >>>>>> __register_frame_info_table_bases (void *begin, struct object *ob, >>>>>> void *tbase, void *dbase) >>>>>> { >>>>>> + /* If .eh_frame is empty, don't register at all. */ >>>>>> + if ((const uword *) begin == 0 || *(const uword *) begin == 0) >>>>>> + return; >>>>>> + >>>>>> ob->pc_begin = (void *)-1; >>>>>> ob->tbase = tbase; >>>>>> ob->dbase = dbase; >>>>>> @@ -210,7 +216,13 @@ __register_frame_info_table (void *begin, struct >>>>>> object *ob) >>>>>> void >>>>>> __register_frame_table (void *begin) >>>>>> { >>>>>> + /* If .eh_frame is empty, don't register at all. */ >>>>>> + if (*(uword *) begin == 0) >>>>>> + return; >>>>>> + >>>>>> struct object *ob = malloc (sizeof (struct object)); >>>>>> + if (ob == NULL) >>>>>> + abort (); >>>>>> __register_frame_info_table (begin, ob); >>>>>> } >>>>>> >>>>>> -- >>>>>> 2.31.1