Tom Lane <t...@sss.pgh.pa.us> 于2024年8月19日周一 22:12写道:
> Robert Haas <robertmh...@gmail.com> writes: > > On Mon, Aug 19, 2024 at 2:17 AM Xiaoran Wang <fanfuxiao...@gmail.com> > wrote: > >> If the code in PG_TRY contains any non local control flow other than > >> ereport(ERROR) like goto, break etc., the PG_CATCH or PG_END_TRY cannot > >> be called, then the PG_exception_stack will point to the memory whose > >> stack frame has been released. So after that, when the pg_re_throw > >> called, __longjmp() will crash and report Segmentation fault error. > >> > >> Addition to sigjmp_buf, add another field 'int magic' which is next to > >> the sigjum_buf in the local stack frame memory. The magic's value is > always > >> 'PG_exception_magic 0x12345678'. And in 'pg_re_throw' routine, check if > >> the magic's value is still '0x12345678', if not, that means the memory > >> where the 'PG_exception_stack' points to has been released, and the > 'sigbuf' > >> must be invalid. > > > This is an interesting idea. I suspect if we do this we want a > > different magic number and a different error message than what you > > chose here, but those are minor details. > > I would suggest just adding an Assert; I doubt this check would be > useful in production. > Agree, an Assert is enough. > > > I'm not sure how reliable this would be at actually finding problems, > > though. > > Yeah, that's the big problem. I don't have any confidence at all > that this would detect misuse. It'd require that the old stack > frame gets overwritten, which might not happen for a long time, > and it'd require that somebody eventually do a longjmp, which again > might not happen for a long time --- and when those did happen, the > error would be detected in someplace far away from the actual bug, > with little evidence remaining to help you localize it. > Exactly, it cannot tell you which PG_TRY left the invalid sigjmp_buf, but to implement that is easy I think, recording the line num maybe. I think this is still useful, at least it tell you that the error is due to the PG_TRY. > > Also, as soon as some outer level of PG_TRY is exited in the proper > way, the dangling stack pointer will be fixed up. That means there's > a fairly narrow time frame in which the overwrite and longjmp must > happen for this to catch a bug. > > Yes, if the outer level PG_TRY call pg_re_throw instead of ereport, the dangling stack pointer will be fixed up. It's would be great to fix that up in any case. But I have no idea how to implement that now. In pg_re_throw, if we could use '_local_sigjmp_buf' instead of the global var PG_exception_stack, that would be great. We don't need to worry about the invalid sigjum_buf. So on the whole I doubt this'd be terribly useful in this form; > and I don't like the amount of code churn required. > > > Personally, I don't think I've ever made this particular mistake. I > > think a much more common usage error is exiting the catch-block > > without either throwing an error or rolling back a subtransaction. But > > YMMV, of course. > > We have had multiple instances of code "return"ing out of a PG_TRY, > so I fully agree that some better way to detect that would be good. > But maybe we ought to think about static analysis for that. > > regards, tom lane > -- Best regards ! Xiaoran Wang