Hi Robert, On Tue, May 14, 2024 at 5:36 PM Robert Haas <robertmh...@gmail.com> wrote: > > On Tue, May 14, 2024 at 6:13 AM Jakub Wartak > <jakub.war...@enterprisedb.com> wrote: > > OK I'll try to explain using assembly, but I'm not an expert on this. > > Let's go to the 1st post, assume we run with backtrace_function set: > > I feel like this explanation doesn't really explain very much. I mean, > the question is not "how is it that adding a nop instruction fixes > anything?" but "is adding a nop instruction a principled way of fixing > things, and if so, for what reason?".
Short: IMHO the backtrace() is reliable and cannot be blamed here, it's just that the __attribute__((cold)) with __builtin_unreachable prevents backtrace() from retrieving proper data from the stack. Long: using compiler cold branches with __builtin_unreachable() causes the compiler to generate specific code that is misleading GNU libc's backtrace() and tools like addr2line/gdb/objdump. The exact instruction that causes it is that CALL assembly stores the saved RIP onto the stack frame. So, if you have something like this (that's from attached trimmed C semi-reproducer that tries to mimic what PG is doing with ereport() macro and cold path -- NOTE: not exact match for instructions, but good enough to see problems related with it): $ cat bt.c [..] int main() { int r; printf("starting f(), NOT yet in somemisleading_another_f() !\n"); r = f(); printf("%d\n", r*2); printf("starting somemisleading_another_f()\n"); r = somemisleading_another_f(); [..] $ gcc-12 -Wall -rdynamic -g -O2 bt.c -o bt $ ./bt starting f(), NOT yet in somemisleading_another_f() ! err_cold(): 3 errstatic(): 3 ./bt(dump_current_backtrace+0x35) [0x564fbb25d2d5] ./bt(err_cold+0x2f) [0x564fbb25d11f] ./bt(+0x1133) [0x564fbb25d133] ./bt(+0x1144) [0x564fbb25d144] ./bt(main+0x23) [0x564fbb25d173] /lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7fba4028624a] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7fba40286305] ./bt(_start+0x21) [0x564fbb25d1d1] finishing demo() macro $ addr2line -e ./bt -f 0x1133 f /root/bt.c:84 $ addr2line -e ./bt -f 0x1144 somemisleading_another_f /root/bt.c:84 NOTE here that the code has NOT called somemisleading_another_f() at all, but backtrace() is going to fetch 0x1144, because such RIP value was stored by the CALL. In asm, it looks like this, in the brackets I've put sequence of instructions: 0000000000001125 <f.part.0>: 1125: 55 push %rbp [4] 1126: bf 03 00 00 00 mov $0x3,%edi [5] 112b: 48 89 e5 mov %rsp,%rbp [6] 112e: e8 bd ff ff ff call 10f0 <err_cold> [7 --> .. backtrace()] 1133: bf 03 00 00 00 mov $0x3,%edi 1138: e8 73 02 00 00 call 13b0 <err_finish> 000000000000113d <f.cold>: 113d: 31 c0 xor %eax,%eax [2] 113f: e8 e1 ff ff ff call 1125 <f.part.0> [3 --> this is going to store 0x113f+4b = 0x1144 as saved RIP onto stack frame] 0000000000001144 <somemisleading_another_f.cold>: 1144: 31 c0 xor %eax,%eax 1146: e8 da ff ff ff call 1125 <f.part.0> 114b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) [..] 00000000000013e0 <f>: [..] 13ed: e8 7e fc ff ff call 1070 <getpid@plt> 13f2: a8 01 test $0x1,%al 13f4: 0f 85 43 fd ff ff jne 113d <f.cold> [1] So here, just like in PostgreSQL, in the f.cold the "CALL"@0x113f is going to save RIP of the following instruction (0x1144), but that's from a different (next) function in address layout. So when backtrace() is going to retrieve that stack frame, it's going to see something belonging to a function that could not even be called physically. The thing is that path is never going to return (and that's also caused by usage of __builtin_unreachable()), so for the compiler there's no need to generate any other instruction after such CALL. There's no such need, of course unless you want to have valid backtrace() output. Storing literally any instruction after such CALL, causes the savedRIP in the stack frame to point back to the proper origin function. So again, with on fresh PG18 today (compiled with: ./configure --prefix=/usr/pgsql18 ), reproducer gives shows location error as get_collation_isdeterministic() in psql: psql:ri-collation-bug-example.sql:42: ERROR: cache lookup failed for collation 0 psql:ri-collation-bug-example.sql:44: error: ERROR: XX000: cache lookup failed for collation 0 LOCATION: get_collation_isdeterministic, lsyscache.c:1062 // \errverbose it's evident that backtrace_functions produces garbage: 2024-08-26 10:00:22.317 CEST [39311] STATEMENT: delete from revisions where id='5c617ce7-688d-4bea-9d66-c0f0ebc635da'; 2024-08-26 10:00:45.216 CEST [39325] ERROR: cache lookup failed for collation 0 2024-08-26 10:00:45.216 CEST [39325] BACKTRACE: postgres: postgres test1 [local] DELETE(+0x155898) [0x55e9161be898] postgres: postgres test1 [local] DELETE(RI_FKey_cascade_del+0x2a9) [0x55e9165c2479] postgres: postgres test1 [local] DELETE(+0x2d7e1f) [0x55e916340e1f] [..] so what's the 0x155898 ? We get: $ addr2line -e /usr/pgsql18/bin/postgres -a -f 0x155898 0x0000000000155898 get_constraint_type.cold lsyscache.c:? That's wrong, we should get "get_collation_isdeterministic". Now with the patch backtrace_functions produces: 2024-08-26 10:10:26.151 CEST [49283] ERROR: cache lookup failed for collation 0 2024-08-26 10:10:26.151 CEST [49283] BACKTRACE: postgres: postgres test2 [local] DELETE(+0x16964f) [0x55dc64f4064f] postgres: postgres test2 [local] DELETE(RI_FKey_cascade_del+0x2a9) [0x55dc65343559] [..] 2024-08-26 10:10:26.151 CEST [49283] STATEMENT: delete from revisions where id='5c617ce7-688d-4bea-9d66-c0f0ebc635da'; $ addr2line -e /usr/pgsql18/bin/postgres -a -f 0x16964f 0x000000000016964f get_collation_isdeterministic.cold lsyscache.c:? > I think the right answer here is probably what Alvaro said, namely, that > people have to have the debug symbols installed if they want to get > backtraces. Yes, the binary cannot be strip(1)-ed (ELF symbols table '.symtab' cannot be emptied) for it to work OR the binary can be striped, but then the debug symbols need to be installed as add-on in order to decode the addresses, but that's standard thing today and not such installation is not a problem in general. Fun fact: one thing worth mentioning here is that I was plain wrong: it is NOT --enable-debug that is causing - or not - generating .cold functions. Those seem to be almost always mostly generated on modern gcc compilers. E.g. Debian official PGDG packages come without ".cold" function optimizations VISIBLE within the binary by default when using normal objdump(1). objdump(1) was unable to resolve anything to have ".cold", but following assembly, it looked like those stubs were present. Now using e.g. objdump --dwarf --disassemble=get_collation_oid.cold (the key here is to use --dwarf), I was able to get those function names with .cold . The --dwarf switch actually starts reading the external ("/usr/lib/debug/.build-id/") symbols and confirms they are there. At first I thought that clang does not seem to be affected by this when dealing with my toy bt.c (actually clang fixes it!), it looks like it is not generating those ".cold" stubs at all. However the issue is still the wrong output in PG even with clang ! (I've got the wrong function output in log too, and again there was CALL as last instruction, so it saved the wrong RIP of the follow-up instruction too!). My org. patch does not fix it there, as sadly clang optimized out my volatile char. To my disgust, what helped there was the below thing: --- a/src/include/utils/elog.h +++ b/src/include/utils/elog.h @@ -145,8 +145,11 @@ struct Node; - if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \ + if (__builtin_constant_p(elevel) && (elevel) >= ERROR) { \ + volatile char fix_backtrace_addr = 0x42; \ + fix_backtrace_addr = fix_backtrace_addr + 1; \ pg_unreachable(); \ It's kind of ugly hack, maybe some other hackers have better ideas. Also I have not checked different archs that the x86_64 (and part of the problem is that it needs a CPU-agnostic operand). -J.
#include <stdio.h> #include <stdlib.h> #include <execinfo.h> #include <time.h> #include <unistd.h> #define BT_BUF_SIZE 128 /* just print backtrace */ void dump_current_backtrace() { int nptrs; void *buffer[BT_BUF_SIZE]; char **strings; nptrs = backtrace(buffer, BT_BUF_SIZE); strings = backtrace_symbols(buffer, nptrs); if (strings == NULL) { perror("backtrace_symbols"); exit(EXIT_FAILURE); } for (int j = 0; j < nptrs; j++) printf(" %s\n", strings[j]); printf("\n"); free(strings); } static int errstatic(int l) { printf("errstatic(): %d\n", l); dump_current_backtrace(); return 3; } /* PG uses #define pg_attribute_cold __attribute__((cold)) */ __attribute__ ((cold)) int err_cold(int l) { printf("err_cold(): %d\n", l); errstatic(l); return 0; } /* it's the same but hot */ int err(int l) { printf("err(): %d\n", l); errstatic(l); return 0; } void __attribute__ ((noinline)) err_finish(int level) { printf("finishing demo() macro\n"); if(level == 3) exit(3); } /* * Try to mimic what PG does in demo() macro */ /* * elog.h says: * When __builtin_constant_p is available and elevel >= ERROR we make a call * to errstart_cold() instead of errstart(). This version of the function is * marked with pg_attribute_cold which will coax supporting compilers into * generating code which is more optimized towards non-ERROR cases. Because * we use __builtin_constant_p() in the condition, when elevel is not a * compile-time constant, or if it is, but it's < ERROR, the compiler has no * need to generate any code for this branch. It can simply call errstart() * unconditionally. */ #define demo(level) \ do { \ if (__builtin_constant_p(level) && (level) >= 2 ? err_cold(level) : err(level)) { \ err_finish(level); \ } \ if (__builtin_constant_p(level) && (level) >= 3) { \ __builtin_unreachable(); \ } \ } while (0) int f() { if(getpid() % 2 != 0) demo(3); for(int i = 0; i < 10; i++) { usleep(10); demo(1); } return 42; } /* this serves a purpose to just generate more symbols/use more addresses */ int somemisleading_another_f() { if(getpid() % 99 == 0) demo(3); return 42; } int main() { int r; printf("starting f(), NOT yet in somemisleading_another_f() !\n"); r = f(); printf("%d\n", r*2); printf("starting somemisleading_another_f()\n"); r = somemisleading_another_f(); printf("%d\n", r*2); return 0; }