On Tue, 13 Apr 2021 at 21:40, Craig Ringer <craig.rin...@enterprisedb.com> wrote:
> Findings: > > * A probe without arguments or with simple arguments is just a 'nop' > instruction > * Probes that require function calls, pointer chasing, other > expression evaluation etc may impose a fixed cost to collect up > arguments even if the probe is disabled. > * SDT semaphores can avoid that cost but add a branch, so should > probably be avoided unless preparing probe arguments is likely to be > expensive. Back to the topic directly at hand. Attached a disassembly of what LWLockAcquire looks like now on my current build of git master @ 5fe83adad9efd5e3929f0465b44e786dc23c7b55 . This is an --enable-debug --enable-cassert --enable-dtrace build with -Og -ggdb3. The three tracepoints in it are all of the form: movzwl 0x0(%r13),%edi call 0x801c49 <GetLWTrancheName> nop so it's clear we're doing redundant calls to GetLWTrancheName(), as expected. Not ideal. Now if I patch it to add the _ENABLED() guards on all the tracepoints, the probes look like this: 0x0000000000803176 <+200>: cmpw $0x0,0x462da8(%rip) # 0xc65f26 <postgresql_lwlock__acquire_semaphore> 0x000000000080317e <+208>: jne 0x80328b <LWLockAcquire+477> .... other interleaved code ... 0x000000000080328b <+477>: movzwl 0x0(%r13),%edi 0x0000000000803290 <+482>: call 0x801c49 <GetLWTrancheName> 0x0000000000803295 <+487>: nop 0x0000000000803296 <+488>: jmp 0x803184 <LWLockAcquire+214> so we avoid the GetLWTrancheName() call at the cost of a test and possible branch, and a small expansion in total function size. Without the semaphores, LWLockAcquire is 463 bytes. With them, it's 524 bytes, which is nothing to sneeze at for code that doesn't do anything 99.999% of the time, but we avoid a bunch of GetLWTrancheName() calls. If I instead replace T_NAME() with NULL for all tracepoints in LWLockAcquire, the disassembly shows that the tracepoints now become a simple 0x0000000000803176 <+200>: nop which is pretty hard to be concerned about. So at the very least we should be calling GetLWTrancheName() once at the start of the function if built with dtrace support and remembering the value, instead of calling it for each tracepoint. And omitting the tranche name looks like it might be sensible for the LWLock code. In most places it won't matter, but LWLocks are hot enough that it possibly might. A simple pg_regress run hits LWLockAcquire 25 million times, so that's 75 million calls to GetLWTrancheName().