Hi, +extern PGDLLIMPORT uint32 *my_wait_event_info;
It seems volatile should be added to the above declaration. Since later: + *(volatile uint32 *) my_wait_event_info = wait_event_info; Cheers On Fri, Apr 2, 2021 at 12:45 PM Andres Freund <and...@anarazel.de> wrote: > Hi, > > This grew out of my patch to split the waits event code out of > pgstat.[ch], which in turn grew out of the shared memory stats patch > series. > > > pgstat_report_wait_start() and pgstat_report_wait_end() currently check > pgstat_track_activities before assigning to MyProc->wait_event_info. > Given the small cost of the assignment, and that pgstat_track_activities > is almost always used, I'm doubtful that that's the right tradeoff. > > Normally I would say that branch prediction will take care of this cost > - but because pgstat_report_wait_{start,end} are inlined, that has to > happen in each of the calling locations. > > The code works out to be something like the following (this is from > basebackup_read_file, the simplest caller I could quickly find, I > removed interspersed code from it): > > 267 if (!pgstat_track_activities || !proc) > 0x0000000000430e4d <+13>: cmpb $0x1,0x4882e1(%rip) # > 0x8b9135 <pgstat_track_activities> > > 265 volatile PGPROC *proc = MyProc; > 0x0000000000430e54 <+20>: mov 0x48c52d(%rip),%rax # > 0x8bd388 <MyProc> > > 266 > 267 if (!pgstat_track_activities || !proc) > 0x0000000000430e5b <+27>: jne 0x430e6c <basebackup_read_file+44> > 0x0000000000430e5d <+29>: test %rax,%rax > 0x0000000000430e60 <+32>: je 0x430e6c <basebackup_read_file+44> > > 268 return; > 269 > 270 /* > 271 * Since this is a four-byte field which is always read > and written as > 272 * four-bytes, updates are atomic. > 273 */ > 274 proc->wait_event_info = wait_event_info; > 0x0000000000430e62 <+34>: movl $0xa000000,0x2c8(%rax) > > /home/andres/src/postgresql/src/backend/replication/basebackup.c: > 2014 rc = pg_pread(fd, buf, nbytes, offset); > 0x0000000000430e6c <+44>: call 0xc4790 <pread@plt> > > stripping the source: > 0x0000000000430e4d <+13>: cmpb $0x1,0x4882e1(%rip) # > 0x8b9135 <pgstat_track_activities> > 0x0000000000430e54 <+20>: mov 0x48c52d(%rip),%rax # > 0x8bd388 <MyProc> > 0x0000000000430e5b <+27>: jne 0x430e6c <basebackup_read_file+44> > 0x0000000000430e5d <+29>: test %rax,%rax > 0x0000000000430e60 <+32>: je 0x430e6c <basebackup_read_file+44> > 0x0000000000430e62 <+34>: movl $0xa000000,0x2c8(%rax) > 0x0000000000430e6c <+44>: call 0xc4790 <pread@plt> > > > just removing the pgstat_track_activities check turns that into > > 0x0000000000430d8d <+13>: mov 0x48c5f4(%rip),%rax # > 0x8bd388 <MyProc> > 0x0000000000430d94 <+20>: test %rax,%rax > 0x0000000000430d97 <+23>: je 0x430da3 <basebackup_read_file+35> > 0x0000000000430d99 <+25>: movl $0xa000000,0x2c8(%rax) > 0x0000000000430da3 <+35>: call 0xc4790 <pread@plt> > > which does seem (a bit) nicer. > > However, we can improve this further, entirely eliminating branches, by > introducing something like "my_wait_event_info" that initially just > points to a local variable and is switched to shared once MyProc is > assigned. > > Obviously incorrect, for comparison: Just removing the MyProc != NULL > check yields: > 0x0000000000430bcd <+13>: mov 0x48c7b4(%rip),%rax # > 0x8bd388 <MyProc> > 0x0000000000430bd4 <+20>: movl $0xa000000,0x2c8(%rax) > 0x0000000000430bde <+30>: call 0xc47d0 <pread@plt> > > using a uint32 *my_wait_event_info yields: > 0x0000000000430b4d <+13>: mov 0x47615c(%rip),%rax # > 0x8a6cb0 <my_wait_event_info> > 0x0000000000430b54 <+20>: movl $0xa000000,(%rax) > 0x0000000000430b5a <+26>: call 0xc47d0 <pread@plt> > > Note how the lack of offset addressing in the my_wait_event_info version > makes the instruction smaller (call is at 26 instead of 30). > > > Now, perhaps all of this isn't worth optimizing, most of the things done > within pgstat_report_wait_start()/end() are expensive-ish. And forward > branches are statically predicted to be not taken on several > platforms. I have seen this these instructions show up in profiles in > workloads with contended lwlocks at least... > > There's also a small win in code size: > text data bss dec hex filename > 8932095 192160 204656 9328911 8e590f src/backend/postgres > 8928544 192160 204656 9325360 8e4b30 > src/backend/postgres_my_wait_event_info > > > If we went for the my_wait_event_info approach there is one further > advantage, after my change to move the wait event code into a separate > file: wait_event.h does not need to include proc.h anymore, which seems > architecturally nice for things like fd.c. > > > Attached is patch series doing this. > > > I'm inclined to separately change the comment format for > wait_event.[ch], there's no no reason to stick with the current style: > > /* ---------- > * pgstat_report_wait_start() - > * > * Called from places where server process needs to wait. This is > called > * to report wait event information. The wait information is stored > * as 4-bytes where first byte represents the wait event class (type > of > * wait, for different types of wait, refer WaitClass) and the next > * 3-bytes represent the actual wait event. Currently 2-bytes are > used > * for wait event which is sufficient for current usage, 1-byte is > * reserved for future usage. > * > * NB: this *must* be able to survive being called before MyProc has been > * initialized. > * ---------- > */ > > I.e. I'd like to remove the ----- framing, the repetition of the > function name, and the varying indentation in the comment. > > Greetings, > > Andres Freund >