Hi,

+extern PGDLLIMPORT uint32 *my_wait_event_info;

It seems volatile should be added to the above declaration. Since later:

+   *(volatile uint32 *) my_wait_event_info = wait_event_info;

Cheers

On Fri, Apr 2, 2021 at 12:45 PM Andres Freund <and...@anarazel.de> wrote:

> Hi,
>
> This grew out of my patch to split the waits event code out of
> pgstat.[ch], which in turn grew out of the shared memory stats patch
> series.
>
>
> pgstat_report_wait_start() and pgstat_report_wait_end() currently check
> pgstat_track_activities before assigning to MyProc->wait_event_info.
> Given the small cost of the assignment, and that pgstat_track_activities
> is almost always used, I'm doubtful that that's the right tradeoff.
>
> Normally I would say that branch prediction will take care of this cost
> - but because pgstat_report_wait_{start,end} are inlined, that has to
> happen in each of the calling locations.
>
> The code works out to be something like the following (this is from
> basebackup_read_file, the simplest caller I could quickly find, I
> removed interspersed code from it):
>
> 267             if (!pgstat_track_activities || !proc)
>    0x0000000000430e4d <+13>:    cmpb   $0x1,0x4882e1(%rip)        #
> 0x8b9135 <pgstat_track_activities>
>
> 265             volatile PGPROC *proc = MyProc;
>    0x0000000000430e54 <+20>:    mov    0x48c52d(%rip),%rax        #
> 0x8bd388 <MyProc>
>
> 266
> 267             if (!pgstat_track_activities || !proc)
>    0x0000000000430e5b <+27>:    jne    0x430e6c <basebackup_read_file+44>
>    0x0000000000430e5d <+29>:    test   %rax,%rax
>    0x0000000000430e60 <+32>:    je     0x430e6c <basebackup_read_file+44>
>
> 268                     return;
> 269
> 270             /*
> 271              * Since this is a four-byte field which is always read
> and written as
> 272              * four-bytes, updates are atomic.
> 273              */
> 274             proc->wait_event_info = wait_event_info;
>    0x0000000000430e62 <+34>:    movl   $0xa000000,0x2c8(%rax)
>
> /home/andres/src/postgresql/src/backend/replication/basebackup.c:
> 2014            rc = pg_pread(fd, buf, nbytes, offset);
>    0x0000000000430e6c <+44>:    call   0xc4790 <pread@plt>
>
> stripping the source:
>    0x0000000000430e4d <+13>:    cmpb   $0x1,0x4882e1(%rip)        #
> 0x8b9135 <pgstat_track_activities>
>    0x0000000000430e54 <+20>:    mov    0x48c52d(%rip),%rax        #
> 0x8bd388 <MyProc>
>    0x0000000000430e5b <+27>:    jne    0x430e6c <basebackup_read_file+44>
>    0x0000000000430e5d <+29>:    test   %rax,%rax
>    0x0000000000430e60 <+32>:    je     0x430e6c <basebackup_read_file+44>
>    0x0000000000430e62 <+34>:    movl   $0xa000000,0x2c8(%rax)
>    0x0000000000430e6c <+44>:    call   0xc4790 <pread@plt>
>
>
> just removing the pgstat_track_activities check turns that into
>
>    0x0000000000430d8d <+13>:    mov    0x48c5f4(%rip),%rax        #
> 0x8bd388 <MyProc>
>    0x0000000000430d94 <+20>:    test   %rax,%rax
>    0x0000000000430d97 <+23>:    je     0x430da3 <basebackup_read_file+35>
>    0x0000000000430d99 <+25>:    movl   $0xa000000,0x2c8(%rax)
>    0x0000000000430da3 <+35>:    call   0xc4790 <pread@plt>
>
> which does seem (a bit) nicer.
>
> However, we can improve this further, entirely eliminating branches, by
> introducing something like "my_wait_event_info" that initially just
> points to a local variable and is switched to shared once MyProc is
> assigned.
>
> Obviously incorrect, for comparison: Just removing the MyProc != NULL
> check yields:
>    0x0000000000430bcd <+13>:    mov    0x48c7b4(%rip),%rax        #
> 0x8bd388 <MyProc>
>    0x0000000000430bd4 <+20>:    movl   $0xa000000,0x2c8(%rax)
>    0x0000000000430bde <+30>:    call   0xc47d0 <pread@plt>
>
> using a uint32 *my_wait_event_info yields:
>    0x0000000000430b4d <+13>:    mov    0x47615c(%rip),%rax        #
> 0x8a6cb0 <my_wait_event_info>
>    0x0000000000430b54 <+20>:    movl   $0xa000000,(%rax)
>    0x0000000000430b5a <+26>:    call   0xc47d0 <pread@plt>
>
> Note how the lack of offset addressing in the my_wait_event_info version
> makes the instruction smaller (call is at 26 instead of 30).
>
>
> Now, perhaps all of this isn't worth optimizing, most of the things done
> within pgstat_report_wait_start()/end() are expensive-ish. And forward
> branches are statically predicted to be not taken on several
> platforms. I have seen this these instructions show up in profiles in
> workloads with contended lwlocks at least...
>
> There's also a small win in code size:
>    text    data     bss     dec     hex filename
> 8932095  192160  204656 9328911  8e590f src/backend/postgres
> 8928544  192160  204656 9325360  8e4b30
> src/backend/postgres_my_wait_event_info
>
>
> If we went for the my_wait_event_info approach there is one further
> advantage, after my change to move the wait event code into a separate
> file: wait_event.h does not need to include proc.h anymore, which seems
> architecturally nice for things like fd.c.
>
>
> Attached is patch series doing this.
>
>
> I'm inclined to separately change the comment format for
> wait_event.[ch], there's no no reason to stick with the current style:
>
> /* ----------
>  * pgstat_report_wait_start() -
>  *
>  *      Called from places where server process needs to wait.  This is
> called
>  *      to report wait event information.  The wait information is stored
>  *      as 4-bytes where first byte represents the wait event class (type
> of
>  *      wait, for different types of wait, refer WaitClass) and the next
>  *      3-bytes represent the actual wait event.  Currently 2-bytes are
> used
>  *      for wait event which is sufficient for current usage, 1-byte is
>  *      reserved for future usage.
>  *
>  * NB: this *must* be able to survive being called before MyProc has been
>  * initialized.
>  * ----------
>  */
>
> I.e. I'd like to remove the ----- framing, the repetition of the
> function name, and the varying indentation in the comment.
>
> Greetings,
>
> Andres Freund
>

Reply via email to