In particular:
    exec_bind_message()
        PushActiveSnapshot(GetTransactionSnapshot());


By suppressing this, I've achieved over 1.9M TXNs per second on select-only 
pgbench on a 48-core box.  That is about 50% faster than without the change.  
The CPU usage of GetSnapshotData drops from about 22% to 4.5%.

If no input function needed this snapshot, no reparsing or reanalysis were 
required, and we knew this up front, it would be a huge win.  We could test 
for a number of conditions on the first parse/optimization of the query and 
set a flag to suppress this for subsequent executions.


NOTE:

In GetSnapshotData, because pgxact is declared volatile, the compiler will not 
combine the following two if tests into a single test:

    if (pgxact->vacuumFlags & PROC_IN_LOGICAL_DECODING)
        continue;

    if (pgxact->vacuumFlags & PROC_IN_VACUUM)
        continue;

You can reduce the code path in the inner loop by coding this as:

    if (pgxact->vacuumFlags & (PROC_IN_LOGICAL_DECODING|PROC_IN_VACUUM))
        continue;


I'm still working on quantifying any gain.  Note that it isn't just one L1 
cache fetch and one conditional branch that are eliminated.  Because the 
pgxact cache line is updated so frequently with single-statement TXNs, a 
certain number of full cache misses occur due to invalidation when a given 
pgxact is updated between the first and second fetch of vacuumFlags.
