Hi, Regards, Zhang Mingli On Jul 16, 2023 at 09:57 +0800, Andres Freund <and...@anarazel.de>, wrote: > Hi, > > Several loops which are important for query performance, like heapgetpage()'s > loop over all tuples, have to call functions like > HeapCheckForSerializableConflictOut() and PredicateLockTID() in every > iteration. > > When serializable is not in use, all those functions do is to to return. But > being situated in a different translation unit, the compiler can't inline > (without LTO at least) the check whether serializability is needed. It's not > just the function call overhead that's noticable, it's also that registers > have to be spilled to the stack / reloaded from memory etc. > > On a freshly loaded pgbench scale 100, with turbo mode disabled, postgres > pinned to one core. Parallel workers disabled to reduce noise. All times are > the average of 15 executions with pgbench, in a newly started, but prewarmed > postgres. > > SELECT * FROM pgbench_accounts OFFSET 10000000; > HEAD: > 397.977 > > removing the HeapCheckForSerializableConflictOut() from heapgetpage() > (incorrect!), to establish the baseline of what serializable costs: > 336.695 > > pulling out CheckForSerializableConflictOutNeeded() from > HeapCheckForSerializableConflictOut() in heapgetpage(), and avoiding calling > HeapCheckForSerializableConflictOut() in the loop: > 339.742 > > moving the loop into a static inline function, marked as pg_always_inline, > called with static arguments for always_visible, check_serializable: > 326.546 > > marking the always_visible, !check_serializable case likely(): > 322.249 > > removing TestForOldSnapshot() calls, which we pretty much already decided on: > 312.987 > > > FWIW, there's more we can do, with some hacky changes I got the time down to > 273.261, but the tradeoffs start to be a bit more complicated. And 397->320ms > for something as core as this, is imo worth considering on its own. > > > > > Now, this just affects the sequential scan case. heap_hot_search_buffer() > shares many of the same pathologies. I find it a bit harder to improve, > because the compiler's code generation seems to switch between good / bad with > changes that seems unrelated... > > > I wonder why we haven't used PageIsAllVisible() in heap_hot_search_buffer() so > far? > > > Greetings, > > Andres Freund
LGTM and I have a fool question: if (likely(all_visible)) { if (likely(!check_serializable)) scan->rs_ntuples = heapgetpage_collect(scan, snapshot, page, buffer, block, lines, 1, 0); else scan->rs_ntuples = heapgetpage_collect(scan, snapshot, page, buffer, block, lines, 1, 1); } else { if (likely(!check_serializable)) scan->rs_ntuples = heapgetpage_collect(scan, snapshot, page, buffer, block, lines, 0, 0); else scan->rs_ntuples = heapgetpage_collect(scan, snapshot, page, buffer, block, lines, 0, 1); Does it make sense to combine if else condition and put it to the incline function’s param? Like: scan->rs_ntuples = heapgetpage_collect(scan, snapshot, page, buffer, block, lines, all_visible, check_serializable);