On Fri, Dec 15, 2023 at 5:44 AM Tom Lane <t...@sss.pgh.pa.us> wrote: > > I did check that the v1 patch successfully inlines newNode() and > reduces it to just a MemoryContextAllocZeroAligned call, so it's > correct that modern compilers do that better than whatever I tested > in 2008. But I wonder what is happening in v2 to reduce the code > size so much. MemoryContextAllocZeroAligned is not 20kB by itself.
I poked at this a bit and it seems to come from what Heikki said upthread about fewer instructions before the calls: Running objdump on v1 and v2 copyfuncs.o and diff'ing shows there are fewer MOV instructions (some extraneous stuff removed): e9 da 5f 00 00 jmp <_copyReindexStmt> - 48 8b 05 00 00 00 00 mov rax,QWORD PTR [rip+0x0] - be 18 00 00 00 mov esi,0x18 - 48 8b 38 mov rdi,QWORD PTR [rax] - e8 00 00 00 00 call MemoryContextAllocZeroAligned-0x4 + bf 18 00 00 00 mov edi,0x18 + e8 00 00 00 00 call palloc0-0x4 That's 10 bytes savings. - 48 8b 05 00 00 00 00 mov rax,QWORD PTR [rip+0x0] - 48 8b 38 mov rdi,QWORD PTR [rax] - e8 00 00 00 00 call MemoryContextAllocZeroAligned-0x4 + e8 00 00 00 00 call palloc0-0x4 ...another 10 bytes. Over and over again. Because of the size differences, the compiler is inlining more: e.g. in v1 _copyFieldStore has 4 call sites, but in v2 it got inlined. About the patch, I'm wondering if this whitespace is intentional, but it's otherwise straightforward: --- a/src/include/nodes/nodes.h +++ b/src/include/nodes/nodes.h @@ -132,6 +132,7 @@ typedef struct Node #define nodeTag(nodeptr) (((const Node*)(nodeptr))->type) + /*