Hi,

On 2023-06-29 11:58:27 +0200, Tomas Vondra wrote:
> On 6/29/23 01:34, Andres Freund wrote:
> > On 2023-06-28 23:26:00 +0200, Tomas Vondra wrote:
> >> Yeah. FWIW I was interested what the patch does in practice, so I
> >> checked what pahole says about impact on struct sizes:
> >>
> >> AllocSetContext   224B -> 208B  (4 cachelines)
> >> GenerationContext 152B -> 136B  (3 cachelines)
> >> SlabContext       200B -> 200B  (no change, adds 4B hole)
> ...
> > That would save another 12 bytes, if I calculate correctly. 25% shrinkage
> > together ain't bad.
> >
>
> I don't oppose these changes, but I still don't quite believe it'll make
> a measurable difference (even if we manage to save a cacheline or two).
> I'd definitely like to see some measurements demonstrating it's worth
> the extra complexity.

I hacked (emphasis on that) a version together that shrinks AllocSetContext
down to 176 bytes.

There seem to be some minor performance gains, and some not too shabby memory
savings.

E.g. a backend after running readonly pgbench goes from (results repeat
precisely across runs):

pgbench: Grand total: 1361528 bytes in 289 blocks; 367480 free (206 chunks); 994048 used

to:

pgbench: Grand total: 1339000 bytes in 278 blocks; 352352 free (188 chunks); 986648 used

Running a total over all connections in the main regression tests gives less
of a win (best of three):

backends  grand       blocks  free       chunks  used
690       1046956664  111373  370680728  291436  676275936

to:

backends  grand       blocks  free       chunks  used
690       1045226056  111099  372972120  297969  672253936

the latter is produced with this beauty:

ninja && m test --suite setup --no-rebuild && m test --no-rebuild --print-errorlogs regress/regress -v && grep "Grand total" testrun/regress/regress/log/postmaster.log|sed -E -e 's/.*Grand total: (.*) bytes in (.*) blocks; (.*) free \((.*) chunks\); (.*) used/\1\t\2\t\3\t\4\t\5/'|awk '{backends += 1; grand += $1; blocks += $2; free += $3; chunks += $4; used += $5} END{print backends, grand, blocks, free, chunks, used}'

There's more to get. The overhead of AllocSetBlock also plays into this. Both
due to the keeper block and obviously separate blocks getting allocated
subsequently. We e.g. don't need AllocBlockData->next,prev as 8 byte pointers
(some trickiness would be required for external blocks, but they could combine
both).

Greetings,

Andres Freund
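
For anyone who would rather not decode the sed/awk one-liner, the aggregation step it performs can be sketched in Python. This is only an illustrative equivalent, not the command that produced the numbers above: the function name `sum_grand_totals` and the dictionary keys are made up for this sketch, and it assumes every "Grand total" line in the log has exactly the shape shown in the pgbench example.

```python
import re
from typing import Dict, Iterable

# Shape of the summary line, as it appears in the pgbench example above.
GRAND_TOTAL = re.compile(
    r"Grand total: (\d+) bytes in (\d+) blocks; "
    r"(\d+) free \((\d+) chunks\); (\d+) used"
)


def sum_grand_totals(lines: Iterable[str]) -> Dict[str, int]:
    """Sum the fields of every 'Grand total' line, one line per backend."""
    totals = {"backends": 0, "grand": 0, "blocks": 0,
              "free": 0, "chunks": 0, "used": 0}
    for line in lines:
        match = GRAND_TOTAL.search(line)
        if match is None:
            continue  # not a summary line; same effect as the grep filter
        grand, blocks, free, chunks, used = (int(g) for g in match.groups())
        totals["backends"] += 1
        totals["grand"] += grand
        totals["blocks"] += blocks
        totals["free"] += free
        totals["chunks"] += chunks
        totals["used"] += used
    return totals
```

To run it against the regression log, something like `sum_grand_totals(open("testrun/regress/regress/log/postmaster.log"))` should do, using the path from the command above. As in the awk version, "backends" is simply the number of matching lines.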