Hi,

On 2021-10-21 18:27:25 -0400, Tom Lane wrote:
> Today, pg_dump does a lot of internal lookups via binary search
> in presorted arrays.  I thought it might improve matters
> to replace those binary searches with hash tables, theoretically
> converting O(log N) searches into O(1) searches.  So I tried making
> a hash table indexed by CatalogId (tableoid+oid) with simplehash.h,
> and replacing as many data structures as I could with that.

That does sound like a good idea in theory...
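For anyone who hasn't used simplehash.h from frontend code: a
CatalogId-keyed table of the kind described could be declared roughly
like this (a sketch of my own; the CatalogIdEntry layout and the
catalogid_* names are illustrative, not necessarily what the patch
does):

    #include "postgres_fe.h"
    #include "common/hashfn.h"

    typedef struct CatalogId        /* as already defined in pg_dump.h */
    {
        Oid         tableoid;
        Oid         oid;
    } CatalogId;

    typedef struct CatalogIdEntry
    {
        CatalogId   key;            /* hash key: tableoid + oid */
        void       *object;         /* e.g. the DumpableObject with that id */
        char        status;         /* status byte required by simplehash.h */
    } CatalogIdEntry;

    #define SH_PREFIX       catalogid
    #define SH_ELEMENT_TYPE CatalogIdEntry
    #define SH_KEY_TYPE     CatalogId
    #define SH_KEY          key
    #define SH_HASH_KEY(tb, k) \
        hash_bytes((const unsigned char *) &(k), sizeof(CatalogId))
    #define SH_EQUAL(tb, a, b) \
        ((a).oid == (b).oid && (a).tableoid == (b).tableoid)
    #define SH_SCOPE        static inline
    #define SH_RAW_ALLOCATOR pg_malloc0  /* frontend code, no palloc() */
    #define SH_DECLARE
    #define SH_DEFINE
    #include "lib/simplehash.h"

After which a lookup that today is an O(log N) bsearch() over the
presorted DumpableObject array becomes an O(1) expected hash probe:

    catalogid_hash *tab = catalogid_create(8192, NULL);
    bool        found;
    CatalogIdEntry *entry;

    entry = catalogid_insert(tab, objId, &found);  /* register an object */
    entry = catalogid_lookup(tab, objId);          /* find it again */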
> This makes the code shorter and (IMO anyway) cleaner, but
>
> (a) the executable size increases by a few KB --- apparently, even
> the minimum subset of simplehash.h's functionality is code-wasteful.

Hm. Surprised a bit by that. In an optimized build the difference is
smaller, at least:

optimized:
   text    data     bss     dec     hex filename
 448066    7048    1368  456482   6f722 src/bin/pg_dump/pg_dump
 447530    7048    1496  456074   6f58a src/bin/pg_dump/pg_dump.orig

debug:
   text    data     bss     dec     hex filename
 516883    7024    1352  525259   803cb src/bin/pg_dump/pg_dump
 509819    7024    1480  518323   7e8b3 src/bin/pg_dump/pg_dump.orig

The fact that optimization plays such a role makes me wonder if a good
chunk of the difference is the slightly more complicated
find{Type,Func,...}ByOid() functions.

> (b) I couldn't measure any change in performance at all. I tried
> it on the regression database and on a toy DB with 10000 simple
> tables. Maybe on a really large DB you'd notice some difference,
> but I'm not very optimistic now.

Did you measure the runtime of pg_dump, or how much CPU it used? I
think a lot of the time the backend is a bigger bottleneck than
pg_dump...

For the regression test DB, the majority of the time seems to be spent
below two things: 1) libpq, 2) sortDumpableObjects(). I don't think 2)
hits the binary search / hashtable path?

It does seem interesting that a substantial part of the time is spent
in/below PQexec() and PQfnumber(). Especially the latter shouldn't be
too hard to optimize away...

Greetings,

Andres Freund
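PS: To expand on the PQfnumber() remark: it does a linear, case-folding
scan of the result's column names on every call, so the obvious way to
optimize it away is to resolve the column numbers once per result set
rather than once per row. A sketch of that pattern (the function and
column names here are invented, not pg_dump's actual code):

    #include <libpq-fe.h>

    static void
    process_result(PGresult *res)
    {
        /* resolve column numbers once, outside the row loop */
        int         i_oid = PQfnumber(res, "oid");
        int         i_relname = PQfnumber(res, "relname");
        int         ntups = PQntuples(res);

        for (int row = 0; row < ntups; row++)
        {
            const char *oid = PQgetvalue(res, row, i_oid);
            const char *relname = PQgetvalue(res, row, i_relname);

            /* ... build whatever object these values describe ... */
            (void) oid;
            (void) relname;
        }
    }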