Re: Experimenting with hash tables inside pg_dump

2021-10-25 Thread Andres Freund
Hi, On 2021-10-25 13:58:06 -0400, Tom Lane wrote: > Andres Freund writes: > >> Seems like we need a less quick-and-dirty approach to dealing with > >> unnecessary simplehash support functions. > > > I don't think the problem is unnecessary ones? > > I was thinking about the stuff like SH_ITERAT

Re: Experimenting with hash tables inside pg_dump

2021-10-25 Thread Tom Lane
Andres Freund writes: > On 2021-10-22 16:32:39 -0400, Tom Lane wrote: >> Hmm, harder than it sounds. If I remove "inline" from SH_SCOPE then >> the compiler complains about unreferenced static functions, while >> if I leave it there then adding pg_noinline causes a complaint about >> conflicting

Re: Experimenting with hash tables inside pg_dump

2021-10-25 Thread Andres Freund
Hi, Thanks for pushing the error handling cleanup etc! On 2021-10-22 16:32:39 -0400, Tom Lane wrote: > I wrote: > > Andres Freund writes: > >> Wonder if we should mark simplehash's grow as noinline? Even with a single > >> caller it seems better to not inline it to remove register allocator >

Re: Experimenting with hash tables inside pg_dump

2021-10-22 Thread Tom Lane
I wrote: > Andres Freund writes: >> Wonder if we should mark simplehash's grow as noinline? Even with a single >> caller it seems better to not inline it to remove register allocator >> pressure. > Seems plausible --- you want me to go change that? Hmm, harder than it sounds. If I remove "inl
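
For readers following the noinline suggestion, here is a minimal sketch of the general idea: keep a rarely executed grow routine out of line so the hot insert path stays small. It is not simplehash.h code and does not reproduce the SH_SCOPE problem described in the message. The names SKETCH_NOINLINE, IntVec, intvec_grow and intvec_push are hypothetical, and the macro is assumed to map to the GCC/Clang noinline attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: a local stand-in for a "do not inline" marker (GCC/Clang only). */
#if defined(__GNUC__)
#define SKETCH_NOINLINE __attribute__((noinline))
#else
#define SKETCH_NOINLINE
#endif

/* Hypothetical growable array of ints, used only for illustration. */
typedef struct
{
    int    *data;
    size_t  used;
    size_t  cap;
} IntVec;

/* Cold path: runs rarely, so it is kept out of line. */
static SKETCH_NOINLINE void
intvec_grow(IntVec *v)
{
    size_t  newcap = v->cap ? v->cap * 2 : 4;
    int    *p = realloc(v->data, newcap * sizeof(int));

    if (p == NULL)
        abort();            /* must not return to the caller on failure */
    v->data = p;
    v->cap = newcap;
}

/* Hot path: small enough to inline at every call site. */
static inline void
intvec_push(IntVec *v, int x)
{
    if (v->used == v->cap)
        intvec_grow(v);
    assert(v->used < v->cap);
    v->data[v->used++] = x;
}
```

With the grow step out of line, the inlined push at each call site is only a bounds check, a store, and an occasional function call. Whether this helps in practice depends on the compiler and would need measuring.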

Re: Experimenting with hash tables inside pg_dump

2021-10-22 Thread Tom Lane
Andres Freund writes: > Which made me look at the code invoking it from simplehash. I think the patch > that made simplehash work in frontend code isn't quite right, because > pg_log_error() returns... Indeed, that's broken. I guess we want pg_log_fatal then exit(1). > Wonder if we should mar

Re: Experimenting with hash tables inside pg_dump

2021-10-22 Thread Andres Freund
Hi, On October 22, 2021 10:32:30 AM PDT, Tom Lane wrote: >Andres Freund writes: >>> On 2021-10-21 18:27:25 -0400, Tom Lane wrote: (a) the executable size increases by a few KB --- apparently, even the minimum subset of simplehash.h's functionality is code-wasteful. > >> If I prevent t

Re: Experimenting with hash tables inside pg_dump

2021-10-22 Thread Tom Lane
Andres Freund writes: > On October 22, 2021 8:54:13 AM PDT, Tom Lane wrote: >> Were you planning to pursue this further, or did you want me to? > It seems too nice an improvement to drop on the floor. That said, I don't > really have the mental bandwidth to pursue this beyond the POC stage - it

Re: Experimenting with hash tables inside pg_dump

2021-10-22 Thread Andres Freund
Hi, On October 22, 2021 8:54:13 AM PDT, Tom Lane wrote: >Andres Freund writes: >> On 2021-10-22 10:53:31 -0400, Tom Lane wrote: >>> I'm skeptical of that, mainly because it doesn't work in old servers, > >> I think we can address that, if we think it's overall a promising approach to >> pursue.

Re: Experimenting with hash tables inside pg_dump

2021-10-22 Thread Tom Lane
Andres Freund writes: >> On 2021-10-21 18:27:25 -0400, Tom Lane wrote: >>> (a) the executable size increases by a few KB --- apparently, even >>> the minimum subset of simplehash.h's functionality is code-wasteful. > If I prevent the compiler from inlining findObjectByCatalogId() in all the > fin

Re: Experimenting with hash tables inside pg_dump

2021-10-22 Thread Tom Lane
Andres Freund writes: > On 2021-10-22 10:53:31 -0400, Tom Lane wrote: >> I'm skeptical of that, mainly because it doesn't work in old servers, > I think we can address that, if we think it's overall a promising approach to > pursue. E.g. if we don't need the indexes, we can make it = ANY(). Hmm

Re: Experimenting with hash tables inside pg_dump

2021-10-22 Thread Andres Freund
Hi, On 2021-10-22 10:53:31 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2021-10-21 22:13:22 -0400, Tom Lane wrote: > >> I've thought about doing something like > >> SELECT unsafe-functions FROM pg_class WHERE oid IN (someoid, someoid, ...) > >> but in cases with tens of thousands of tabl

Re: Experimenting with hash tables inside pg_dump

2021-10-22 Thread Tom Lane
Andres Freund writes: > On 2021-10-21 22:13:22 -0400, Tom Lane wrote: >> I've thought about doing something like >> SELECT unsafe-functions FROM pg_class WHERE oid IN (someoid, someoid, ...) >> but in cases with tens of thousands of tables, it seems unlikely that >> that's going to behave all that

Re: Experimenting with hash tables inside pg_dump

2021-10-21 Thread Andres Freund
Hi, On 2021-10-21 22:13:22 -0400, Tom Lane wrote: > Andres Freund writes: > > I wonder though if for some of them we should instead replace the per-object > > queries with one query returning the information for all objects of a type. > > It > > doesn't make all that much sense that we build and

Re: Experimenting with hash tables inside pg_dump

2021-10-21 Thread Tom Lane
Andres Freund writes: > I wonder though if for some of them we should instead replace the per-object > queries with one query returning the information for all objects of a type. It > doesn't make all that much sense that we build and send one query for each > table and index. The trick is the pr

Re: Experimenting with hash tables inside pg_dump

2021-10-21 Thread Andres Freund
Hi, On 2021-10-21 20:22:56 -0400, Tom Lane wrote: > Andres Freund writes: > Yeah, that. I tried doing a system-wide "perf" measurement, and soon > realized that a big fraction of the time for a "pg_dump -s" run is > being spent in the planner :-(. A trick for seeing the proportions of this easi

Re: Experimenting with hash tables inside pg_dump

2021-10-21 Thread Andres Freund
Hi, On 2021-10-21 16:37:57 -0700, Andres Freund wrote: > On 2021-10-21 18:27:25 -0400, Tom Lane wrote: > > (a) the executable size increases by a few KB --- apparently, even > > the minimum subset of simplehash.h's functionality is code-wasteful. > > Hm. Surprised a bit by that. In an optimized b

Re: Experimenting with hash tables inside pg_dump

2021-10-21 Thread Tom Lane
Andres Freund writes: > Did you measure runtime of pg_dump, or how much CPU it used? I was looking mostly at wall-clock runtime, though I did notice that the CPU time looked about the same too. > I think a lot of > the time the backend is a bigger bottleneck than pg_dump... Yeah, that. I tried

Re: Experimenting with hash tables inside pg_dump

2021-10-21 Thread Bossart, Nathan
On 10/21/21, 4:14 PM, "Bossart, Nathan" wrote: > On 10/21/21, 3:29 PM, "Tom Lane" wrote: >> (b) I couldn't measure any change in performance at all. I tried >> it on the regression database and on a toy DB with 1 simple >> tables. Maybe on a really large DB you'd notice some difference, >>

Re: Experimenting with hash tables inside pg_dump

2021-10-21 Thread Andres Freund
Hi, On 2021-10-21 18:27:25 -0400, Tom Lane wrote: > Today, pg_dump does a lot of internal lookups via binary search > in presorted arrays. I thought it might improve matters > to replace those binary searches with hash tables, theoretically > converting O(log N) searches into O(1) searches. So I
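
The trade-off discussed in this message, binary search in a presorted array versus a hash lookup, can be sketched as follows. This is an illustration under stated assumptions, not pg_dump's code and not simplehash.h: the hash table is a hand-rolled fixed-size linear-probing table, and CatId, Obj, HashTab, find_sorted, hash_insert and hash_find are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a catalog ID and a dumpable object. */
typedef struct { uint32_t tableoid; uint32_t oid; } CatId;
typedef struct { CatId id; const char *name; } Obj;

/* Order by oid first, then tableoid (an arbitrary choice for this sketch). */
static int
catid_cmp(CatId a, CatId b)
{
    if (a.oid != b.oid)
        return a.oid < b.oid ? -1 : 1;
    if (a.tableoid != b.tableoid)
        return a.tableoid < b.tableoid ? -1 : 1;
    return 0;
}

/* O(log N): binary search in an array presorted by catid_cmp. */
static const Obj *
find_sorted(const Obj *arr, size_t n, CatId key)
{
    size_t lo = 0, hi = n;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;
        int    c = catid_cmp(arr[mid].id, key);

        if (c == 0)
            return &arr[mid];
        if (c < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}

/* Expected O(1): open addressing with linear probing, no resizing. */
#define TABLE_SIZE 64           /* power of two, larger than the entry count */
typedef struct { const Obj *slots[TABLE_SIZE]; size_t used; } HashTab;

static uint32_t
hash_catid(CatId id)
{
    uint32_t h = (id.oid * 2654435761u) ^ (id.tableoid * 40503u);

    return h ^ (h >> 16);
}

static void
hash_insert(HashTab *t, const Obj *o)
{
    uint32_t i = hash_catid(o->id) & (TABLE_SIZE - 1);

    assert(t->used < TABLE_SIZE - 1);   /* keep at least one empty slot */
    while (t->slots[i] != NULL)
        i = (i + 1) & (TABLE_SIZE - 1);
    t->slots[i] = o;
    t->used++;
}

static const Obj *
hash_find(const HashTab *t, CatId key)
{
    uint32_t i = hash_catid(key) & (TABLE_SIZE - 1);

    while (t->slots[i] != NULL)
    {
        if (catid_cmp(t->slots[i]->id, key) == 0)
            return t->slots[i];
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return NULL;
}
```

Both lookups return the same object for the same key. The hash version trades the presort for hashing and probing, so any speedup depends on table size and on how much of the total runtime the lookups account for.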

Re: Experimenting with hash tables inside pg_dump

2021-10-21 Thread Bossart, Nathan
On 10/21/21, 3:29 PM, "Tom Lane" wrote: > (b) I couldn't measure any change in performance at all. I tried > it on the regression database and on a toy DB with 1 simple > tables. Maybe on a really large DB you'd notice some difference, > but I'm not very optimistic now. I wonder how many ta