On Tue, May 23, 2023 at 1:33 PM Tristan Partin <tris...@neon.tech> wrote:
> We (Neon) have noticed that pgbench can be quite slow to populate data > in regard to higher latency connections. Higher scale factors exacerbate > this problem. Some employees work on a different continent than the > databases they might be benchmarking. By moving pgbench to use COPY for > populating all tables, we can reduce some of the time pgbench takes for > this particular step. > When latency is continent size high, pgbench should be run with server-side table generation instead of using COPY at all, for any table. The default COPY based pgbench generation is only intended for testing where the client and server are very close on the network. Unfortunately there's no simple command line option to change just that one thing about how pgbench runs. You have to construct a command line that documents each and every step you want instead. You probably just want this form: $ pgbench -i -I dtGvp -s 500 That's server-side table generation with all the usual steps. I use this instead of COPY in pgbench-tools so much now, basically whenever I'm talking to a cloud system, that I have a simple 0/1 config option to switch between the modes, and this long weird one is the default now. Try that out, and once you see the numbers my bet is you'll see extending which tables get COPY isn't needed by your use case anymore. Basically, if you are close enough to use COPY instead of server-side generation, you are close enough that every table besides accounts will not add up to enough time to worry about optimizing the little ones. -- Greg Smith greg.sm...@crunchydata.com Director of Open Source Strategy