Hello hackers,

For some of my hardware tests I needed to generate big databases well
beyond RAM size, so I turned to the pgbench tool and its two default modes of
client- and server-side data generation for TPC-B tests. When I use a "scale"
factor in the range of a few thousand (e.g., 3000 - 5000), the data generation
phase takes quite some time. I saw this as an opportunity to prove or disprove
a few hypotheses:

  * will INSERT mode work faster if we commit once per "scale" unit, turning
the single INSERT into a "for" loop with commits for the 3 tables at the end
of each loop
  * will "INSERT .. SELECT FROM unnest" be faster than "INSERT .. SELECT FROM
generate_series"
  * will BINARY mode work faster than TEXT even though we send much more data
  * and so on
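To make the first hypothesis concrete: pgbench_accounts holds 100,000 rows per
unit of scale, so committing once per scale unit means one transaction per
100,000 accounts. Below is a minimal sketch of that batching scheme in Python
(the helper name and loop structure are mine for illustration; the actual patch
is C code inside pgbench):

```python
# Sketch: split the pgbench_accounts key space into per-scale-unit
# batches, so each batch can become its own transaction instead of
# one giant INSERT committed at the very end.

ROWS_PER_SCALE = 100_000  # pgbench_accounts rows per unit of scale


def account_batches(scale):
    """Yield (first_aid, last_aid) for each per-scale-unit batch."""
    for s in range(scale):
        first = s * ROWS_PER_SCALE + 1
        yield first, first + ROWS_PER_SCALE - 1


# Each (first, last) range would then be wrapped as roughly:
#   BEGIN;
#   INSERT INTO pgbench_accounts
#     SELECT aid, ... FROM generate_series(first, last) AS aid;
#   -- plus the matching pgbench_branches / pgbench_tellers rows
#   COMMIT;
batches = list(account_batches(3))
```

With scale 3 this yields (1, 100000), (100001, 200000), (200001, 300000), i.e.
three commits instead of one.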

As a result of my experiments I produced a significant patch for the pgbench
utility and thought it might be of interest to others as well. I'm therefore
sending a draft version as a diff against the current development tree on
GitHub. As of November 11, 2025 it merges cleanly with the project's main
branch.

Spoiler alert: COPY in BINARY format is significantly faster than the current
COPY in TEXT format.
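For context, here is a sketch of the framing that COPY ... WITH (FORMAT binary)
expects on the wire, assuming the standard layout documented for PostgreSQL's
binary copy format (signature, flags, per-row field counts, length-prefixed
fields, -1 trailer). The row values and function names are illustrative, not
taken from the patch:

```python
import struct

# PGCOPY binary stream: 11-byte signature, 32-bit flags word,
# 32-bit header-extension length (both zero here).
PGCOPY_HEADER = b"PGCOPY\n\xff\r\n\x00" + struct.pack("!ii", 0, 0)


def binary_row(aid, bid, abalance, filler):
    """Encode one pgbench_accounts-style row (illustrative helper)."""
    data = filler.encode("ascii")
    return (struct.pack("!h", 4)                    # number of fields
            + struct.pack("!ii", 4, aid)            # int4: length, value
            + struct.pack("!ii", 4, bid)
            + struct.pack("!ii", 4, abalance)
            + struct.pack("!i", len(data)) + data)  # text payload

# End-of-data marker: a field count of -1.
PGCOPY_TRAILER = struct.pack("!h", -1)

stream = PGCOPY_HEADER + binary_row(1, 1, 0, " " * 84) + PGCOPY_TRAILER
```

Integers travel as fixed-width big-endian values with no quoting or parsing on
the server side, which is one plausible reason the binary path wins even though
the stream is not necessarily smaller than the text one.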

Would be happy to polish it if there is interest in such a change.

Cheers

Attachment: pgbench.c.diff
