Thanks for the insights.

Thanks,
Arun
On Wed, 3 Jan, 2024, 23:26 Jeremy Schneider, <schnei...@ardentperf.com> wrote:

> On 1/2/24 11:23 PM, arun chirappurath wrote:
> > Do we have any open source tools which can be used to create sample data
> > at scale from our postgres databases?
> > Which considers data distribution and randomness
>
> I would suggest using the most common tools whenever possible, because
> then if you want to discuss results with other people (for example on
> these mailing lists) then you're working with data sets that are widely
> and well understood.
>
> The most common tool for PostgreSQL is pgbench, which does a TPC-B-like
> schema that you can scale to any size, always the same [small] number of
> tables/columns and same uniform data distribution, and there are
> relationships between tables so you can create FKs if needed.
>
> My second favorite tool is sysbench. Any number of tables, easily scale
> to any size, standardized schema with a small number of columns and no
> relationships/FKs. Data distribution is uniformly random; however, on the
> query side it supports a bunch of different distribution models, not
> just uniform random, as well as queries processing ranges of rows.
>
> The other tool that I'm intrigued by these days is benchbase from CMU.
> It can do TPC-C and a bunch of other schemas/workloads, and you can scale
> the data sizes. If you're just looking at data generation and you're going
> to make your own workloads, well, benchbase has a lot of different
> schemas available out of the box.
>
> You can always hand-roll your schema and data with scripts & SQL, but
> the more complex and bespoke your performance test schema is, the more
> work & explaining it takes to get lots of people to engage in a
> discussion since they need to take time to understand how the test is
> engineered. For very narrowly targeted reproductions this is usually the
> right approach with a very simple schema and workload, but not commonly
> for general performance testing.
>
> -Jeremy
>
> --
> http://about.me/jeremy_schneider
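A minimal Python sketch of the hand-rolled route mentioned in the quoted reply ("hand-roll your schema and data with scripts & SQL"), for cases where a specific, non-uniform data distribution matters. The orders table, its columns, and the function names are hypothetical, and the Zipf-like skew on customer_id is an assumed choice for illustration; it is not taken from pgbench, sysbench, or benchbase.

```python
import csv
import random


def zipf_weights(n, s=1.1):
    """Return weights proportional to 1 / rank**s for ranks 1..n (Zipf-like skew)."""
    return [1.0 / (rank ** s) for rank in range(1, n + 1)]


def generate_orders(num_rows, num_customers, skewed=True, seed=42):
    """Build (order_id, customer_id, amount) rows for a hypothetical orders table.

    customer_id is drawn with a Zipf-like skew (a few customers own most orders)
    or uniformly when skewed=False; amount is uniform between 1.00 and 500.00.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible data set
    customers = list(range(1, num_customers + 1))
    weights = zipf_weights(num_customers) if skewed else None
    # Draw every customer_id in one call; weights=None means uniform selection.
    customer_ids = rng.choices(customers, weights=weights, k=num_rows)
    return [
        (order_id, customer_id, round(rng.uniform(1.0, 500.0), 2))
        for order_id, customer_id in enumerate(customer_ids, start=1)
    ]


def write_csv(rows, path):
    """Write rows as CSV (no header) so they can be bulk-loaded with psql."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```

For example, write_csv(generate_orders(1_000_000, 10_000), "orders.csv") followed by \copy orders FROM 'orders.csv' WITH (FORMAT csv) in psql would load the rows, again assuming a hypothetical orders table with matching columns. Keeping the seed fixed means anyone rerunning the script gets the same data set, which partly addresses the point about others needing to understand and reproduce a bespoke test.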