Thanks for the insights.

Thanks,
Arun

On Wed, 3 Jan, 2024, 23:26 Jeremy Schneider, <schnei...@ardentperf.com>
wrote:

> On 1/2/24 11:23 PM, arun chirappurath wrote:
> > Do we have any open source tools which can be used to create sample data
> > at scale from our postgres databases, taking data distribution and
> > randomness into account?
>
> I would suggest using the most common tools whenever possible, because
> then if you want to discuss results with other people (for example on
> these mailing lists), you're working with data sets that are widely
> and well understood.
>
> The most common tool for PostgreSQL is pgbench, which creates a TPC-B-like
> schema that you can scale to any size: always the same (small) number of
> tables/columns and the same uniform data distribution, and there are
> relationships between tables so you can create FKs if needed.
>
> My second favorite tool is sysbench: any number of tables, easy to scale
> to any size, a standardized schema with a small number of columns, and no
> relationships/FKs. The data distribution is uniformly random; however, on
> the query side it supports several different distribution models, not
> just uniform random, as well as queries that process ranges of rows.
>
> The other tool that I'm intrigued by these days is benchbase from CMU.
> It can do TPC-C and a bunch of other schemas/workloads, and you can scale
> the data sizes. If you only need data generation and plan to write your
> own workloads, benchbase has a lot of different schemas available out of
> the box.
>
> You can always hand-roll your schema and data with scripts and SQL, but
> the more complex and bespoke your performance-test schema is, the more
> work and explaining it takes to get people to engage in a discussion,
> since they need to take time to understand how the test is engineered.
> For very narrowly targeted reproductions, hand-rolling a very simple
> schema and workload is usually the right approach, but it is not common
> for general performance testing.
>
> -Jeremy
>
>
> --
> http://about.me/jeremy_schneider
>
>
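For the hand-rolled route mentioned in the thread, a minimal Python sketch of generating data with a controlled distribution might look like the following. It assumes a hypothetical `orders` table with columns (order_id, customer_id, amount); the Zipf-like skew on customer_id and the uniform range for amount are illustrative choices, not something recommended in the thread. The output is CSV text that could then be loaded with psql's `\copy ... WITH (FORMAT csv)`.

```python
import csv
import io
import random


def zipf_weights(n: int, s: float = 1.1) -> list[float]:
    """Zipf-like weights: rank k (1-based) gets weight proportional to 1 / k**s."""
    return [1.0 / (k ** s) for k in range(1, n + 1)]


def generate_orders(num_rows: int, num_customers: int, seed: int = 42) -> list[tuple[int, int, float]]:
    """Generate rows for a hypothetical orders(order_id, customer_id, amount) table.

    customer_id is skewed (low-numbered customers place most orders);
    amount is uniformly random between 1.00 and 500.00.
    """
    rng = random.Random(seed)  # fixed seed so the data set is reproducible
    customers = list(range(1, num_customers + 1))
    customer_ids = rng.choices(customers, weights=zipf_weights(num_customers), k=num_rows)
    return [
        (order_id, cust, round(rng.uniform(1.0, 500.0), 2))
        for order_id, cust in enumerate(customer_ids, start=1)
    ]


def to_csv(rows) -> str:
    """Render rows as CSV text (no header row)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    rows = generate_orders(num_rows=10_000, num_customers=1_000)
    # Print only the first few rows; redirect the full output to a file for loading.
    print(to_csv(rows[:3]), end="")
```

Keeping the seed fixed makes the data set repeatable, which matters if you want others to reproduce your results; changing `s` in `zipf_weights` controls how strongly the customer distribution is skewed.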
