On Mon, Feb 26, 2024 at 8:08 AM Emiel Mols <em...@crisp.nl> wrote:
> Thanks, as indicated we're using that right now. The 30% spinlock overhead
> unfortunately persists.
>

Try to increase shared_buffers; 128MB can be too low.

max_connections = 2048 is unrealistically high.

> - Fsync was already disabled, too. Complete postgresql.conf used in
>   testing:
>
>   listen_addresses = ''
>   max_connections = 2048
>   unix_socket_directories = '..'
>   shared_buffers = 128MB
>   log_line_prefix = ''
>   synchronous_commit = 'off'
>   wal_level = 'minimal'
>
> - linux perf report comparing schema-per-test vs database-per-test:
>   https://ibb.co/CW5w2MW
>
> - Emiel
>
>
> On Mon, Feb 26, 2024 at 1:36 PM Pavel Stehule <pavel.steh...@gmail.com>
> wrote:
>
>> Hi
>>
>> On Mon, Feb 26, 2024 at 7:28 AM Emiel Mols <em...@crisp.nl> wrote:
>>
>>> Hello,
>>>
>>> To improve our unit and end-to-end testing performance, we are looking
>>> to optimize the initialization of around 500-1000 database *schemas*
>>> from a schema.sql file.
>>>
>>> Background: in postgres, you cannot change databases on existing
>>> connections, and using schemas would allow us to minimize the number of
>>> active connections needed from our test runners, as we can reuse
>>> connections by updating search_path. In our current database-per-test
>>> setup we see around ~30% (!!) total CPU overhead in
>>> native_queued_spin_lock_slowpath (as profiled with linux perf),
>>> presumably because of the high connection count. We run ~200 tests in
>>> parallel to saturate a 128-core machine.
>>>
>>> In the schema-per-test setup, however, it becomes harder to cheaply
>>> create the schema. Before, we could `create database test001 with
>>> template testbase` to set up the database for a test, and this was
>>> reasonably fast. Re-inserting a schema with ~500 table/index definitions
>>> across 500 test schemas is prohibitively expensive (around 1000ms per
>>> schema insertion means we're wasting 500 cpu-seconds, and there seems to
>>> be quite some lock contention too). Linux perf shows that in this setup
>>> we're reducing the native_queued_spin_lock_slowpath overhead to around
>>> 10%, but total test time is still slower due to all the schema
>>> initialization being done. Also, it feels quite elegant that functions
>>> and types can be reused between tests.
>>>
>>> Does anyone have some insights or great ideas :)? Also pretty curious
>>> about the fundamental reason why having high/changing connection counts
>>> to postgres results in this much (spin)lock contention (or perhaps we're
>>> doing something wrong in either our configuration or measurements?).
>>>
>>> An alternative we haven't explored yet is to see if we can use pgbouncer
>>> or another connection pooler to mitigate the 30% issue (set limits so
>>> there are only ever X connections to postgres in total, and perhaps max
>>> Y per test/database). This does add another piece of
>>> infrastructure/complexity, so it's not really preferred.
>>>
>>
>> For testing:
>>
>> a) use templates - CREATE DATABASE test TEMPLATE some;
>>
>> b) disable fsync (only for testing!!!)
>>
>> Regards
>>
>> Pavel
>>
>>
>>> Best,
>>>
>>> Emiel
>>>
>>
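
For reference, a minimal sketch of the schema-per-test flow discussed above, run over an already-open connection. Names such as test001 are illustrative, and it assumes schema.sql creates its objects with unqualified names so that they land in the first schema on search_path:

    -- create a fresh schema for one test and point the existing connection at it
    CREATE SCHEMA test001;
    SET search_path TO test001, public;
    \i schema.sql        -- unqualified objects are created in test001 (assumes schema.sql is not schema-qualified)
    -- ... run the test ...
    DROP SCHEMA test001 CASCADE;

The \i line is a psql meta-command; a test runner using a driver would instead read schema.sql and execute its contents on the same connection.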
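
And a sketch of the template-database approach from Pavel's reply (the database-per-test setup described in the thread), again with illustrative names. Note that CREATE DATABASE ... TEMPLATE requires that no other sessions are connected to the template database at clone time:

    -- one-time setup: build the schema in a template database
    CREATE DATABASE testbase;
    \c testbase
    \i schema.sql

    -- per test: clone the template, which is a cheap file-level copy
    CREATE DATABASE test001 TEMPLATE testbase;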