Never mind. The query plan was generated on test data with 50 partitions. Sorry for the confusion.
On Mon, Feb 17, 2025 at 3:25 PM bruno vieira da silva <brunogi...@gmail.com> wrote:

> Well, the query plans were generated with pg 17.3 and the buffer usage
> was half. Did pg 17.3 have any fixes to reduce the planning buffer usage?
>
> On Mon, Feb 17, 2025 at 3:18 PM bruno vieira da silva <brunogi...@gmail.com> wrote:
>
>> Hello, I did a more comprehensive test with different numbers of
>> partitions and found the following.
>>
>> Summary of buffer usage during planning for the first call vs. the
>> second call in the same session (1 buffer = 8 kB).
>>
>> First call, by partition count:
>> 200 partitions: 12,828 (100 MB)
>> 100 partitions:  9,329 (72 MB)
>>  50 partitions:  3,305 (25 MB)
>>  10 partitions:    875 (7 MB)
>>
>> Same query in the same session:
>> 200 partitions: 205 (1.6 MB)
>> 100 partitions:   5 (40 KB)
>>  50 partitions:   5 (40 KB)
>>  10 partitions:   5 (40 KB)
>>
>> I also tested on PG 17.3: no relevant changes.
>>
>> The question is: does this make sense?
>>
>> *These are the steps to reproduce it:*
>>
>> docker pull postgres:17.2
>> docker run -itd -e POSTGRES_USER=bruno -e POSTGRES_PASSWORD=bruno \
>>   -p 5500:5432 -v /home/bruno/pgdata17:/var/lib/postgresql/data \
>>   --name postgresql postgres:17.2
>> export PGHOST="localhost"
>> export PGPORT=5500
>> export PGDATABASE="postgres"
>> export PGUSER="bruno"
>> export PGPASSWORD="bruno"
>>
>> CREATE EXTENSION IF NOT EXISTS "pgcrypto"; -- enables the gen_random_uuid function
>>
>> CREATE TABLE dicom_series (
>>     series_uid UUID DEFAULT gen_random_uuid(),
>>     series_description VARCHAR(255),
>>     modality VARCHAR(16),
>>     body_part_examined VARCHAR(64),
>>     patient_id VARCHAR(64),
>>     study_uid UUID DEFAULT gen_random_uuid(),
>>     created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
>> );
>>
>> -- Create the parent table
>> CREATE TABLE dicom_sops_100_part (
>>     sop_uid UUID NOT NULL,
>>     series_uid UUID NOT NULL,
>>     instance_number INT,
>>     image_position_patient TEXT,
>>     image_orientation_patient TEXT,
>>     slice_thickness DECIMAL(10, 2),
>>     slice_location DECIMAL(10, 2),
>>     pixel_spacing TEXT,
>>     rows INT,
>>     columns INT,
>>     acquisition_date DATE,
>>     acquisition_time TIME,
>>     created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
>> ) PARTITION BY HASH (sop_uid);
>>
>> -- Create 100 partitions
>> DO $$
>> DECLARE
>>     partition_number INT;
>> BEGIN
>>     FOR partition_number IN 0..99 LOOP
>>         EXECUTE format(
>>             'CREATE TABLE dicom_sops_100_p%1$s PARTITION OF dicom_sops_100_part FOR VALUES WITH (MODULUS 100, REMAINDER %1$s);',
>>             partition_number
>>         );
>>     END LOOP;
>> END $$;
>>
>> *Data population:*
>>
>> DO $$
>> DECLARE
>>     series_count INT := 1000000; -- number of series to create
>>     sops_per_series INT := 20;
>>     i INT;
>>     j INT;
>>     series_id UUID;
>>     sop_id UUID;
>> BEGIN
>>     FOR i IN 1..series_count LOOP
>>         -- Insert into dicom_series with a generated UUID
>>         INSERT INTO dicom_series (
>>             series_description,
>>             modality,
>>             body_part_examined,
>>             patient_id
>>         ) VALUES (
>>             'Series Description ' || i,
>>             'CT',
>>             'Chest',
>>             'PATIENT-' || i
>>         )
>>         RETURNING series_uid INTO series_id;
>>
>>         FOR j IN 1..sops_per_series LOOP
>>             -- Insert into dicom_sops_100_part with a generated UUID
>>             INSERT INTO dicom_sops_100_part (
>>                 sop_uid,
>>                 series_uid,
>>                 instance_number,
>>                 image_position_patient,
>>                 image_orientation_patient,
>>                 slice_thickness,
>>                 slice_location,
>>                 pixel_spacing,
>>                 rows,
>>                 columns,
>>                 acquisition_date,
>>                 acquisition_time
>>             ) VALUES (
>>                 gen_random_uuid(),
>>                 series_id,
>>                 j,
>>                 '(0.0, 0.0, ' || j || ')',
>>                 '(1.0, 0.0, 0.0, 0.0, 1.0, 0.0)',
>>                 1.0,
>>                 j * 5.0,
>>                 '1.0\\1.0',
>>                 512,
>>                 512,
>>                 CURRENT_DATE,
>>                 CURRENT_TIME
>>             );
>>         END LOOP;
>>     END LOOP;
>> END $$;
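For a quicker load, the nested loops above can be collapsed into two set-based
statements. This is only a sketch of an equivalent population; it assumes the
exact per-row values do not matter for the planning-buffer measurements, only
the table shapes and row counts:

-- Set-based alternative to the DO block: same 1M series and 20 SOPs per
-- series, but as two INSERT ... SELECT statements instead of ~21M
-- single-row inserts.
INSERT INTO dicom_series (series_description, modality, body_part_examined, patient_id)
SELECT 'Series Description ' || i, 'CT', 'Chest', 'PATIENT-' || i
FROM generate_series(1, 1000000) AS g(i);

INSERT INTO dicom_sops_100_part
    (sop_uid, series_uid, instance_number, image_position_patient,
     image_orientation_patient, slice_thickness, slice_location,
     pixel_spacing, rows, columns, acquisition_date, acquisition_time)
SELECT gen_random_uuid(), s.series_uid, j,
       '(0.0, 0.0, ' || j || ')', '(1.0, 0.0, 0.0, 0.0, 1.0, 0.0)',
       1.0, j * 5.0, '1.0\\1.0', 512, 512, CURRENT_DATE, CURRENT_TIME
FROM dicom_series AS s
CROSS JOIN generate_series(1, 20) AS gs(j);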
>>
>> *Add indexes and vacuum analyze:*
>>
>> CREATE UNIQUE INDEX idx_series_uid ON dicom_series(series_uid);
>> CREATE INDEX dicom_sops_100_part_sop_uid_idx ON dicom_sops_100_part(sop_uid);
>> CREATE INDEX dicom_sops_100_part_series_uid_idx ON dicom_sops_100_part(series_uid);
>>
>> vacuum freeze;
>> analyze;
>>
>> *Testing:*
>>
>> Disconnect and reconnect to the db with psql.
>>
>> Query used for the test:
>>
>> drop table if exists temp_series_id;
>> CREATE TEMPORARY TABLE temp_series_id AS
>>     select series_uid from dicom_series order by random() limit 1;
>> analyze temp_series_id;
>> explain (analyze, buffers)
>>     select * from dicom_sops_100_part
>>     where series_uid = (select series_uid from temp_series_id);
>>
>> Query plan:
>>
>>                                   QUERY PLAN
>> ----------------------------------------------------------------------------------------------------
>>  Append  (cost=1.43..423.26 rows=50 width=128) (actual time=2.565..27.216 rows=20 loops=1)
>>    Buffers: shared hit=50 read=118, local hit=1
>>    InitPlan 1
>>      ->  Seq Scan on temp_series_id  (cost=0.00..1.01 rows=1 width=16) (actual time=0.006..0.007 rows=1 loops=1)
>>            Buffers: local hit=1
>>    ->  Index Scan using dicom_sops_100_p0_series_uid_idx on dicom_sops_100_p0 dicom_sops_100_part_1  (cost=0.42..8.44 rows=1 width=128) (actual time=0.846..0.846 rows=0 loops=1)
>>          Index Cond: (series_uid = (InitPlan 1).col1)
>>    ....
>>    ->  Index Scan using dicom_sops_100_p49_series_uid_idx on dicom_sops_100_p49 dicom_sops_100_part_50  (cost=0.42..8.44 rows=1 width=128) (actual time=0.302..0.303 rows=0 loops=1)
>>          Index Cond: (series_uid = (InitPlan 1).col1)
>>          Buffers: shared hit=1 read=2
>>  Planning:
>>    Buffers: shared hit=4180
>>  Planning Time: 4.941 ms
>>  Execution Time: 27.682 ms
>> (159 rows)
>>
>> Second query on the same session:
>>
>>                                   QUERY PLAN
>> ----------------------------------------------------------------------------------------------------
>>  Append  (cost=1.43..423.26 rows=50 width=128) (actual time=9.759..9.770 rows=0 loops=1)
>>    Buffers: shared hit=100 read=50, local hit=1
>>    InitPlan 1
>>      ->  Seq Scan on temp_series_id  (cost=0.00..1.01 rows=1 width=16) (actual time=0.003..0.004 rows=1 loops=1)
>>            Buffers: local hit=1
>>    ->  Index Scan using dicom_sops_100_p0_series_uid_idx on dicom_sops_100_p0 dicom_sops_100_part_1  (cost=0.42..8.44 rows=1 width=128) (actual time=0.212..0.213 rows=0 loops=1)
>>          Index Cond: (series_uid = (InitPlan 1).col1)
>>    ...
>>    ->  Index Scan using dicom_sops_100_p49_series_uid_idx on dicom_sops_100_p49 dicom_sops_100_part_50  (cost=0.42..8.44 rows=1 width=128) (actual time=0.236..0.236 rows=0 loops=1)
>>          Index Cond: (series_uid = (InitPlan 1).col1)
>>          Buffers: shared hit=2 read=1
>>  Planning:
>>    Buffers: shared hit=5
>>  Planning Time: 0.604 ms
>>  Execution Time: 10.011 ms
>> (159 rows)
>>
>> On Thu, Jan 16, 2025 at 9:56 AM bruno vieira da silva <brunogi...@gmail.com> wrote:
>>
>>> Hello, thanks David.
>>>
>>> This is a pg test deployment. Anyway, I did a VACUUM FULL on the db, and
>>> the number of buffers read increased a bit.
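The cold-vs-warm difference shown in the two plans above can also be reproduced
from a single psql session. This is only a sketch; the UUID literal is a
placeholder value, since any value exercises the same planning work:

-- First call in a fresh session: the planner loads relation and catalog
-- metadata for every partition, hence planning buffers in the thousands
-- (4,180 in the plan above).
EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM dicom_sops_100_part
    WHERE series_uid = '00000000-0000-0000-0000-000000000000';

-- Second call in the same session: the metadata is already cached, so the
-- "Planning: Buffers" line drops to a handful of hits (5 above).
EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM dicom_sops_100_part
    WHERE series_uid = '00000000-0000-0000-0000-000000000000';

-- \c reconnects, which starts a new backend with empty caches, so the next
-- plan pays the full planning-buffer cost again.
\c
EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM dicom_sops_100_part
    WHERE series_uid = '00000000-0000-0000-0000-000000000000';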
>>>
>>> On Wed, Jan 15, 2025 at 3:01 PM David Rowley <dgrowle...@gmail.com> wrote:
>>>
>>>> On Thu, 16 Jan 2025 at 07:29, bruno vieira da silva
>>>> <brunogi...@gmail.com> wrote:
>>>> > On pg 17 we now have better visibility on the I/O required during
>>>> > query planning.
>>>> > So, as part of ongoing design work for table partitioning, I was
>>>> > analyzing the performance implications of having more or fewer
>>>> > partitions.
>>>> > In one of my tests of a table with 200 partitions, EXPLAIN showed a
>>>> > large amount of buffers read during planning: around 12k buffers.
>>>>
>>>> That's a suspiciously high number of buffers.
>>>>
>>>> > I observed that query planning seems to have a caching mechanism, as
>>>> > subsequent similar queries require only a fraction of the buffers
>>>> > read during query planning.
>>>> > However, this "caching" seems to be per session: if I end the client
>>>> > session and reconnect, the same query will again require reading 12k
>>>> > buffers for query planning.
>>>> >
>>>> > Does pg have any mechanism to mitigate this issue (new sessions
>>>> > needing to read a large amount of buffers for query planning)? Or
>>>> > should I mitigate it by using connection pooling?
>>>> > How is this caching done? Is there a way to have visibility on its
>>>> > usage? Where is it stored?
>>>>
>>>> The caching is for relation meta-data and for various catalogue data.
>>>> This is stored in local session hash tables. The caching is done
>>>> lazily the first time something is looked up after the session starts.
>>>> If you're doing very little work before ending the session, then
>>>> you'll pay this overhead much more often than you would if you were to
>>>> do more work in each session. A connection pooler would help you do
>>>> that; otherwise it would need a redesign of how you're connecting to
>>>> Postgres from your application.
>>>>
>>>> There's no easy way from EXPLAIN to see which tables or catalogue
>>>> tables the IO is occurring on; however, you might want to try looking
>>>> at pg_statio_all_tables directly before and after the query that's
>>>> causing the 12k buffer accesses and then look at what's changed.
>>>>
>>>> I suspect if you're accessing 12k buffers to run EXPLAIN that you have
>>>> some auto-vacuum starvation issues. Is auto-vacuum enabled and
>>>> running? If you look at pg_stat_activity, do you see autovacuum
>>>> running? It's possible that it's running but not configured to run
>>>> quickly enough to keep up with demand. Alternatively, it may be
>>>> keeping up now, but at some point in the past it might not have been,
>>>> and you have some bloat either in an index or in a catalogue table as
>>>> a result.
>>>>
>>>> David

--
Bruno Vieira da Silva
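For anyone following up on David's two suggestions, here is a sketch of both
checks. statio_before is just a throwaway name, and the before/after comparison
assumes an otherwise idle instance, since the counters are cluster-wide and are
flushed with a short delay:

-- Session A: snapshot catalog I/O counters before the test.
CREATE TEMP TABLE statio_before AS
SELECT relname,
       coalesce(heap_blks_read, 0) + coalesce(heap_blks_hit, 0)
       + coalesce(idx_blks_read, 0) + coalesce(idx_blks_hit, 0) AS blks
FROM pg_statio_all_tables
WHERE schemaname = 'pg_catalog';

-- Session B (a brand-new connection): run the EXPLAIN (ANALYZE, BUFFERS)
-- query from the test above.

-- Session A again: see which catalogue tables the ~12k buffer accesses
-- landed on.
SELECT a.relname,
       (coalesce(a.heap_blks_read, 0) + coalesce(a.heap_blks_hit, 0)
        + coalesce(a.idx_blks_read, 0) + coalesce(a.idx_blks_hit, 0)) - b.blks AS blks_delta
FROM pg_statio_all_tables AS a
JOIN statio_before AS b USING (relname)
WHERE a.schemaname = 'pg_catalog'
ORDER BY blks_delta DESC
LIMIT 15;

-- And to check whether autovacuum workers are active at all:
SELECT pid, backend_type, query
FROM pg_stat_activity
WHERE backend_type LIKE 'autovacuum%';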