Re: [PERFORM] Query planner wants to use seq scan
On 27.10.2015 12:35, Bertrand Paquet wrote:

Hi all,

We have a slow query. After analyzing it, the planner's decision seems debatable: the query is faster when seqscan is disabled. See below the two query plans and an extract from pg_stats. Any idea about what to change to help the planner?

One piece of information which may be useful: the number of distinct values of organization_id is very, very low; maybe the planner does not know that and takes the wrong decision.

Regards,

Bertrand

# explain analyze SELECT 1 AS one FROM "external_sync_messages"
    WHERE "external_sync_messages"."organization_id" = 1612
      AND ("external_sync_messages"."status" NOT IN ('sent_to_proxy', 'in_progress', 'ok'))
      AND "external_sync_messages"."handled_by" IS NULL
    LIMIT 1;

                                   QUERY PLAN
 Limit  (cost=0.00..12.39 rows=1 width=0) (actual time=232.212..232.213 rows=1 loops=1)
   ->  Seq Scan on external_sync_messages  (cost=0.00..79104.69 rows=6385 width=0) (actual time=232.209..232.209 rows=1 loops=1)
         Filter: ((handled_by IS NULL) AND (organization_id = 1612) AND ((status)::text <> ALL ('{sent_to_proxy,in_progress,ok}'::text[])))
         Rows Removed by Filter: 600140
 Planning time: 0.490 ms
 Execution time: 232.246 ms
(6 rows)

# set enable_seqscan = off;
SET
# explain analyze SELECT 1 AS one FROM "external_sync_messages"
    WHERE "external_sync_messages"."organization_id" = 1612
      AND ("external_sync_messages"."status" NOT IN ('sent_to_proxy', 'in_progress', 'ok'))
      AND "external_sync_messages"."handled_by" IS NULL
    LIMIT 1;

                                   QUERY PLAN
 Limit  (cost=0.42..39.88 rows=1 width=0) (actual time=0.030..0.030 rows=1 loops=1)
   ->  Index Scan using index_external_sync_messages_on_organization_id on external_sync_messages  (cost=0.42..251934.05 rows=6385 width=0) (actual time=0.028..0.028 rows=1 loops=1)
         Index Cond: (organization_id = 1612)
         Filter: ((handled_by IS NULL) AND ((status)::text <> ALL ('{sent_to_proxy,in_progress,ok}'::text[])))
 Planning time: 0.103 ms
 Execution time: 0.052 ms
(6 rows)

# SELECT attname, inherited, n_distinct, array_to_string(most_common_vals, E'\n') as most_common_vals
    FROM pg_stats
    WHERE tablename = 'external_sync_messages'
      and attname IN ('status', 'organization_id', 'handled_by');

     attname     | inherited | n_distinct | most_common_vals
-----------------+-----------+------------+--------------------------------------------------
 handled_by      | f         |          3 | 3, 236140, 54413
 organization_id | f         |         22 | 1612, 287, 967, 1223, 1123, 1930, 841, 1814,
                 |           |            | 711, 1513, 1794, 1246, 1673, 1552, 1747, 2611,
                 |           |            | 2217, 2448, 2133, 1861, 2616, 2796
 status          | f         |          6 | ok, ignored, channel_error, in_progress,
                 |           |            | error, sent_to_proxy
(3 rows)

# select count(*) from external_sync_mes
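If coarse statistics were the suspect here, one option would be to raise the per-column statistics target and re-analyze. This is a sketch, not something proposed in the original thread (the fix eventually discussed below was an index):

-- Collect a larger sample for this column (the default target is 100):
ALTER TABLE external_sync_messages
    ALTER COLUMN organization_id SET STATISTICS 1000;
ANALYZE external_sync_messages;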
Re: [PERFORM] Query planner wants to use seq scan
On 27.10.2015 14:10, Bertrand Paquet wrote:

Yes, I have run VACUUM ANALYZE, no effect.

Bertrand

2015-10-27 12:08 GMT+01:00 Alex Ignatov <a.igna...@postgrespro.ru>:
[original message quoted in full; trimmed here, see above]
Re: [PERFORM] Query planner wants to use seq scan
On 27.10.2015 14:19, Bertrand Paquet wrote:

         relname         | n_live_tup | n_dead_tup |          last_vacuum          |        last_autovacuum        |         last_analyze          |       last_autoanalyze
-------------------------+------------+------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------
 external_sync_messages  |     998105 |      11750 | 2015-10-26 20:15:17.484771+00 | 2015-10-02 15:04:25.944479+00 | 2015-10-26 20:15:19.465308+00 | 2015-10-22 12:24:26.947616+00
(1 row)

2015-10-27 12:17 GMT+01:00 Alex Ignatov <a.igna...@postgrespro.ru>:
[earlier messages quoted in full; trimmed here, see above]
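The figures above appear to come from pg_stat_user_tables; a query of this shape (an assumption, the statement itself is not shown in the original post) would produce them:

SELECT relname, n_live_tup, n_dead_tup,
       last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'external_sync_messages';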
Re: [PERFORM] Query planner wants to use seq scan
On 27.10.2015 23:56, Bertrand Paquet wrote:

So, tonight, the index on the three fields is used; maybe yesterday's vacuum updated the stats.

Thank you for your help.

Regards,

Bertrand

2015-10-27 18:33 GMT+01:00 Bertrand Paquet <bertrand.paq...@doctolib.fr>:

Hi Tom,

I did the test yesterday with an index on the three fields, and with a partial index on organization and status with an IS NULL condition on handled_by. I saw no change in the query plan. Maybe I forgot to run VACUUM ANALYZE afterwards; I will retry tonight.

I am using a btree index. Is it the right solution, even with the IN clause?

Regards,

Bertrand

On Tuesday, 27 October 2015, Tom Lane <t...@sss.pgh.pa.us> wrote:

Bertrand Paquet writes:
> We have a slow query. After analyzing it, the planner's decision seems debatable: the query is faster when seqscan is disabled. See below the two query plans, and an extract from pg_stats.
> Any idea about what to change to help the planner?

Neither one of those plans is very good: you're just hoping that the Filter condition will let a tuple through sooner rather than later. If you care about the performance of this type of query, I'd consider creating an index on (organization_id, status, handled_by) so that all the conditions can be checked in the index.

                        regards, tom lane

Hello Bertrand once again!
What's your status? Has the plan changed after deploying the three-field index?

--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
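For reference, the two indexing approaches discussed in this thread might be written as follows (the index names are illustrative, not taken from the original posts):

-- Tom's suggestion: a composite btree covering all three conditions.
CREATE INDEX index_esm_org_status_handled
    ON external_sync_messages (organization_id, status, handled_by);

-- Alternatively, the partial index Bertrand describes, matching the
-- query's fixed predicates:
CREATE INDEX index_esm_org_unhandled
    ON external_sync_messages (organization_id)
    WHERE handled_by IS NULL
      AND status NOT IN ('sent_to_proxy', 'in_progress', 'ok');

-- Refresh statistics so the planner reconsiders its choice:
ANALYZE external_sync_messages;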
Re: [PERFORM] Scalability to more than 64 cores With PG 9.4 and RHEL 7.1 Kernel 3.10
On 28.10.2015 8:04, Javier Muro wrote:

Does PostgreSQL 9.4 officially support a system with more than 64 real cores? (72 cores, 4 x Intel(R) Xeon(R) CPU E7-8890.) The work Robert Haas did to fix the CPU locking way back when showed significant improvements up to 64 cores, but that is as far as I know.

Thanks in advance.

Hello Javier!

Our tests show that PG 9.4 scales well up to 60 Intel cores: pgbench -S with the database on tmpfs gave us 700 000 tps. Beyond 60 cores, s_lock dominates CPU usage. 9.5 scales much better.

--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
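A select-only pgbench run of the kind Alex describes could be reproduced roughly as follows; the scale factor, client count, and duration are illustrative assumptions, not the original test parameters:

# initialize a test database at scale factor 1000 (roughly 15 GB)
pgbench -i -s 1000 pgbench_db
# select-only workload (-S), 60 clients, 60 worker threads, 5 minutes
pgbench -S -c 60 -j 60 -T 300 pgbench_db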
Re: [PERFORM] Query that took a lot of time in Postgresql when not using trim in order by
->  Seq Scan on public.dim_cliente  (cost=0.00..618.90 rows=16890 width=86) (actual time=0.008..16.537 rows=16890 loops=1)
                                 Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente
               ->  Hash  (cost=18.90..18.90 rows=590 width=59) (actual time=0.747..0.747 rows=590 loops=1)
                     Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor
                     Buckets: 1024  Batches: 1  Memory Usage: 56kB
                     ->  Seq Scan on public.dim_vendedor  (cost=0.00..18.90 rows=590 width=59) (actual time=0.026..0.423 rows=590 loops=1)
                           Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor
 Total runtime: 3845.895 ms
(25 filas)

___

Query without trim:

SELECT "dim_cliente"."tipocliente" AS "c0",
       "dim_cliente"."a1_ibge" AS "c1",
       "dim_cliente"."a1_cod" AS "c2",
       "dim_cliente"."a1_nome" AS "c3",
       "dim_vendedor"."a3_nome" AS "c4"
FROM "public"."dim_cliente" AS "dim_cliente",
     "public"."fato_ventas_productos" AS "fato_ventas_productos",
     "public"."dim_vendedor" AS "dim_vendedor"
WHERE "fato_ventas_productos"."key_cliente" = "dim_cliente"."key_cliente"
  AND "fato_ventas_productos"."key_vendedor" = "dim_vendedor"."key_vendedor"
GROUP BY "dim_cliente"."tipocliente", "dim_cliente"."a1_ibge",
         "dim_cliente"."a1_cod", "dim_cliente"."a1_nome", "dim_vendedor"."a3_nome"
ORDER BY "dim_cliente"."tipocliente" ASC NULLS LAST,
         "dim_cliente"."a1_ibge" ASC NULLS LAST,
         "dim_cliente"."a1_cod" ASC NULLS LAST,
         "dim_cliente"."a1_nome" ASC NULLS LAST;
-- this query took 37249.268 ms

___

Query plan when not using trim:

                                 QUERY PLAN
---------------------------------------------------------------------------
 Group  (cost=170417.48..184027.84 rows=907357 width=129) (actual time=36649.329..37235.158 rows=43615 loops=1)
   Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
   ->  Sort  (cost=170417.48..172685.88 rows=907357 width=129) (actual time=36649.315..36786.760 rows=907357 loops=1)
         Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
         Sort Key: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
         Sort Method: quicksort  Memory: 265592kB
         ->  Hash Join  (cost=856.30..80628.56 rows=907357 width=129) (actual time=26.719..1593.693 rows=907357 loops=1)
               Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
               Hash Cond: (fato_ventas_productos.key_vendedor = dim_vendedor.key_vendedor)
               ->  Hash Join  (cost=830.02..68126.13 rows=907357 width=86) (actual time=25.980..1203.775 rows=907357 loops=1)
                     Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, fato_ventas_productos.key_vendedor
                     Hash Cond: (fato_ventas_productos.key_cliente = dim_cliente.key_cliente)
                     ->  Seq Scan on public.fato_ventas_productos  (cost=0.00..46880.57 rows=907357 width=16) (actual time=0.004..680.283 rows=907357 loops=1)
                           Output: fato_ventas_productos.key_cliente, fato_ventas_productos.key_vendedor
                     ->  Hash  (cost=618.90..618.90 rows=16890 width=86) (actual time=25.931..25.931 rows=16890 loops=1)
                           Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente
                           Buckets: 2048  Batches: 1  Memory Usage: 1980kB
                           ->  Seq Scan on public.dim_cliente  (cost=0.00..618.90 rows=16890 width=86) (actual time=0.005..13.736 rows=16890 loops=1)
                                 Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente
               ->  Hash  (cost=18.90..18.90 rows=590 width=59) (actual time=0.715..0.715 rows=590 loops=1)
                     Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor
                     Buckets: 1024  Batches: 1  Memory Usage: 56kB
                     ->  Seq Scan on public.dim_vendedor  (cost=0.00..18.90 rows=590 width=59) (actual time=0.024..0.405 rows=590 loops=1)
                           Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor
 Total runtime: 37249.268 ms
(25 filas)

___

Is there anything I can do to solve this problem? Is it a bug or a config problem?

Here is a link with a dump of the tables:
https://drive.google.com/file/d/0Bwupj61i9BtWZ1NiVXltaWc0dnM/view?usp=sharing

I appreciate your help.

Hello!

What is your Postgres version? Do you have correct statistics on these tables? Please show your execution plans with buffers, i.e. explain (analyze, buffers) ...

--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
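For example, the buffer statistics Alex asks for can be captured by running the same query from the post with the extended EXPLAIN options:

EXPLAIN (ANALYZE, BUFFERS)
SELECT "dim_cliente"."tipocliente" AS "c0",
       "dim_cliente"."a1_ibge" AS "c1",
       "dim_cliente"."a1_cod" AS "c2",
       "dim_cliente"."a1_nome" AS "c3",
       "dim_vendedor"."a3_nome" AS "c4"
FROM "public"."dim_cliente" AS "dim_cliente",
     "public"."fato_ventas_productos" AS "fato_ventas_productos",
     "public"."dim_vendedor" AS "dim_vendedor"
WHERE "fato_ventas_productos"."key_cliente" = "dim_cliente"."key_cliente"
  AND "fato_ventas_productos"."key_vendedor" = "dim_vendedor"."key_vendedor"
GROUP BY "dim_cliente"."tipocliente", "dim_cliente"."a1_ibge",
         "dim_cliente"."a1_cod", "dim_cliente"."a1_nome", "dim_vendedor"."a3_nome"
ORDER BY "dim_cliente"."tipocliente" ASC NULLS LAST,
         "dim_cliente"."a1_ibge" ASC NULLS LAST,
         "dim_cliente"."a1_cod" ASC NULLS LAST,
         "dim_cliente"."a1_nome" ASC NULLS LAST;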
Re: [PERFORM] Many-to-many performance problem
involving the array column, but that doesn't appear to be very useful for these ordered queries because it's not an ordered index.
* I tried adding created_on to the msgs_message_labels table, but I couldn't find a way of avoiding the in-memory sort.
* I have thought about dynamically creating partial indexes for each label, using an array column on msgs_message to hold label ids and an index condition like WHERE label_ids && ARRAY[123], but I'm not sure what other problems I'll run into with hundreds of indexes on the same table.

Server is an Amazon RDS instance with default settings and Postgres 9.3.10, with one other database in the instance.

All advice very much appreciated, thanks

--
Rowan Seymour | +260 964153686

Hello!

What do you mean by "Server is an Amazon RDS instance with default settings and Postgres 9.3.10, with one other database in the instance"? Is PG running with the default config or something else, i.e. the defaults of a stock build? If so, you should definitely do some tuning on it. Looking at the plan, I saw a lot of disk reads, which can be linked to the small shared memory dedicated to PG, exactly what Tom said. Can you share your PG config, or raise, for example, the shared_buffers parameter?

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
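As a rough sketch of the kind of tuning Alex is suggesting (the values below are illustrative assumptions for a mid-sized instance, not recommendations from the thread; on RDS they are set through a DB parameter group rather than postgresql.conf):

# postgresql.conf sketch; adjust to the instance's actual RAM
shared_buffers = 4GB          # a common starting point is ~25% of RAM
effective_cache_size = 12GB   # what the OS page cache can realistically hold
work_mem = 32MB               # larger values reduce on-disk sorts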
Re: [PERFORM] Millions of tables
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Greg Spiegelberg
Sent: Tuesday, September 27, 2016 7:28 PM
To: Terry Schmitt
Cc: pgsql-performa.
Subject: Re: [PERFORM] Millions of tables

On Tue, Sep 27, 2016 at 10:15 AM, Terry Schmitt <tschm...@schmittworks.com> wrote:

On Sun, Sep 25, 2016 at 7:50 PM, Greg Spiegelberg <gspiegelb...@gmail.com> wrote:

Hey all,

Obviously everyone who's been in PostgreSQL or almost any RDBMS for a time has said not to have millions of tables. I too had long believed it, until recently.

An AWS d2.8xlarge instance with 9.5 is my test rig, using XFS on EBS (io1) for PGDATA. Over the weekend, I created 8M tables with 16M indexes on those tables. Table creation initially took 0.018031 secs, averaged 0.027467, and after tossing out outliers (qty 5) the maximum creation time found was 0.66139 seconds. Total time: 30 hours, 31 minutes and 8.435049 seconds. Tables were created by a single process. Do note that table creation is done via a plpgsql function, as there are other housekeeping tasks necessary, though minimal.

No system tuning, but here is a list of PostgreSQL knobs and switches:

shared_buffers = 2GB
work_mem = 48MB
max_stack_depth = 4MB
synchronous_commit = off
effective_cache_size = 200GB
pg_xlog is on its own file system

There are still some obvious problems. General DBA functions such as VACUUM and ANALYZE should not be done. Each will run forever and cause much grief. Backups are problematic in the traditional pg_dump and PITR space. Large JOINs by VIEW, SELECT, or via table inheritance (I am abusing it in my test case) are no-nos. A system or database crash could take potentially hours to days to recover. There are likely other issues ahead.

You may wonder, "why is Greg attempting such a thing?" I looked at DynamoDB, BigTable, and Cassandra. I like Greenplum but, let's face it, it's antiquated, and don't get me started on "Hadoop". I looked at many others, and ultimately the recommended approach of each vendor was to have one table for all data. That overcomes the millions-of-tables problem, right? The problem with the "one big table" solution is that I anticipate 1,200 trillion records. Random access is expected, and the customer expects <30ms reads for a single-record fetch.

No data is loaded... yet. Table and index creation only. I am interested in the opinions of all, including tests I may perform. If you had this setup, what would you capture / analyze? I have a job running preparing data. I did this on a much smaller scale (50k tables), and data load via function allowed close to 6,000 records/second. The schema has been simplified since, and the last test reached just over 20,000 records/second with 300k tables.

I'm not looking for alternatives yet, but input to my test. Takers? I can't promise immediate feedback but will do my best to respond with results.

TIA,
-Greg

I have not seen any mention of transaction ID wraparound in this thread yet. With the numbers that you are looking at, I could see this being a major issue.

T

Thank you Terry. You get the gold star. :) I was waiting for that to come up.

Success means handling this condition. A whole-database vacuum and dump-restore is out of the question. Can a properly tuned autovacuum prevent the situation?

-Greg

Hi!

With millions of tables you have to set autovacuum_max_workers sky-high =). We have seen situations where, even with only thousands of tables, autovacuum cannot vacuum all the tables that need it. It simply vacuums some of the most-modified tables and never reaches the others; only a manual vacuum helps in that situation. Combined with the wraparound issue it can be a nightmare.

--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
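As a sketch of how one might watch for the wraparound risk being discussed (standard catalog queries; the threshold mentioned in the comment is the PostgreSQL default):

-- XID age per database; autovacuum forces an anti-wraparound vacuum once
-- age(datfrozenxid) passes autovacuum_freeze_max_age (200 million by default).
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;

-- Per-table XID age, to spot relations that autovacuum never reaches:
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY xid_age DESC
LIMIT 20;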