Query is slow when run for first time; subsequent execution is fast
Hello,

This is my first question in the postgres mailing list. If I have made any mistakes, please bear with me.

I am using PostgreSQL 9.4.4 on a Mac machine, executing queries on the postgres server through the psql client.

servicedesk=# select version();
version
PostgreSQL 9.4.4 on x86_64-apple-darwin, compiled by i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00), 64-bit
(1 row)

Repeatedly, I came across instances where any query, when run for the first time, takes a long time to execute (sometimes nearly 2 seconds), but subsequent executions of the same query are very fast (less than 20 milliseconds). The tables involved in the query also have very few rows (fewer than 50). Running explain (analyze, buffers, verbose) gives the following results.

-- start --
servicedesk=#
servicedesk=# explain (analyze, buffers, verbose) SELECT COUNT(*) FROM ChangeDetails LEFT JOIN SDOrganization AaaOrg ON ChangeDetails.SITEID=AaaOrg.ORG_ID LEFT JOIN ApprovalStatusDefinition ON ChangeDetails.APPR_STATUSID=ApprovalStatusDefinition.STATUSID LEFT JOIN CategoryDefinition ON ChangeDetails.CATEGORYID=CategoryDefinition.CATEGORYID LEFT JOIN Change_Fields ON ChangeDetails.CHANGEID=Change_Fields.CHANGEID LEFT JOIN Change_StageDefinition ON ChangeDetails.WFSTAGEID=Change_StageDefinition.WFSTAGEID LEFT JOIN Change_StatusDefinition ON ChangeDetails.WFSTATUSID=Change_StatusDefinition.WFSTATUSID LEFT JOIN AaaUser ChangeManager ON ChangeDetails.CHANGEMANAGERID=ChangeManager.USER_ID LEFT JOIN AaaUser ChangeOriginator ON ChangeDetails.INITIATORID=ChangeOriginator.USER_ID LEFT JOIN AaaUser ChangeOwner ON ChangeDetails.TECHNICIANID=ChangeOwner.USER_ID LEFT JOIN ChangeResolution ON ChangeDetails.CHANGEID=ChangeResolution.CHANGEID LEFT JOIN ChangeTemplate ON ChangeDetails.TEMPLATEID=ChangeTemplate.TEMPLATEID LEFT JOIN ChangeToClosureCode ON ChangeDetails.CHANGEID=ChangeToClosureCode.CHANGEID LEFT JOIN Change_ClosureCode ON ChangeToClosureCode.ID=Change_ClosureCode.ID LEFT JOIN ChangeTypeDefinition ON ChangeDetails.CHANGETYPEID=ChangeTypeDefinition.CHANGETYPEID LEFT JOIN ChangeWF_Definition ON ChangeDetails.WFID=ChangeWF_Definition.ID LEFT JOIN ImpactDefinition ON ChangeDetails.IMPACTID=ImpactDefinition.IMPACTID LEFT JOIN ItemDefinition ON ChangeDetails.ITEMID=ItemDefinition.ITEMID LEFT JOIN PriorityDefinition ON ChangeDetails.PRIORITYID=PriorityDefinition.PRIORITYID LEFT JOIN QueueDefinition ON ChangeDetails.GROUPID=QueueDefinition.QUEUEID LEFT JOIN RiskDefinition ON ChangeDetails.RISKID=RiskDefinition.RISKID LEFT JOIN StageDefinition ON ChangeDetails.STAGEID=StageDefinition.STAGEID LEFT JOIN SubCategoryDefinition ON ChangeDetails.SUBCATEGORYID=SubCategoryDefinition.SUBCATEGORYID LEFT JOIN UrgencyDefinition ON ChangeDetails.URGENCYID=UrgencyDefinition.URGENCYID LEFT JOIN SDUser ON ChangeDetails.INITIATORID=SDUser.USERID;

QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=13.25..13.26 rows=1 width=160) (actual time=0.018..0.018 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=1
   ->  Seq Scan on public.changedetails  (cost=0.00..12.60 rows=260 width=160) (actual time=0.007..0.008 rows=2 loops=1)
         Output: changedetails.changeid, changedetails.initiatorid, changedetails.technicianid, changedetails.stageid, changedetails.priorityid, changedetails.categoryid, changedetails.subcategoryid, changedetails.itemid, changedetails.appr_statusid, changedetails.changetypeid, changedetails.urgencyid, changedetails.title, changedetails.description, changedetails.createdtime, changedetails.scheduledstarttime, changedetails.scheduledendtime, changedetails.completedtime, changedetails.notespresent, changedetails.siteid, changedetails.groupid, changedetails.templateid, changedetails.wfid, changedetails.wfstageid, changedetails.wfstatusid, changedetails.isemergency, changedetails.isretrospective, changedetails.reason
Re: Query is slow when run for first time; subsequent execution is fast
Hello Jeff,

Thanks for the insights.

> Don't keep closing and reopening connections.

Even if I close a connection and open a new one and execute the same query, the planning time is considerably less than the first time. Only when I restart the Postgres server do I face high planning time again.

> The query plan itself is not cached, but all the metadata about the (large number) of tables used in the query is cached. Apparently reading/parsing that data is the slow step, not coming up with the actual plan.

I enabled logging for the parser, planner etc. in postgresql.conf and re-ran the queries. The logs follow. I am not sure exactly how this should be read, but the major difference in elapsed time seems to be in the PLANNER STATISTICS section.

-- start --
1. First run

LOG: PARSER STATISTICS
DETAIL: ! system usage stats:
! 0.000482 elapsed 0.000356 user 0.000127 system sec
! [0.004921 user 0.004824 sys total]
! 0/0 [0/1] filesystem blocks in/out
! 0/102 [0/1076] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [3/5] messages rcvd/sent
! 0/0 [8/11] voluntary/involuntary context switches
STATEMENT: SELECT COUNT(*) FROM ChangeDetails LEFT JOIN ... [same query as above]
LOG: statement: SELECT COUNT(*) FROM ChangeDetails LEFT JOIN ... [same query as above]
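For reference, the statistics logging used above is controlled by these postgresql.conf settings (a minimal sketch; they can also be enabled per session with SET, superuser only):

log_parser_stats = on
log_planner_stats = on
log_executor_stats = on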
Fwd: Re: Query is slow when run for first time; subsequent execution is fast
I missed including the mailing list in the To address; forwarding now.

-- Forwarded message --
From: "Nandakumar M"
Date: 15 Jan 2018 12:16
Subject: Re: Query is slow when run for first time; subsequent execution is fast
To: "Pavel Stehule"
Cc:

Hi,

On Fri, Jan 12, 2018 at 3:34 PM, Pavel Stehule wrote:
>
> maybe some of your indexes and some system tables are bloated. Try running
> VACUUM FULL ANALYZE

Tried this suggestion. Planning time is reduced slightly, but it is still way higher on the first run compared to subsequent runs of the same query.

Regards,
Nanda
RE: Query is slow when run for first time; subsequent execution is fast
Hi,

On 17 Jan 2018 12:55, "POUSSEL, Guillaume" wrote:
> Are you on Windows or Linux? I'm on Windows and wondering if the issue is the same on Linux?

I have experienced this on Mac and Linux machines.

> You can try pg_prewarm, on pg_statistic table and its index. But I'd
> probably just put an entry in my db startup script to run this query
> immediately after starting the server, and let the query warm the cache itself.

I will try this suggestion and get back on the thread. Is pg_statistic the only table to be pre-cached? Please let me know if any other table/index needs to be pre-warmed.

Btw, I don't think running a "select * from pg_statistic" will fill the shared buffers; only 256 kB of data will be cached during sequential scans. I will try pg_prewarm.

> Why do you restart your database often?

Postgres is bundled with our application and deployed by our client. Starting/stopping the server is not under my control.

Regards,
Nanda
Re: Query is slow when run for first time; subsequent execution is fast
Hi,

I tried pg_prewarm as suggested by Jeff Janes and it works; thanks a lot, Jeff. Now the query planning is fast on the first execution. Here is the list of tables and indexes that needed to be pre-warmed (or you could just pre-warm all the 'pg_%' tables. :-) ).

select pg_prewarm('pg_statistic');
select pg_prewarm('pg_trigger_tgrelid_tgname_index');
select pg_prewarm('pg_trigger');
select pg_prewarm('pg_statistic_relid_att_inh_index');
select pg_prewarm('pg_index_indrelid_index');
select pg_prewarm('pg_index_indexrelid_index');
select pg_prewarm('pg_index');
select pg_prewarm('pg_constraint_conrelid_index');
select pg_prewarm('pg_constraint');
select pg_prewarm('pg_class_relname_nsp_index');
select pg_prewarm('pg_class_oid_index');
select pg_prewarm('pg_attribute_relid_attnum_index');
select pg_prewarm('pg_attribute');
select pg_prewarm('pg_attrdef_adrelid_adnum_index');
select pg_prewarm('pg_attrdef');
select pg_prewarm('pg_amproc_fam_proc_index');
select pg_prewarm('pg_namespace_oid_index');

Regards,
Nanda

On 18 Jan 2018 07:25, "Michael Paquier" wrote:
On Tue, Jan 16, 2018 at 09:18:25PM -0800, Jeff Janes wrote:
> Oh. I've not seen that before. But then again I don't often restart my
> server and then immediately run very large queries with a stringent time
> deadline.
>
> You can try pg_prewarm, on pg_statistic table and its index. But I'd
> probably just put an entry in my db startup script to run this query
> immediately after starting the server, and let the query warm the cache
> itself.
>
> Why do you restart your database often enough for this to be an issue?

Another thing that you could use here is pg_buffercache, which offers a way to look at the Postgres shared buffer contents in real time:
https://www.postgresql.org/docs/current/static/pgbuffercache.html

As Jeff says, pg_prewarm is a good tool for such cases to avoid any kind of warmup period when a server starts.
--
Michael
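If listing the relations one by one is tedious, a single query can pre-warm every catalog table and index in one go. This is only a sketch (it assumes the pg_prewarm extension is already installed, and I have not measured whether warming everything is worthwhile):

select pg_prewarm(c.oid::regclass)
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where n.nspname = 'pg_catalog'
  and c.relkind in ('r', 'i');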
Query optimiser is not using 'not null' constraint when 'order by nulls last' clause is used
Hi,

I am using Postgres version 9.4.4 on a Mac machine. I have 2 queries that differ only in the order by clause. One of them has 'nulls last' and the other one does not. The performance difference between the two is considerable.

The slower of the two queries is

SELECT wos.notificationstatus, wos.unrepliedcount, wos.shownotestotech, wos.ownerid, wos.isfcr, aau.user_id, wo.workorderid AS "WOID", wo.is_catalog_template AS "TemplateType", wo.title AS "Title", wo.is_catalog_template, aau.first_name AS "Requester", cd.categoryname AS "Category", ti.first_name AS "Owner", wo.duebytime AS "DueBy", wo.fr_duetime, wo.completedtime AS "CompletedTime", wo.respondedtime AS "RespondedTime", wo.resolvedtime AS "ResolvedTime", qd.queuename AS "Group", std.statusname AS "Status", wo.createdtime AS "CreatedDate", wos.isread, wos.hasattachment, wos.appr_statusid, wos.priorityid, wo.templateid AS "TemplateId", pd.priorityid, pd.priorityname AS "Priority", pd.prioritycolor AS "PriorityColor", wos.isoverdue, wos.is_fr_overdue, wos.linkedworkorderid, wos.editing_status, wos.editorid, wos.linkedworkorderid, wo.isparent, sduser.isvipuser, sduser_onbehalfof.isvipuser AS "ONBEHALFOFVIP", wo.isparent, wos.statusid, sdorganization.name AS "Site", wo.workorderid AS "RequestID" FROM workorder wo left join workorder_fields wof ON wo.workorderid=wof.workorderid left join servicecatalog_fields scf ON wo.workorderid=scf.workorderid left join wotoprojects wtp ON wo.workorderid=wtp.workorderid left join sitedefinition ON wo.siteid=sitedefinition.siteid left join sdorganization ON sitedefinition.siteid=sdorganization.org_id inner join workorderstates wos ON wo.workorderid=wos.workorderid left join categorydefinition cd ON wos.categoryid=cd.categoryid left join aaauser ti ON wos.ownerid=ti.user_id left join aaauser aau ON wo.requesterid=aau.user_id left join prioritydefinition pd ON wos.priorityid=pd.priorityid left join statusdefinition std ON wos.statusid=std.statusid left join workorder_queue wo_queue ON wo.workorderid=wo_queue.workorderid left join queuedefinition qd ON wo_queue.queueid=qd.queueid left join departmentdefinition dpt ON wo.deptid=dpt.deptid left join leveldefinition lvd ON wos.levelid=lvd.levelid left join modedefinition mdd ON wo.modeid=mdd.modeid left join urgencydefinition urgdef ON wos.urgencyid=urgdef.urgencyid left join impactdefinition impdef ON wos.impactid=impdef.impactid left join requesttypedefinition rtdef ON wos.requesttypeid=rtdef.requesttypeid left join subcategorydefinition scd ON wos.subcategoryid=scd.subcategoryid left join itemdefinition icd ON wos.itemid=icd.itemid left join servicedefinition serdef ON wo.serviceid=serdef.serviceid left join aaauser cbau ON wo.createdbyid=cbau.user_id left join aaauser oboaau ON wo.oboid=oboaau.user_id left join sduser ON wo.requesterid=sduser.userid left join sduser sduser_onbehalfof ON wo.oboid=sduser_onbehalfof.userid left join workorder_fields ON wo.workorderid=workorder_fields.workorderid WHERE (( wos.statusid = 1) AND( wo.isparent = TRUE)) ORDER BY 7 DESC nulls last limit 25

On removing 'nulls last' from the order by clause the query becomes very fast. I have attached the query plans for both queries.

From the plans it looks like the second query is able to efficiently use the workorder_pk index (the node 'Index Scan Backward using workorder_pk on workorder' returns 25 rows), whereas the first query is not able to use the index efficiently (more than 300k rows are returned from the same node).

The column workorderid is a PK column. The query optimizer should ideally know that there are no nulls in this column, and in effect there is no difference between the two queries. I tried the same in Postgres 10; the slower query performs much better there due to parallel sequential scans, but it is still less efficient than the query without 'nulls last'.

I thought it would be best to raise this with the Postgres team.

Regards,
Nanda
Re: Query optimiser is not using 'not null' constraint when 'order by nulls last' clause is used
Hi,

On 2 Feb 2018 15:06, "Laurenz Albe" wrote:
> In the above case, the optimizer does not know that it will get the rows
> in the correct order: indexes are sorted ASC NULLS LAST by default,
> so a backwards index scan will produce the results NULLS FIRST,
> which is the default for ORDER BY ... DESC.

The order by column has a not null constraint on it, so nulls last or first shouldn't make any difference.

> If you want the nulls last, PostgreSQL has to retrieve *all* the rows and sort
> them rather than using the first 25 results it gets by scanning the indexes.
>
> To have the above query perform fast, add additional indexes with either
> ASC NULLS FIRST or DESC NULLS LAST for all used keys.

For now this is exactly what I have done. But it is in effect a duplicate index on a PK column, and I would be happy not to create it in the first place.

Regards,
Nanda
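For reference, the workaround index looks something like this (a sketch using the table and column names from the query above; DESC NULLS LAST matches the ORDER BY ... DESC nulls last clause, so a forward scan of this index returns rows already in the requested order):

create index workorder_workorderid_desc_idx on workorder (workorderid desc nulls last);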
Re: Query optimiser is not using 'not null' constraint when 'order by nulls last' clause is used
Hi,

On Fri, Feb 2, 2018 at 8:30 PM, Tom Lane wrote:
>
> The planner does not consider this and it doesn't really seem like
> something worth expending cycles on. If you know that there won't be
> nulls in the column, why are you insisting on specifying a nondefault
> value of NULLS FIRST/LAST in the query?

The query is generated by a framework that adds 'nulls last' to all order by clauses. This is done to provide common behaviour in our application irrespective of the database that is used. SQL Server treats nulls as less than non-null values, which is the opposite of what Postgres does.

For any indexes that we create manually, we can do

create index on table_name(column_name nulls first);

But for the PK column we are not in control of the index that is created.

Regards,
Nanda
Re: Query optimiser is not using 'not null' constraint when 'order by nulls last' clause is used
Hi,

On Fri, Feb 2, 2018 at 9:28 PM, David G. Johnston wrote:
> You probably can (I assume the nulls aspect of the index doesn't prevent PK
> usage), but you must add the PK to the table after creating the index and
> not let the system auto-generate the index for you.
>
> https://www.postgresql.org/docs/10/static/sql-altertable.html
>
> ALTER TABLE name ADD PRIMARY KEY USING INDEX index_name;

I had missed this in the docs. Thank you, David, for pointing it out.

Regards,
Nanda
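The two-step flow David describes would look something like this (a sketch with hypothetical names; note that the ALTER TABLE documentation says the index used for a constraint must be a b-tree with default sort ordering, so whether a NULLS FIRST variant of the index is accepted is worth verifying before relying on this):

create unique index workorder_pk_idx on workorder (workorderid);
alter table workorder add constraint workorder_pk primary key using index workorder_pk_idx;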
Re: Efficiently searching for the most recent rows where a column matches any result from a different query
Hi,

I myself am new to performance tuning queries. But from what you have said, it looks like Postgres has to go through all the posts using the backward index scan and find out whether their author is amongst the user's friends. Since the number of friends is arbitrary for any user, even if a user has few friends (or no friends at all), the stats will not reflect this, and so the planner cannot take advantage of it to directly fetch the posts from the small set of friends.

My suggestion (which involves changing the schema and query) is to have a last_post_id or last_posted_time column in the users table, find the 10 friends who posted most recently, and then use them to find the last 10 posts. Something like:

select * from posts where posts.author_id in (select id from users where id in (select friend_id from user_friend where user_id = 1) and last_posted_time is not null order by last_posted_time desc limit 10);

I am not sure if this is the best way to solve this. If there are better solutions I would be happy to learn them.

Regards
Nanda

On Thu, Feb 15, 2018 at 5:48 PM, wrote:
>
> Hello Hellmuth,
>
> Thank you for your response.
>
> I've uploaded the query plan for the first query (user_id=2) here:
> https://gist.github.com/anonymous/6d251b277ef71f8977b03cab91fedccd
> The query plan for the second query (user_id=1) can be found here:
> https://gist.github.com/anonymous/32ed485b40cce2651ddc52661f3e7f7b
>
> Just like in the original queries, posts_user_id_id_index is not used.
>
> Kind regards,
> Milo
>
> 13. Feb 2018 22:13 by hiv...@gmail.com:
>
> Hello:
>
> EXPLAIN (ANALYZE, BUFFERS)
> select * from (
>     SELECT posts.id, users.name, posts.content
>     FROM posts JOIN users ON posts.user_id = users.id
>     WHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE user_id = 1)
>     ORDER BY posts.id DESC
> ) as a
> ORDER BY a.id DESC
> LIMIT 10;
>
> --
>
> EXPLAIN (ANALYZE, BUFFERS)
> select * from (
>     SELECT posts.id, users.name, posts.content
>     FROM posts JOIN users ON posts.user_id = users.id
>     WHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE user_id = 2)
>     ORDER BY posts.id DESC
> ) as a
> ORDER BY a.id DESC
> LIMIT 10;
>
> 2018-02-13 8:28 GMT-05:00 :
>>
>> Hello,
>>
>> I have the following schema:
>>
>> CREATE TABLE users (
>>     id   BIGSERIAL PRIMARY KEY,
>>     name TEXT NOT NULL UNIQUE
>> );
>>
>> CREATE TABLE friends (
>>     user_id        BIGINT NOT NULL REFERENCES users,
>>     friend_user_id BIGINT NOT NULL REFERENCES users,
>>     UNIQUE (user_id, friend_user_id)
>> );
>>
>> CREATE TABLE posts (
>>     id      BIGSERIAL PRIMARY KEY,
>>     user_id BIGINT NOT NULL REFERENCES users,
>>     content TEXT NOT NULL
>> );
>> CREATE INDEX posts_user_id_id_index ON posts(user_id, id);
>>
>> Each user can unilaterally follow any number of friends. The posts table
>> has a large number of rows and is rapidly growing.
>>
>> My goal is to retrieve the 10 most recent posts of a user's friends. This
>> query gives the correct result, but is inefficient:
>>
>> SELECT posts.id, users.name, posts.content
>> FROM posts JOIN users ON posts.user_id = users.id
>> WHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE user_id = 1)
>> ORDER BY posts.id DESC LIMIT 10;
>>
>> If the user's friends have recently posted, the query is still reasonably
>> fast (https://explain.depesz.com/s/6ykR). But if the user's friends haven't
>> recently posted or the user has no friends, it quickly deteriorates
>> (https://explain.depesz.com/s/OnoG).
>>
>> If I match only a single post author (e.g. WHERE posts.user_id = 5),
>> Postgres uses the index posts_user_id_id_index. But if I use IN, the index
>> doesn't appear to be used at all.
>>
>> How can I get these results more efficiently?
>>
>> I've uploaded the schema and the queries I've tried to dbfiddle at
>> http://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=cf1489b7f6d53c3fe0b55ed7ccbad1f0.
>> The output of "SELECT version()" is "PostgreSQL 9.6.5 on
>> x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit" for me.
>>
>> Thank you in advance for any insights, pointers or suggestions you are
>> able to give me.
>>
>> Regards,
>> Milo
>
> --
> Cordialmente,
>
> Ing. Hellmuth I. Vargas S.
> Esp. Telemática y Negocios por Internet
> Oracle Database 10g Administrator Certified Associate
> EnterpriseDB Certified PostgreSQL 9.3 Associate
Re: Efficiently searching for the most recent rows where a column matches any result from a different query
Hi,

A correction to the query: I missed adding limit 10 in the outermost query.

Instead of:

> select * from posts where posts.author_id in (select id from users
> where id in (select friend_id from user_friend where user_id = 1) and
> last_posted_time is not null order by last_posted_time desc limit 10);

it should be:

select * from posts where posts.author_id in (select id from users where id in (select friend_id from user_friend where user_id = 1) and last_posted_time is not null order by last_posted_time desc limit 10) order by post_id desc limit 10;

Regards,
Nanda
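If changing the schema is not an option, another pattern that often works for this kind of top-N-per-group problem is a LATERAL join. This is only a sketch against the posts/friends schema from the original post (I have not verified it on the poster's data): it fetches each friend's 10 newest posts via the (user_id, id) index, then takes the global top 10.

SELECT p.id, p.user_id, p.content
FROM friends f
CROSS JOIN LATERAL (
    SELECT id, user_id, content
    FROM posts
    WHERE posts.user_id = f.friend_user_id
    ORDER BY id DESC
    LIMIT 10
) p
WHERE f.user_id = 1
ORDER BY p.id DESC
LIMIT 10;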
Re: Performance degrade in Planning Time to find appropriate Partial Index
Hi,

https://heapanalytics.com/blog/engineering/running-10-million-postgresql-indexes-in-production

From the link shared above, it looks like what Meenatchi has done should work.

Do the conditions on the partial index and the query match exactly? (A greater-than / greater-than-or-equals mismatch, maybe?) If the conditions for those partial indexes are mutually exclusive and the query has a matching condition, then Postgres can use that index alone. Are we missing something here?

Regards,
Nanda

On Thu, Mar 1, 2018 at 6:33 PM, Laurenz Albe wrote:
> Meenatchi Sandanam wrote:
>> I have created a table with 301 columns (ID, 150 BIGINT, 150 TEXT). The table contains
>> multiple form data differentiated by ID range. Hence a column contains more than one form data.
>> To achieve Unique Constraint and Indexing per form, I chose PostgreSQL Partial Indexes,
>> which suit my requirement. I have created Partial Indexes with ID Range as criteria and
>> they provide Uniqueness and Indexing per form basis as expected. But DML operations on a
>> particular form scan all the Indexes created for the entire table instead of scanning
>> the Indexes created for that particular form ID Range. This degrades Planner Performance
>> and Query Time more than 10 times, as below,
>>
>> Query Result for the table with 3000 Partial Indexes (15 Indexes per form):
>
> It is crazy to create 3000 partial indexes on one table.
>
> No wonder planning and DML statements take very long, they have to consider all the
> indexes.
>
>> explain analyse select id from form_data_copy where id between 3001 and 4000 and bigint50=789;
>
> Use a single index on (bigint50, id) for best performance.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
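To illustrate the exact-match point (a hypothetical sketch, not Meenatchi's actual DDL): the planner only uses a partial index when the query's WHERE clause provably implies the index predicate, and that proof does not reason about integer discreteness.

CREATE INDEX form3_bigint50_idx ON form_data_copy (bigint50)
    WHERE id BETWEEN 3001 AND 4000;

-- Can use the index: the predicate is implied verbatim.
SELECT id FROM form_data_copy
WHERE id BETWEEN 3001 AND 4000 AND bigint50 = 789;

-- Cannot use it: "id > 3000 AND id < 4001" is equivalent for integers,
-- but the planner does not prove that implication.
SELECT id FROM form_data_copy
WHERE id > 3000 AND id < 4001 AND bigint50 = 789;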
Re: citext performance
Hi,

I have also faced the same problem with the citext extension. There are cases where it does not use the index, making it almost unusable. From what I have read in old threads in the postgres mailing list, the problem has to do with how collation is handled (please refer to https://dba.stackexchange.com/questions/105244/index-on-column-with-data-type-citext-not-used/105250#105250 ).

Regards,
Nanda

On Fri, Apr 6, 2018 at 10:21 PM, Deepak Somaiya wrote:
>
> Folks,
> I read the following (PostgreSQL: Documentation: 9.6: citext) and it does not
> hold true in my testing, i.e. citext is not performing better than lower. Am I
> missing something? Help is appreciated.
>
> "citext is not as efficient as text because the operator functions and the
> B-tree comparison functions must make copies of the data and convert it to
> lower case for comparisons. It is, however, slightly more efficient than
> using lower to get case-insensitive matching."
>
> Here is what I have done:
>
> drop table test;
> drop table testci;
>
> CREATE TABLE test (
>     id INTEGER PRIMARY KEY,
>     name character varying(254)
> );
> CREATE TABLE testci (
>     id INTEGER PRIMARY KEY,
>     name citext
> );
>
> INSERT INTO test(id, name)
> SELECT generate_series(101,200), (md5(random()::text));
>
> INSERT INTO testci(id, name)
> SELECT generate_series(1,100), (md5(random()::text));
>
> Now, I have done sequential searches:
>
> explain (analyze on, format yaml) select * from test where
> lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
>
> - Plan:
>     Node Type: "Seq Scan"
>     Parallel Aware: false
>     Relation Name: "test"
>     Alias: "test"
>     Startup Cost: 0.00
>     Total Cost: 23334.00
>     Plan Rows: 5000
>     Plan Width: 37
>     Actual Startup Time: 0.016
>     Actual Total Time: 680.199
>     Actual Rows: 1
>     Actual Loops: 1
>     Filter: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
>     Rows Removed by Filter: 99
>   Planning Time: 0.045
>   Triggers:
>   Execution Time: 680.213
>
> explain (analyze on, format yaml) select * from testci where
> name='956d692092f0b9f85f36bf2b2501f3ad';
>
> - Plan:
>     Node Type: "Seq Scan"
>     Parallel Aware: false
>     Relation Name: "testci"
>     Alias: "testci"
>     Startup Cost: 0.00
>     Total Cost: 20834.00
>     Plan Rows: 1
>     Plan Width: 37
>     Actual Startup Time: 0.017
>     Actual Total Time: 1184.485
>     Actual Rows: 1
>     Actual Loops: 1
>     Filter: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
>     Rows Removed by Filter: 99
>   Planning Time: 0.029
>   Triggers:
>   Execution Time: 1184.496
>
> You can see sequential searches with lower working twice as fast as citext.
>
> Now I added an index on citext and an equivalent functional index (lower) on text:
>
> CREATE INDEX textlowerindex ON test (lower(name));
> create index textindex on test(name);
>
> Index creation took longer with citext vs. creating the lower functional index.
> Now here come the executions with indexes:
>
> explain (analyze on, format yaml) select * from test where
> lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
>
> - Plan:
>     Node Type: "Bitmap Heap Scan"
>     Parallel Aware: false
>     Relation Name: "test"
>     Alias: "test"
>     Startup Cost: 187.18
>     Total Cost: 7809.06
>     Plan Rows: 5000
>     Plan Width: 37
>     Actual Startup Time: 0.020
>     Actual Total Time: 0.020
>     Actual Rows: 1
>     Actual Loops: 1
>     Recheck Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
>     Rows Removed by Index Recheck: 0
>     Exact Heap Blocks: 1
>     Lossy Heap Blocks: 0
>     Plans:
>       - Node Type: "Bitmap Index Scan"
>         Parent Relationship: "Outer"
>         Parallel Aware: false
>         Index Name: "textlowerindex"
>         Startup Cost: 0.00
>         Total Cost: 185.93
>         Plan Rows: 5000
>         Plan Width: 0
>         Actual Startup Time: 0.016
>         Actual Total Time: 0.016
>         Actual Rows: 1
>         Actual Loops: 1
>         Index Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
>   Planning Time: 0.051
>   Triggers:
>   Execution Time: 0.035
>
> explain (analyze on, format yaml) select * from testci where
> name='956d692092f0b9f85f36bf2b2501f3ad';
>
> - Plan:
>     Node Type: "Index Scan"
>     Parallel Aware: false
>     Scan Direction: "Forward"
>     Index Name: "citextindex"
>     Relation Name: "testci"
>     Alias: "testci"
>     Startup Cost: 0.42
>     Total Cost: 8.44
>     Plan Rows: 1
>     Plan Width: 37
>     Actual Startup Time: 0.049
>     Actual Total Time: 0.050
>     Actual Rows: 1
>     Actual Loops: 1
>     Index Cond: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
>     Rows Removed by Index Recheck: 0
>   Planning Time: 0.051
>   Triggers:
>   Execution Time: 0.064
>
> Deepak
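The index-usage problem I mentioned above typically shows up with pattern matching rather than plain equality: in my experience a btree index on a citext column is not used for LIKE. A workaround sketch, reusing the test table from Deepak's mail (assuming the goal is a case-insensitive prefix search):

create index test_lower_pattern_idx on test (lower(name) text_pattern_ops);

-- prefix search that can use the index:
select * from test where lower(name) like lower('f6d7d5be%');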
Need help with optimising simple query
Hi,

I am having a query that has an order by and a limit clause. The column on which I am doing order by is indexed (default b-tree index). However the index is not being used. On tweaking the query a bit I found that when I use left join the index is not used, whereas when I use inner join the index is used. Unfortunately, the behaviour we expect is that of left join only.

My question is, is there any way to modify/improve the query to improve the query speed, or is this the best that is possible for this case? Please find below a simplified version of the queries. I tried the queries on 9.3 and 10 versions and both gave similar results.

Table structure:

performance_test=# \d+ child
                                        Table "public.child"
 Column |  Type  | Collation | Nullable |              Default              | Storage  | Stats target | Description
--------+--------+-----------+----------+-----------------------------------+----------+--------------+-------------
 id     | bigint |           | not null | nextval('child_id_seq'::regclass) | plain    |              |
 name   | text   |           | not null |                                   | extended |              |
Indexes:
    "child_pkey" PRIMARY KEY, btree (id)
    "child_name_unique" UNIQUE CONSTRAINT, btree (name)
Referenced by:
    TABLE "parent" CONSTRAINT "parent_child_id_fkey" FOREIGN KEY (child_id) REFERENCES child(id)

performance_test=# \d+ parent
                                         Table "public.parent"
  Column  |  Type  | Collation | Nullable |              Default               | Storage  | Stats target | Description
----------+--------+-----------+----------+------------------------------------+----------+--------------+-------------
 id       | bigint |           | not null | nextval('parent_id_seq'::regclass) | plain    |              |
 name     | text   |           | not null |                                    | extended |              |
 child_id | bigint |           |          |                                    | plain    |              |
Indexes:
    "parent_pkey" PRIMARY KEY, btree (id)
    "parent_name_unique" UNIQUE CONSTRAINT, btree (name)
    "parent_child_id_idx" btree (child_id)
Foreign-key constraints:
    "parent_child_id_fkey" FOREIGN KEY (child_id) REFERENCES child(id)

Query used to populate data:

performance_test=# insert into child(name) select concat('child ', gen.id) as name from (select generate_series(1,10) as id) as gen;
performance_test=# insert into parent(name, child_id) select concat('parent ', gen.id) as name, (id%10) + 1 from (select generate_series(1,100) as id) as gen;

Left join with order by using child name:

performance_test=# explain analyze select * from parent left join child on parent.child_id = child.id order by child.name limit 10;

QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=69318.55..69318.58 rows=10 width=59) (actual time=790.708..790.709 rows=10 loops=1)
   ->  Sort  (cost=69318.55..71818.55 rows=100 width=59) (actual time=790.705..790.706 rows=10 loops=1)
         Sort Key: child.name
         Sort Method: top-N heapsort  Memory: 27kB
         ->  Hash Left Join  (cost=3473.00..47708.91 rows=100 width=59) (actual time=51.066..401.028 rows=100 loops=1)
               Hash Cond: (parent.child_id = child.id)
               ->  Seq Scan on parent  (cost=0.00..17353.00 rows=100 width=29) (actual time=0.026..67.848 rows=100 loops=1)
               ->  Hash  (cost=1637.00..1637.00 rows=10 width=19) (actual time=50.879..50.879 rows=10 loops=1)
                     Buckets: 65536  Batches: 2  Memory Usage: 3053kB
                     ->  Seq Scan on child  (cost=0.00..1637.00 rows=10 width=19) (actual time=0.018..17.281 rows=10 loops=1)
 Planning time: 1.191 ms
 Execution time: 790.797 ms
(12 rows)

Inner join with sorting according to child name:

performance_test=# explain analyze select * from parent inner join child on parent.child_id = child.id order by child.name limit 10;

QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=0.84..2.03 rows=10 width=59) (actual time=0.156..0.193 rows=10 loops=1)
   ->  Nested Loop  (cost=0.84..119132.56 rows=100 width=59) (actual time=0.154..0.186 rows=10 loops=1)
         ->  Index Scan using child_name_unique on child  (cost=0.42..5448.56 rows=10 width=19) (actual time=0.126..0.126 rows=1 loops=1)
         ->  Index Scan using parent_child_id_idx on parent  (cost=0.42..1.04 rows=10 width=29) (actual time=0.019..0.045 rows=10 loops=1)
               Index Cond: (child_id = child.id)
 Planning time: 0.941 ms
 Execution time: 0.283 ms
(7 rows)

Version:

performance_test=# select version();
version
--
Re: Need help with optimising simple query
Hi Tom,

Is there something I can do to improve the performance of such queries (where the ordering is done on a child table column and the join is a left join)? Maybe a combined index or something like that? Or is it possible to modify the query to get the same result but execute faster?

One ad-hoc optimisation (which gives somewhat better performance) that came to mind is to have a subquery for the child table, like:

performance_test=# explain analyze select * from parent left join (select * from child order by name limit 10) as child on parent.child_id = child.id order by child.name limit 10;

QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=42714.84..42714.86 rows=10 width=59) (actual time=311.623..311.624 rows=10 loops=1)
   ->  Sort  (cost=42714.84..45214.84 rows=100 width=59) (actual time=311.622..311.622 rows=10 loops=1)
         Sort Key: child.name
         Sort Method: top-N heapsort  Memory: 26kB
         ->  Hash Left Join  (cost=1.19..21105.20 rows=100 width=59) (actual time=0.120..204.386 rows=100 loops=1)
               Hash Cond: (parent.child_id = child.id)
               ->  Seq Scan on parent  (cost=0.00..17353.00 rows=100 width=29) (actual time=0.073..73.052 rows=100 loops=1)
               ->  Hash  (cost=1.06..1.06 rows=10 width=19) (actual time=0.035..0.035 rows=10 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 9kB
                     ->  Limit  (cost=0.42..0.96 rows=10 width=19) (actual time=0.014..0.027 rows=10 loops=1)
                           ->  Index Scan using child_name_unique on child  (cost=0.42..5448.56 rows=10 width=19) (actual time=0.013..0.024 rows=10 loops=1)
 Planning time: 0.505 ms
 Execution time: 311.682 ms
(13 rows)

Time: 312.673 ms

Is there something I can do that will improve the query performance much more than this? Thanks.

Regards,
Nanda

On Mon, 9 Jul 2018, 19:53 Tom Lane, wrote:
>
> Nandakumar M writes:
> > I am having a query that has an order by and a limit clause. The
> > column on which I am doing order by is indexed (default b tree index).
> > However the index is not being used. On tweaking the query a bit I
> > found that when I use left join index is not used whereas when I use
> > inner join the index is used.
>
> The reason the index isn't being used is that the sort order the query
> requests isn't the same as the order provided by the index. Here:
>
> > performance_test=# explain analyze select * from parent left join
> > child on parent.child_id = child.id order by child.name limit 10;
>
> you're asking to sort by a column that will include null values for
> child.name anywhere that there's a parent row without a match for
> child_id. Those rows aren't even represented in the index on child.name,
> much less placed in the right order.
>
> regards, tom lane
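Building on Tom's explanation, here is a rewrite sketch that I believe preserves the left-join result (parent columns plus child name) while letting the index drive the sort. It relies on two things: with the default ASC NULLS LAST ordering, unmatched parents (null child name) can only appear after all matched rows; and the foreign key means a parent row is unmatched exactly when its child_id is null. The outer sort then only has to order at most 20 rows:

select * from (
    (select parent.*, child.name as child_name
     from parent inner join child on parent.child_id = child.id
     order by child.name limit 10)
    union all
    (select parent.*, null
     from parent
     where parent.child_id is null
     limit 10)
) as sub
order by child_name limit 10;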
Setting effective_cache_size
Hi,

According to https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server :

> effective_cache_size should be set to an estimate of how much memory is
> available for disk caching by the operating system and within the database
> itself, after taking into account what's used by the OS itself and other
> applications.

I intend to run a Java application and the postgres server on the same server machine. The Java application requires 2 GB RAM max.

Considering that our server machine has 4 GB RAM, should I reduce effective_cache_size to, say, 768 MB, or am I better off with the default 4 GB value?

This is particularly confusing because in this thread Tom Lane says the following:

> I see no problem with a value of say 4GB;
> that's very unlikely to be worse than the pre-9.4 default (128MB) on any
> modern machine.

PS: I got the value 768 MB from https://pgtune.leopard.in.ua/#/ by giving 1 GB as the amount of memory postgres can use.

Regards,
Nanda
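For experimenting, the setting can be changed without a restart; a minimal sketch (effective_cache_size only influences the planner's cost estimates, so it is safe to try different values):

alter system set effective_cache_size = '768MB';
select pg_reload_conf();
-- or only for the current session:
set effective_cache_size = '768MB';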
Re: Setting effective_cache_size
On Thu, Jan 31, 2019 at 1:00 PM Nandakumar M wrote:
> This is particularly confusing because in this thread Tom Lane says
> the following

I missed linking the thread:
https://postgrespro.com/list/thread-id/1813920

Regards,
Nanda
Simple update query is slow
Hi,

I noticed something strange on our PG server. I have a table named 'timetable' that has only one bigint column and one row. Once every 5 seconds this row is updated with the current epoch value in milliseconds.

The update query seems to be taking considerable time (avg 50 milliseconds). When I generate explain (analyze, buffers) output for the query, the planning time + execution time is always less than 0.1 millisecond. However, the query time shown when \timing is enabled in psql is approx 30 milliseconds (I am connecting via psql from localhost).

Please find the details below.

postgres=> select version();
version
PostgreSQL 9.4.15 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18), 64-bit
(1 row)

Time: 0.572 ms

postgres=> \d+ timetable
                    Table "public.timetable"
 Column |  Type  | Modifiers | Storage | Stats target | Description
--------+--------+-----------+---------+--------------+-------------
 time   | bigint |           | plain   |              |

postgres=> table timetable ;
    time
------------
 1605988584
(1 row)

Time: 0.402 ms

postgres=> explain (analyze,buffers,verbose) update timetable set time = time+0;
QUERY PLAN
--------------------------------------------------------------------------------
 Update on public.timetable  (cost=0.00..4.01 rows=1 width=14) (actual time=0.064..0.064 rows=0 loops=1)
   Buffers: shared hit=5
   ->  Seq Scan on public.timetable  (cost=0.00..4.01 rows=1 width=14) (actual time=0.029..0.029 rows=1 loops=1)
         Output: ("time" + 0), ctid
         Buffers: shared hit=4
 Planning time: 0.054 ms
 Execution time: 0.093 ms
(7 rows)

Time: 27.685 ms

Sometimes this shoots up to even a few hundred milliseconds.

postgres=> explain (analyze,buffers,verbose) update timetable set time = time+0;
QUERY PLAN
--------------------------------------------------------------------------------
 Update on public.timetable  (cost=0.00..4.01 rows=1 width=14) (actual time=0.048..0.048 rows=0 loops=1)
   Buffers: shared hit=5
   ->  Seq Scan on public.timetable  (cost=0.00..4.01 rows=1 width=14) (actual time=0.027..0.028 rows=1 loops=1)
         Output: ("time" + 0), ctid
         Buffers: shared hit=4
 Planning time: 0.063 ms
 Execution time: 0.084 ms
(7 rows)

Time: 291.090 ms

I guess the problem here may somehow be linked to the frequent updates to the one row. However, I want to understand what exactly is going wrong here. Also, I don't understand the discrepancy between the planning + execution time from explain analyze and the time taken by the query as reported in the pg log and in the psql console.

Kindly help me on this.

Regards,
Nanda
Re: Simple update query is slow
Hi,

Just realised that the time difference between the explain analyze timings and the \timing result is due to the implicit commit. Sorry about that.

Regards,
Nanda

On Sun, 22 Nov 2020 at 01:57, Nandakumar M wrote:
>
> Hi,
>
> I noticed something strange on our PG server. I have a table named
> 'timetable' that has only one bigint column and one row.
>
> [snip]
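For anyone who hits the same confusion: \timing includes the commit, while the planning and execution times reported by explain analyze do not. A quick sketch to see where the time goes:

\timing on
begin;
update timetable set time = time + 0;  -- should report well under a millisecond
commit;                                -- the ~30 ms (WAL flush) shows up here

If losing the most recent transactions on a crash is acceptable, setting synchronous_commit = off makes the commit return without waiting for the WAL flush.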