[PERFORM] Tablespaces and query planning
Could somebody confirm or refute the following statements, please? - The statistics gathered by ANALYZE are independent of the tablespace containing the table. - The tablespace containing the table has no influence on query planning unless seq_page_cost or random_page_cost has been set on the tablespace. - VACUUM ANALYZE does the same as VACUUM followed by ANALYZE. Yours, Laurenz Albe -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Seqscan slowness and stored procedures
Ivan Voras wrote: > I have a SQL function (which I've pasted below) and while testing its > code directly (outside a function), this is the "normal", default plan: > > http://explain.depesz.com/s/vfP (67 ms) > > and this is the plain with enable_seqscan turned off: > > http://explain.depesz.com/s/EFP (27 ms) > > Disabling seqscan results in almost 2.5x faster execution. > > However, when this code is wrapped in a function, the execution time is > closer to the second case (which is great, I'm not complaining): > > edem=> explain analyze select * from document_content_top_voted(36); > QUERY PLAN > -- > - > Function Scan on document_content_top_voted (cost=0.25..10.25 > rows=1000 width=188) (actual time=20.644..20.821 rows=167 loops=1) > Total runtime: 21.236 ms > (2 rows) > > I assume that the difference between the function execution time and the > direct plan with seqscan disabled is due to SQL parsing and planning. That cannot be, because SQL functions do not cache execution plans. Did you take caching of table data in the buffer cache or the filesystem cache into account? Did you run your tests several times in a row and were the actual execution times consistent? > Since the plan is compiled-in for stored procedures, is the planner in > that case already running under the assumption that seqscans must be > disabled (or something to that effect)? > > Would tweaking enable_seqscan and other planner functions during the > CREATE FUNCTION have an effect on the stored plan? No, but you can use the SET clause of CREATE FUNCTION to change enable_seqscan for this function if you know that this is the right thing. But be aware that things might be different for other function arguments or when the table data change, so this is normally considered a bad idea. > Do the functions need to be re-created when the database is fully > populated, to adjust their stored plans with regards to new selectivity > situation on the indexes? No. Even in PL/pgSQL, where plans are cached, this is only for the lifetime of the database session. The plan is generated when the function is called for the first time in a database session. Yours, Laurenz Albe -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Tablespaces and query planning
> - The statistics gathered by ANALYZE are independent of the tablespace > containing the table. yes. > - The tablespace containing the table has no influence on query planning > unless seq_page_cost or random_page_cost has been set on the > tablespace. yes. > - VACUUM ANALYZE does the same as VACUUM followed by ANALYZE. no. it is fine grained, but in the diffs there is: VACUUM and ANALYSE do not update pg_class the same way for the reltuples/relpages: for ex VACUUM is accurate for index, and ANALYZE is fuzzy so if you issue a vacuum you have exact values, if you then run ANALYZE you may change them to be less precise. -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation signature.asc Description: This is a digitally signed message part.
Re: [PERFORM] Multiple Concurrent Updates of Shared Resource Counter
Le jeudi 7 juin 2012 09:53:48, Nir Zilberman a écrit : > Hi. > > We are handling multiple concurrent clients connecting to our system - > trying to get a license seat (each license has an initial capacity of > seats). We have a table which keeps count of the acquired seats for each > license. When a client tries to acquire a seat we first make sure that the > number of acquired seats is less than the license capacity. We then > increase the number of acquired seats by 1. > > Our main problem here is with the acquired seats table. > It is actually a shared resource which needs to be updated concurrently by > multiple transactions. > > When multiple transactions are running concurrently - each transaction > takes a long time to complete because it waits on the lock for the shared > resource table. > > Any suggestions for better implementation/design of this feature would be > much appreciated. maybe you can manage something around UNIQUE (license_id,license_seat_number). It depends of what you achieve, and the tables structures you have. -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation signature.asc Description: This is a digitally signed message part.
Re: [PERFORM] Seqscan slowness and stored procedures
On 8 June 2012 11:58, Albe Laurenz wrote: > Did you take caching of table data in the buffer cache or the filesystem > cache into account? Did you run your tests several times in a row and > were the actual execution times consistent? Yes, and yes. >> Would tweaking enable_seqscan and other planner functions during the >> CREATE FUNCTION have an effect on the stored plan? > > No, but you can use the SET clause of CREATE FUNCTION to change > enable_seqscan for this function if you know that this is the right > thing. > But be aware that things might be different for other function arguments > or when the table data change, so this is normally considered a bad > idea. Ok. >> Do the functions need to be re-created when the database is fully >> populated, to adjust their stored plans with regards to new > selectivity >> situation on the indexes? > > No. Even in PL/pgSQL, where plans are cached, this is only for the > lifetime of the database session. The plan is generated when the > function is called for the first time in a database session. Thanks for clearing this up for me! I thought SQL functions are also pre-planned and that the plans are static. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] non index use on LIKE on a non pattern string
Hello, I have noticed that with a SELECT query containing the following constraint: column LIKE ? and an index on that column, PostgreSQL will not use the index even if the parameter doesn't contain special pattern characters such as %. >From PG POV it might be logical, because, who is stupid enough to use the LIKE operator if it's unneeded, right? However from my application POV the users sometimes want to provide a pattern with % and sometimes a more precise condition, and of course, I am uneasy at writing two very similar SQL requests with only the LIKE/= difference; in the end, the non use of an index means unwanted performance degradation. I have come with the following hack in the SQL: ( position('%' in ?) > 0 OR column = ? ) AND ( position('%' in ?) = 0 OR column LIKE ? ) (I know it doesn't cover all the pattern possibilities) Any thoughts on what would be the best approach? Mine looks a bit ugly. Thanks, -- Guillaume Cottenceau -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] non index use on LIKE on a non pattern string
> I have noticed that with a SELECT query containing the following > constraint: > > column LIKE ? > > and an index on that column, PostgreSQL will not use the index > even if the parameter doesn't contain special pattern characters > such as %. you should have a postgresql 8.3,isn't it ? like is equal to "=" in your case, since 8.4 Also you probably want to have a look at http://www.postgresql.org/docs/9.1/static/indexes-opclass.html about your index definition (add the "text_pattern_ops" when required) -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation signature.asc Description: This is a digitally signed message part.
[PERFORM] how to change the index chosen in plan?
I have a query like this: select a.* from a inner join b on a.aid=b.aid where a.col1=33 a.col2=44 and b.bid=8 postgresql selected the index on a.col1 then selected the index on b.bid. But in my situation, I know that the query will be faster if it chose the index on b.bid first since there are only a few rows with value 8. So I re-wrote the query as below: select a.* from a where a.aid in (select aid from b where bid=8) and a.col1=33 a.col2=44 But surprisingly, postgresql didn't change the plan. it still chose to index scan on a.col1. How can I re-wirte the query so postgresql will scan on b.bid first? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] non index use on LIKE on a non pattern string
=?iso-8859-1?q?C=E9dric_Villemain?= writes: >> I have noticed that with a SELECT query containing the following >> constraint: >> >> column LIKE ? >> >> and an index on that column, PostgreSQL will not use the index >> even if the parameter doesn't contain special pattern characters >> such as %. > you should have a postgresql 8.3,isn't it ? > like is equal to "=" in your case, since 8.4 No, the planner has understood about wildcard-free LIKE patterns producing an "=" index condition at least since 7.3. I think what the OP is complaining about is the problem that the pattern has to be actually constant (ie, NOT a parameter) before it can be optimized into an index condition. This should be better in 9.2 ... regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] how to change the index chosen in plan?
Rural Hunter writes: > I have a query like this: > select a.* from a inner join b on a.aid=b.aid where a.col1=33 a.col2=44 > and b.bid=8 > postgresql selected the index on a.col1 then selected the index on > b.bid. But in my situation, I know that the query will be faster if it > chose the index on b.bid first since there are only a few rows with > value 8. If you know that and the planner doesn't, maybe ANALYZE is called for. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] how to change the index chosen in plan?
No, it's not the analyze problem. For some other values on b.bid such as 9, 10, the plan is fine since there a a lot of rows in table b for them. But for some specific values such as 8 I want the plan changed. 于2012年6月8日 22:10:58,Tom Lane写到: Rural Hunter writes: I have a query like this: select a.* from a inner join b on a.aid=b.aid where a.col1=33 a.col2=44 and b.bid=8 postgresql selected the index on a.col1 then selected the index on b.bid. But in my situation, I know that the query will be faster if it chose the index on b.bid first since there are only a few rows with value 8. If you know that and the planner doesn't, maybe ANALYZE is called for. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] non index use on LIKE on a non pattern string
Le vendredi 8 juin 2012 15:57:07, Tom Lane a écrit : > =?iso-8859-1?q?C=E9dric_Villemain?= writes: > >> I have noticed that with a SELECT query containing the following > >> constraint: > >> > >> column LIKE ? > >> > >> and an index on that column, PostgreSQL will not use the index > >> even if the parameter doesn't contain special pattern characters > >> such as %. > > > > you should have a postgresql 8.3,isn't it ? > > > > like is equal to "=" in your case, since 8.4 > > No, the planner has understood about wildcard-free LIKE patterns > producing an "=" index condition at least since 7.3. I think what the > OP is complaining about is the problem that the pattern has to be > actually constant (ie, NOT a parameter) before it can be optimized into > an index condition. This should be better in 9.2 ... Oops, maybe I shuffled with this * xxx_pattern_ops indexes can now be used for simple equality comparisons, not only for LIKE (Tom) -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation signature.asc Description: This is a digitally signed message part.
Re: [PERFORM] how to change the index chosen in plan?
Rural Hunter wrote: > 于2012年6月8日 22:10:58,Tom Lane写到: >> Rural Hunter writes: >>> I have a query like this: >>> select a.* from a inner join b on a.aid=b.aid where a.col1=33 >>> a.col2=44 and b.bid=8 >>> postgresql selected the index on a.col1 then selected the index >>> on b.bid. But in my situation, I know that the query will be >>> faster if it chose the index on b.bid first since there are only >>> a few rows with value 8. >> >> If you know that and the planner doesn't, maybe ANALYZE is called >> for. >> > No, it's not the analyze problem. So you ran ANALYZE and retried? If not, please do. > For some other values on b.bid such as 9, 10, the plan is fine > since there a a lot of rows in table b for them. So it uses the same plan regardless of the number of rows in table b for the value? That sure *sounds* like you need to run ANALYZE, possibly after adjusting the statistics target for a column or two. > But for some specific values such as 8 I want the plan changed. If you approach it from that line of thought, you will be unlikely to reach a good long-term solution. PostgreSQL has a costing model to determine which plan is expected to be cheapest (fastest). This is based on statistics gathered during ANALYZE and on costing factors. Generally, if it's not choosing the fastest plan, you aren't running ANALYZE frequently enough or with a fine-grained enough statistics target _or_ you need to adjust your costing factors to better model your actual costs. You haven't given us a lot of clues about which it is that you need to do, but there is *some* suggestion that you need to ANALYZE. If you *try* that and it doesn't solve your problem, please read this page and provide more information: http://wiki.postgresql.org/wiki/SlowQueryQuestions -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] how to change the index chosen in plan?
Hi Kevin, Thanks for your detailed explanation. 于 2012/6/8 22:37, Kevin Grittner 写道: Rural Hunter wrote: 于2012年6月8日 22:10:58,Tom Lane写到: Rural Hunter writes: I have a query like this: select a.* from a inner join b on a.aid=b.aid where a.col1=33 a.col2=44 and b.bid=8 postgresql selected the index on a.col1 then selected the index on b.bid. But in my situation, I know that the query will be faster if it chose the index on b.bid first since there are only a few rows with value 8. If you know that and the planner doesn't, maybe ANALYZE is called for. No, it's not the analyze problem. So you ran ANALYZE and retried? If not, please do. Yes, I did. For some other values on b.bid such as 9, 10, the plan is fine since there a a lot of rows in table b for them. So it uses the same plan regardless of the number of rows in table b for the value? yes. That sure *sounds* like you need to run ANALYZE, possibly after adjusting the statistics target for a column or two. How can adjust the statistics target? But for some specific values such as 8 I want the plan changed. If you approach it from that line of thought, you will be unlikely to reach a good long-term solution. PostgreSQL has a costing model to determine which plan is expected to be cheapest (fastest). This is based on statistics gathered during ANALYZE and on costing factors. Generally, if it's not choosing the fastest plan, you aren't running ANALYZE frequently enough or with a fine-grained enough statistics target _or_ you need to adjust your costing factors to better model your actual costs. You haven't given us a lot of clues about which it is that you need to do, but there is *some* suggestion that you need to ANALYZE. If you *try* that and it doesn't solve your problem, please read this page and provide more information: http://wiki.postgresql.org/wiki/SlowQueryQuestions Sorry the actual tables and query are very complicated so I just simplified the problem with my understanding. I rechecked the query and found it should be simplified like this: select a.* from a inner join b on a.aid=b.aid where a.col1=33 and a.col2=44 and a.timeThere is an index on (a.col1,a.col2,a.time). If I remove the order-by clause, I can get the plan as I expected. I think that's why postgresql selected that index. But still I want the index on b.bid selected first for value 8 since there are only several rows with bid 8. though for other normal values there might be several kilo to million rows. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] how to change the index chosen in plan?
Rural Hunter wrote: > How can adjust the statistics target? default_statistics_target http://www.postgresql.org/docs/current/interactive/runtime-config-query.html#RUNTIME-CONFIG-QUERY-OTHER or ALTER TABLE x ALTER COLUMN y SET STATISTICS n http://www.postgresql.org/docs/current/interactive/sql-altertable.html > Sorry the actual tables and query are very complicated so I just > simplified the problem with my understanding. I rechecked the > query and found it should be simplified like this: > select a.* from a inner join b on a.aid=b.aid where a.col1=33 and > a.col2=44 and a.time There is an index on (a.col1,a.col2,a.time). If I remove the > order-by clause, I can get the plan as I expected. I think that's > why postgresql selected that index. Sounds like it expects the sort to be expensive, which means it probably expects a large number of rows. An EXPLAIN ANALYZE of the query with and without the ORDER BY might be instructive. It would also help to know what version of PostgreSQL you have and how it is configured, all of which shows up in the results of the query on this page: http://wiki.postgresql.org/wiki/Server_Configuration > But still I want the index on b.bid selected first > for value 8 since there are only several rows with bid 8. though > for other normal values there might be several kilo to million > rows. An EXPLAIN ANALYZE of one where you think the plan is a good choice might also help. Oh, and just to be sure -- are you actually running queries with the literals like you show, or are you using prepared statements with placeholders and plugging the values in after the statement is prepared? Sample code, if possible, might help point to or eliminate issues with a cached plan. If you're running through a cached plan, there is no way for it to behave differently based on the value plugged into the query -- the plan has already been set before you get to that point. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] pg 9.1 brings host machine down
Thanks alot. I've tried to play with work_mem and after few days of the production testing pg behaves much better. See no more files in the pgsql_tmp folder. pg processes consumes reasonable memory, no swap operation any more. I've studied official pg docs about work_mem an still have no idea which optimal value work_mem should have. 1MB is obviously too small. I've increased up to 32m. due to a lot of the sorts and hash joins in the queries. On Wed, Jun 6, 2012 at 6:40 PM, Patric Bechtel wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > Hi, > > which fs with which settings are you using? What's the work_mem settings? > Which size do the files > have? > > Depending on the answer of above questions I would suggest: > - - RAM disk, SSD or separate disk for pgsql_tmp > - - using xfs with > noatime,nodiratime,delaylog,logbufs=8,logbsize=256k,nobarrier for the tmp > area > - - separating pg_xlog on yet another disk (xfs, too, but with barrier) > - - using deadline scheduler for all database disks > - - increasing work_mem to at least the "common" file size +50% > > there's more if I'd know more about the setup. > > hth, > > Patric > > Vitalii Tymchyshyn schrieb am 06.06.2012 14:25: > > Hello. > > > > Seen this already. It looks like cross join + sort. Badly configured ORM > tools like Hibernate > > with multiple one-to-many relationships fetched with 'join' strategy may > produce such result. > > Unfortunately I don't know if it's possible to protect from such a case > at server side. > > > > Best regards, Vitalii Tymchyshyn > > > > 06.06.12 15:05, Konstantin Mikhailov написав(ла): > >> I'm faced with a problem running postgres 9.1.3 which seems to nobody > else see before. Tried > >> to search and only one relevant post fond (about millions of files in > pgsql_tmp). > >> > >> Sympthoms: > >> > >> Some postgres process size is getting abnormally big compared to other > postgres processes. > >> Top shows the 'normal' pg processed is about VIRT 120m, RES ~30m and > SHR ~30m. That one is > >> about 6500m, 3.4g, 30m corresp. Total RAM avail - 8g. When one more > such a process appears > >> the host going into deep swap and pg restart can help only (actually > the stop won't even stop > >> such a process - after shutdown it still alive and can be only killed). > >> > >> base/pgsql_tmp contains millions of files. In this situation stop and > dirty restart is > >> possible - the normal startup is impossible either. Read somewhere that > it tries to delete (a > >> millions files) from that directory. I can't even imagine when it > finish the deletion so i'm > >> simple move that folder outside the base - then start can succeed. > >> > >> on ubuntu 11.10,12.04 x64. cpu intel core Q9650 3GHz. 8G RAM. > >> > >> Does anybody see that behaviour or maybe have some glue how to handle > it. > >> > >> PS: the my preliminary conclusion: some sql is produces a lot of files > in the temporary table > >> spaces - very quickly. When sql is finished postgres tries to cleanup > the folder reading all > >> contents of the folder and removing the files one by one. It does the > removal slow (watched > >> the folder by `find pgsql_tmp | wc -l') but process still consumes the > RAM. Next such sql > >> will be a killer :( > >> > >> > > > > > -BEGIN PGP SIGNATURE- > Version: GnuPG v1.4.11 (GNU/Linux) > Comment: GnuPT 2.5.2 > > iEYEARECAAYFAk/PT7sACgkQfGgGu8y7ypCr+QCglfi5t4mllLrqVBTbk8SIHt7i > 2y8An2wzekmPmx7DsXDQ/h/t2lwDfYDs > =BHRV > -END PGP SIGNATURE- > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >
Re: [PERFORM] how to change the index chosen in plan?
于 2012/6/9 0:39, Kevin Grittner 写道: Rural Hunter wrote: How can adjust the statistics target? default_statistics_target http://www.postgresql.org/docs/current/interactive/runtime-config-query.html#RUNTIME-CONFIG-QUERY-OTHER or ALTER TABLE x ALTER COLUMN y SET STATISTICS n http://www.postgresql.org/docs/current/interactive/sql-altertable.html Thanks, I will check detail. Sorry the actual tables and query are very complicated so I just simplified the problem with my understanding. I rechecked the query and found it should be simplified like this: select a.* from a inner join b on a.aid=b.aid where a.col1=33 and a.col2=44 and a.time Sounds like it expects the sort to be expensive, which means it probably expects a large number of rows. An EXPLAIN ANALYZE of the query with and without the ORDER BY might be instructive. It would also help to know what version of PostgreSQL you have and how it is configured, all of which shows up in the results of the query on this page: http://wiki.postgresql.org/wiki/Server_Configuration Here is the output: name | current_setting -+--- version | PostgreSQL 9.1.3 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit archive_command | test ! -f /dbbk/postgres/logarch/%f.gz && gzip -c %p >/dbbk/postgres/logarch/%f.gz archive_mode | on autovacuum | on autovacuum_freeze_max_age | 20 checkpoint_segments | 20 client_encoding | UTF8 effective_cache_size | 150GB full_page_writes | off lc_collate | zh_CN.utf8 lc_ctype | zh_CN.utf8 listen_addresses | * log_autovacuum_min_duration | 30min log_destination | stderr log_line_prefix | %t [%u@%h] log_min_duration_statement | 10s log_statement | ddl logging_collector | on maintenance_work_mem | 10GB max_connections | 2500 max_stack_depth | 2MB max_wal_senders | 1 port | 3500 server_encoding | UTF8 shared_buffers | 60GB synchronous_commit | off TimeZone | PRC track_activities | on track_counts | on vacuum_freeze_table_age | 10 wal_buffers | 16MB wal_level | hot_standby work_mem | 8MB (33 rows) But still I want the index on b.bid selected first for value 8 since there are only several rows with bid 8. though for other normal values there might be several kilo to million rows. An EXPLAIN ANALYZE of one where you think the plan is a good choice might also help. Ok, I get out a simple version of the actualy query. Here is the explain anaylze without order-by, which is I wanted: http://explain.depesz.com/s/p1p Another with the order-by which I want to avoid: http://explain.depesz.com/s/ujU This is the count of rows in article_label with value 3072(which I referred as table b in previous mail): # select count(*) from article_label where lid=3072; count --- 56 (1 row) Oh, and just to be sure -- are you actually running queries with the literals like you show, or are you using prepared statements with placeholders and plugging the values in after the statement is prepared? Sample code, if possible, might help point to or eliminate issues with a cached plan. If you're running through a cached plan, there is no way for it to behave differently based on the value plugged into the query -- the plan has already been set before you get to that point. Yes, I ran the query directly wih psql. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance