Re: [PERFORM] cost-based vacuum
On Fri, 2005-07-08 at 12:25 -0400, Ian Westmacott wrote:
> I am beginning to look at Postgres 8, and am particularly
> interested in cost-based vacuum/analyze. I'm hoping someone
> can shed some light on the behavior I am seeing.
>
> Suppose there are three threads:
>
> writer_thread
>   every 1/15 second do
>     BEGIN TRANSACTION
>     COPY table1 FROM stdin
>     ...
>     COPY tableN FROM stdin
>     perform several UPDATEs, DELETEs and INSERTs
>     COMMIT
>
> reader_thread
>   every 1/15 second do
>     BEGIN TRANSACTION
>     SELECT FROM table1 ...
>     ...
>     SELECT FROM tableN ...
>     COMMIT
>
> analyze_thread
>   every 5 minutes do
>     ANALYZE table1
>     ...
>     ANALYZE tableN
>
> Now, Postgres 8.0.3 out-of-the-box (all default configs) on a
> particular piece of hardware runs the Postgres connection for
> writer_thread at about 15% CPU (meaningless, I know, but for
> comparison) and runs the Postgres connection for reader_thread
> at about 30% CPU. Latency for reader_thread seeing updates
> from writer_thread is well under 1/15s. Impact of
> analyze_thread is negligible.
>
> If I make the single configuration change of setting
> vacuum_cost_delay=1000, each iteration in analyze_thread takes
> much longer, of course. But what I also see is that the CPU
> usage of the connections for writer_thread and reader_thread
> spikes up to well over 80% each (this is a dualie) and latency
> drops to 8-10s during the ANALYZEs.
>
> I don't understand why this would be. I don't think there
> are any lock issues, and I don't see any obvious I/O issues.
> Am I missing something? Is there any way to get some
> insight into what those connections are doing?

The ANALYZE commands hold read locks on the tables you wish to write to.
If you slow them down, you merely slow down your write transactions
also, and then the read transactions that wait behind them. Every time
the ANALYZE sleeps it wakes up the other transactions, which then
realise they can't move because of locks and then wake up the ANALYZEs
for another shot. The end result is that you introduce more
context-switching, without any chance of doing more useful work while
the ANALYZEs sleep.

Don't use vacuum_cost_delay in this situation. You might try setting it
to 0 for the analyze_thread only.

Sounds like you could speed things up by splitting everything into two
sets of tables, with writer_thread1 and writer_thread2 etc. That way
your two CPUs would be able to get through more work independently,
without locking each other out.

Best Regards, Simon Riggs
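A minimal sketch of the per-session setting suggested above, assuming
the analyze_thread runs on its own connection (the table names are the
placeholders from the example; vacuum_cost_delay is an ordinary
session-settable parameter in 8.0):

    -- run only in the analyze_thread's connection
    SET vacuum_cost_delay = 0;   -- disable cost-based sleeping here
    ANALYZE table1;
    -- ...
    ANALYZE tableN;
    -- other connections keep the value from postgresql.conf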
Re: [PERFORM] cost-based vacuum
On Mon, 2005-07-11 at 07:31, Simon Riggs wrote:
> The ANALYZE commands hold read locks on the tables you wish to write to.
> If you slow them down, you merely slow down your write transactions
> also, and then the read transactions that wait behind them. Every time
> the ANALYZE sleeps it wakes up the other transactions, which then
> realise they can't move because of locks and then wake up the ANALYZEs
> for another shot. The end result is that you introduce more context-
> switching, without any chance of doing more useful work while the
> ANALYZEs sleep.

Let me make sure I understand. ANALYZE acquires a read lock on the
table, which it holds until the operation is complete (including any
sleeps). That read lock blocks the extension of that table via COPY.
Is that right?

According to the 8.0 docs, ANALYZE acquires an ACCESS SHARE lock on the
table, and that conflicts only with ACCESS EXCLUSIVE. That's why I
didn't think I had a lock issue, since I think COPY only needs ROW
EXCLUSIVE. Or perhaps the transaction needs something more?

Thanks,

--Ian
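One way to get some of that insight is to look at pg_locks while the
ANALYZEs are running; a minimal sketch (pg_locks exists in 8.0, joined
to pg_class here only to get readable table names):

    -- which relations does each backend hold or wait for locks on?
    SELECT c.relname, l.pid, l.mode, l.granted
    FROM pg_locks l
    JOIN pg_class c ON c.oid = l.relation
    WHERE l.relation IS NOT NULL
    ORDER BY c.relname, l.pid;

Any row with granted = false identifies a backend that is actually
blocked on a lock rather than just burning CPU.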
[PERFORM] Question
In the past week, a guy from the Unix Group in Colombia said: "Postgrest
in production is bat, if the power off in any time the datas is lost why
this datas is in plain files. Postgrest no ssupport data bases with more
1 millon of records". What can you tell me about this? Is Informix
really better, as he says?

Ing. Alejandro Lemus G.
Radio Taxi Aeropuerto S.A.
Avenida de las Américas # 51 - 39 Bogotá - Colombia
Tel: 571-4470694 / 571-4202600 Ext. 260
Fax: 571-2624070
email: [EMAIL PROTECTED]
Re: [PERFORM] Question
Perhaps choose a better subject than "question" next time?

Alejandro Lemus wrote:
> In the past week, a guy from the Unix Group in Colombia said: "Postgrest
> in production is bat, if the power off in any time the datas is lost

Wrong. And it's called "PostgreSQL".

> why this datas is in plain files. Postgrest no ssupport data bases with
> more 1 millon of records".

Wrong.

> What can you tell me about this? Is Informix really better, as he says?

Your contact in the Unix Group in Colombia obviously talks on subjects
where he knows little. Perhaps re-evaluate anything else you've heard
from him.

You can find details on PostgreSQL at http://www.postgresql.org/,
including the manuals:
  http://www.postgresql.org/docs/8.0/static/index.html

The FAQ:
  http://www.postgresql.org/docs/faq/

Spanish/Brazilian communities, which might prove useful:
  http://www.postgresql.org/community/international

PostgreSQL is licensed under the BSD licence, which means you can freely
download or deploy it in a commercial setting if you desire.

--
Richard Huxton
Archonet Ltd
Re: [PERFORM] cost-based vacuum
Simon Riggs <[EMAIL PROTECTED]> writes:
>> I don't understand why this would be. I don't think there
>> are any lock issues, and I don't see any obvious I/O issues.

> The ANALYZE commands hold read locks on the tables you wish to write to.

Unless there were more commands that Ian didn't show us, he's not taking
any locks that would conflict with ANALYZE. So I don't believe this is
the explanation.

regards, tom lane
Re: [PERFORM] Question
> In the past week, a guy from the Unix Group in Colombia said: "Postgrest
> in production is bat, if the power off in any time the datas is lost why
> this datas is in plain files. Postgrest no ssupport data bases with more
> 1 millon of records". What can you tell me about this? Is Informix
> really better, as he says?

Both these statements are completely incorrect. Unlike some other
"database systems", PostgreSQL *does* survive power loss without any
major problems, assuming you use a metadata-journalling filesystem and
don't run with a non-battery-backed write cache (but no db can survive
that..).

And having a million records is no problem at all. You may run into
considerations when you're talking billions, but you can do that as
well - it just takes a bit more knowledge before you can do it right.

//Magnus
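For anyone who wants to double-check the durability side of this on
their own installation, a quick sketch of the relevant settings to
inspect from psql (both exist in 8.0):

    SHOW fsync;            -- must be 'on' for crash safety
    SHOW wal_sync_method;  -- how WAL writes are flushed to disk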
Re: [PERFORM] cost-based vacuum
On Mon, 2005-07-11 at 09:07 -0400, Ian Westmacott wrote:
> On Mon, 2005-07-11 at 07:31, Simon Riggs wrote:
> > The ANALYZE commands hold read locks on the tables you wish to write
> > to. If you slow them down, you merely slow down your write
> > transactions also, and then the read transactions that wait behind
> > them. Every time the ANALYZE sleeps it wakes up the other
> > transactions, which then realise they can't move because of locks
> > and then wake up the ANALYZEs for another shot. The end result is
> > that you introduce more context-switching, without any chance of
> > doing more useful work while the ANALYZEs sleep.
>
> Let me make sure I understand. ANALYZE acquires a read lock on the
> table, which it holds until the operation is complete (including any
> sleeps). That read lock blocks the extension of that table via COPY.
> Is that right?
>
> According to the 8.0 docs, ANALYZE acquires an ACCESS SHARE lock on the
> table, and that conflicts only with ACCESS EXCLUSIVE. That's why I
> didn't think I had a lock issue, since I think COPY only needs ROW
> EXCLUSIVE. Or perhaps the transaction needs something more?

The docs are correct, but don't show catalog and buffer locks.

...but on further reading of the code there are no catalog locks or
buffer locks held across the sleep points. So my explanation doesn't
account for the sleep/no-sleep difference you have observed.

Best Regards, Simon Riggs
Re: [PERFORM] Question
As a sometime Informix and PostgreSQL DBA, I disagree with the
contentions below. We have many tables with tens of millions of rows in
Postgres. We have had (alas) power issues with our lab on more than one
occasion, and the afflicted servers have recovered like a champ, every
time.

This person may not like Postgres (or very much likes Informix), but he
shouldn't conjure up spurious reasons to support his/her prejudice.

Informix is an excellent product, but it can be costly for web-related
applications. PostgreSQL is also an excellent database. Each has
differences which may make the decision between the two of them clear.
But facts are necessary to have a real discussion.

Greg Williamson
DBA
GlobeXplorer LLC

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Alejandro Lemus
Sent: Monday, July 11, 2005 6:00 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Question

In the past week, a guy from the Unix Group in Colombia said: "Postgrest
in production is bat, if the power off in any time the datas is lost why
this datas is in plain files. Postgrest no ssupport data bases with more
1 millon of records". What can you tell me about this? Is Informix
really better, as he says?

Ing. Alejandro Lemus G.
Radio Taxi Aeropuerto S.A.
Avenida de las Américas # 51 - 39 Bogotá - Colombia
Tel: 571-4470694 / 571-4202600 Ext. 260
Fax: 571-2624070
email: [EMAIL PROTECTED]
Re: [PERFORM] Data Warehousing Tuning
> - Sun V250 server
> - 2*1.3GHz Sparc IIIi CPU
> - 8GB RAM
> - 8*73GB SCSI drives
> - Solaris 10
> - Postgres 8
>
> 4) We moved the pg_xlog files off /data/postgres (disks 2-7) and into
> /opt/pg_xlog (disks 0-1), but it seemed like performance decreased,
> so we moved them back again.

You have saturated the SCSI bus: a single 160MB/s SCSI bus is too small
for 8 HDDs delivering 30-70MB/s each.

Solution: replace the CD/DVD/tape devices in the top two 5.25" bays with
two more HDDs (Ultra320 SCSI), install a PCI 64/66 Ultra320 SCSI
controller (or a simple RAID1 controller, to minimize saturation of the
PCI buses) and attach the two new HDDs to it. Move /opt/pg_xlog onto
these drives.

Best regards, Alexander Kirpa
[PERFORM] join and query planner
(First of all, sorry for my English.)

Hi.

- Does "left join" restrict the order in which the planner must join
  tables? I've read about join, but I'm not sure about left join...
- If so: can I avoid this behavior? I mean, make the planner resolve the
  query using statistics (uniqueness, data distribution) rather than
  join order.

My query looks like:

  SELECT ...
  FROM a, b
    LEFT JOIN c ON (c.key = a.key)
    LEFT JOIN d ON (d.key = a.key)
  WHERE (a.key = b.key) AND (b.column <= 100)

b.column has a lot better selectivity, but the planner insists on
resolving c.key = a.key first.

Of course, I could rewrite it as something like:

  SELECT ...
  FROM
    (SELECT ...
     FROM a, b
       LEFT JOIN d ON (d.key = a.key)
     WHERE (b.column <= 100)
    ) AS aa
    LEFT JOIN c ON (c.key = aa.key)

but this query is constructed by an application with a "multicolumn"
filter. It's dynamic, meaning a user could choose to look for
"c.column = 1000", and also combinations of filters. So I need the
planner to choose the best plan...

I've already changed statistics, clustered the tables with CLUSTER, ran
VACUUM ANALYZE, and changed work_mem and shared_buffers...

Greetings. TIA.
Re: [PERFORM] join and query planner
Dario Pudlo wrote:
> (First of all, sorry for my English.)
> Hi.
> - Does "left join" restrict the order in which the planner must join
>   tables? I've read about join, but I'm not sure about left join...
> - If so: can I avoid this behavior? I mean, make the planner resolve the
>   query using statistics (uniqueness, data distribution) rather than
>   join order.
>
> My query looks like:
>   SELECT ...
>   FROM a, b
>     LEFT JOIN c ON (c.key = a.key)
>     LEFT JOIN d ON (d.key = a.key)
>   WHERE (a.key = b.key) AND (b.column <= 100)
>
> b.column has a lot better selectivity, but the planner insists on
> resolving c.key = a.key first.
>
> Of course, I could rewrite it as something like:
>   SELECT ...
>   FROM
>     (SELECT ...
>      FROM a, b
>        LEFT JOIN d ON (d.key = a.key)
>      WHERE (b.column <= 100)
>     ) AS aa
>     LEFT JOIN c ON (c.key = aa.key)
>
> but this query is constructed by an application with a "multicolumn"
> filter. It's dynamic, meaning a user could choose to look for
> "c.column = 1000", and also combinations of filters.
>
> So I need the planner to choose the best plan...

Probably forcing the other join earlier could help:

  SELECT ...
  FROM a JOIN b ON (a.key = b.key)
    LEFT JOIN c ON (c.key = a.key)
  ...

I think the problem is that PostgreSQL can't break up the JOIN syntax
very easily. But you can make the JOIN happen earlier.

John
=:->

> I've already changed statistics, clustered the tables with CLUSTER, ran
> VACUUM ANALYZE, and changed work_mem and shared_buffers...
>
> Greetings. TIA.
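A sketch of the full rewrite this suggests, using the placeholder table
and column names from the original query; lowering join_collapse_limit
(available since 7.4) additionally tells the planner to follow the
written JOIN order rather than re-flattening it:

  SET join_collapse_limit = 1;  -- honour the explicit join order below

  EXPLAIN ANALYZE
  SELECT ...
  FROM a
    JOIN b ON (a.key = b.key)        -- selective b.column filter first
    LEFT JOIN c ON (c.key = a.key)
    LEFT JOIN d ON (d.key = a.key)
  WHERE b.column <= 100;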
[PERFORM] Sorting on longer key is faster ?
The two queries are almost the same, but ORDER BY x||t is FASTER than
ORDER BY x. How can that be possible?

Btw: x and x||t have the same ordering.

phoeniks=> explain analyze SELECT * FROM test WHERE i<20 ORDER BY x || t;
                               QUERY PLAN
--------------------------------------------------------------------------
 Sort  (cost=2282.65..2284.92 rows=907 width=946) (actual time=74.982..79.114 rows=950 loops=1)
   Sort Key: (x || t)
   ->  Index Scan using i_i on test  (cost=0.00..2238.09 rows=907 width=946) (actual time=0.077..51.015 rows=950 loops=1)
         Index Cond: (i < 20)
 Total runtime: 85.944 ms
(5 rows)

phoeniks=> explain analyze SELECT * FROM test WHERE i<20 ORDER BY x;
                               QUERY PLAN
--------------------------------------------------------------------------
 Sort  (cost=2280.38..2282.65 rows=907 width=946) (actual time=175.431..179.239 rows=950 loops=1)
   Sort Key: x
   ->  Index Scan using i_i on test  (cost=0.00..2235.82 rows=907 width=946) (actual time=0.024..5.378 rows=950 loops=1)
         Index Cond: (i < 20)
 Total runtime: 183.317 ms
(5 rows)

phoeniks=> \d+ test
          Table "public.test"
 Column |  Type   | Modifiers | Description
--------+---------+-----------+-------------
 i      | integer |           |
 t      | text    |           |
 x      | text    |           |
Indexes:
    "i_i" btree (i)
    "x_i" btree (xpath_string(x, 'data'::text))
    "x_ii" btree (xpath_string(x, 'movie/characters/character'::text))
Has OIDs: no
Re: [PERFORM] Sorting on longer key is faster ?
jobapply wrote:
> The two queries are almost the same, but ORDER BY x||t is FASTER than
> ORDER BY x. How can that be possible?
>
> Btw: x and x||t have the same ordering.
>
> phoeniks=> explain analyze SELECT * FROM test WHERE i<20 ORDER BY x || t;
>                                QUERY PLAN

What types are x and t? I have the feeling "x || t" is actually a
boolean, so it is only a True/False sort, while ORDER BY x has to do
some sort of string comparison (which might actually be a
locale-dependent comparison, and strcoll can be very slow on some
locales).

John
=:->

> --------------------------------------------------------------------------
>  Sort  (cost=2282.65..2284.92 rows=907 width=946) (actual time=74.982..79.114 rows=950 loops=1)
>    Sort Key: (x || t)
>    ->  Index Scan using i_i on test  (cost=0.00..2238.09 rows=907 width=946) (actual time=0.077..51.015 rows=950 loops=1)
>          Index Cond: (i < 20)
>  Total runtime: 85.944 ms
> (5 rows)
>
> phoeniks=> explain analyze SELECT * FROM test WHERE i<20 ORDER BY x;
>                                QUERY PLAN
> --------------------------------------------------------------------------
>  Sort  (cost=2280.38..2282.65 rows=907 width=946) (actual time=175.431..179.239 rows=950 loops=1)
>    Sort Key: x
>    ->  Index Scan using i_i on test  (cost=0.00..2235.82 rows=907 width=946) (actual time=0.024..5.378 rows=950 loops=1)
>          Index Cond: (i < 20)
>  Total runtime: 183.317 ms
> (5 rows)
>
> phoeniks=> \d+ test
>           Table "public.test"
>  Column |  Type   | Modifiers | Description
> --------+---------+-----------+-------------
>  i      | integer |           |
>  t      | text    |           |
>  x      | text    |           |
> Indexes:
>     "i_i" btree (i)
>     "x_i" btree (xpath_string(x, 'data'::text))
>     "x_ii" btree (xpath_string(x, 'movie/characters/character'::text))
> Has OIDs: no
Re: [PERFORM] Sorting on longer key is faster ?
"jobapply" <[EMAIL PROTECTED]> writes: > The 2 queries are almost same, but ORDER BY x||t is FASTER than ORDER BY x.. > How can that be possible? Hmm, how long are the x values? Is it possible many of them are TOASTed? regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Sorting on longer key is faster ?
Chris Travers wrote:
> John A Meinel wrote:
>> jobapply wrote:
>>> The two queries are almost the same, but ORDER BY x||t is FASTER than
>>> ORDER BY x. How can that be possible?
>>>
>>> Btw: x and x||t have the same ordering.
>>>
>>> phoeniks=> explain analyze SELECT * FROM test WHERE i<20 ORDER BY x || t;
>>>                                QUERY PLAN
>>
>> What types are x and t? I have the feeling "x || t" is actually a
>> boolean, so it is only a True/False sort, while ORDER BY x has to do
>> some sort of string comparison (which might actually be a
>> locale-dependent comparison, and strcoll can be very slow on some
>> locales).
>
> Am I reading this wrong? I would think that x || t would mean
> "concatenate x and t."

Sorry, I think you are right. I was getting my operators mixed up.

> This is interesting. I never thought of writing a multicolumn sort this
> way.

I'm also surprised that the sort is faster with a merge operation. Are
you using UNICODE as the database format? I'm just wondering if it is
doing something funny like casting it to an easier-to-sort type.

> Best Wishes,
> Chris Travers
> Metatron Technology Consulting

PS> Don't forget to Reply All so that your messages go back to the list.
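A sketch of how to check both things from psql (the locale used for
string comparison is fixed at initdb time, the encoding per database;
these queries assume a stock 8.0 catalog):

    SHOW lc_collate;   -- locale used by strcoll() for text sorts

    SELECT datname, pg_encoding_to_char(encoding) AS encoding
    FROM pg_database;  -- per-database character encoding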
Re: [PERFORM] Sorting on longer key is faster ?
jobapply wrote:
> The two queries are almost the same, but ORDER BY x||t is FASTER than
> ORDER BY x. How can that be possible?
>
> Btw: x and x||t have the same ordering.
>
> phoeniks=> explain analyze SELECT * FROM test WHERE i<20 ORDER BY x || t;
>                                QUERY PLAN

I also thought of another possibility. Are there a lot of similar
entries in x? Meaning that the same value is repeated over and over?

It is possible that the sort code has a weakness when sorting equal
values. For instance, if it were doing a hash aggregation, you would
have the same hash repeated. (It isn't; I'm just mentioning a case where
it might affect something.) If it is creating a tree representation, it
might cause some sort of pathological worst-case behavior, where all
entries keep adding to the same side of the tree, rather than being more
balanced.

I don't know the internals of PostgreSQL sorting, but those are just
some ideas.

John
=:->
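A minimal sketch for checking that "lots of equal values" theory on the
same rows the test queries sort:

    -- are a few x values repeated many times among the sorted rows?
    SELECT x, count(*) AS occurrences
    FROM test
    WHERE i < 20
    GROUP BY x
    ORDER BY count(*) DESC
    LIMIT 10;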