Re: [PERFORM] Help: how to speed up query after db server reboot
On Wed, Sep 2, 2009 at 00:01, Wei Yan wrote: > Hi: > > Looks like after postgres db server reboot, first query is very slow > (10+mins). After the system cache built, query is pretty fast. > Now the question is how to speed up the first query slow issue? > > Any pointers? Schedule a run of a couple of representative queries right as the database has started? That should pre-populate the cache before your users get there, hopefully. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
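A minimal sketch of the warm-up job Magnus describes (database, user and table names here are placeholders; the count(*) scans are only there to pull each hot table through the cache right after startup):

    #!/bin/sh
    # warm_cache.sh -- run a few representative queries once PostgreSQL is up,
    # so the first real user doesn't pay the cold-cache price.
    psql -U appuser -d appdb <<'EOF'
    SELECT count(*) FROM orders WHERE order_date > now() - interval '30 days';
    SELECT count(*) FROM customers;
    EOF

Hook it into whatever starts the database (rc script, cron @reboot, etc.); note this warms the table heaps, not necessarily the indexes.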
Re: [PERFORM] pg_stat_activity.current_query explanation?
On Wed, Sep 02, 2009 at 11:29:14AM -0400, Pat Chan wrote:
> One is '<IDLE> in transaction' and the other is simply the word 'end'.
> I googled everywhere about this but to no avail. If you could shed some
> light on this subject that would be great!
> Thank you in advance.

'<IDLE> in transaction' means that the client has opened a transaction but isn't doing anything right now. If you issue a "BEGIN;" command and then just sit there, for instance, you'll see these. 'END' is synonymous with 'COMMIT', so where those show up, it means the client is in the middle of committing a transaction.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com
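For reference, on 8.3 both states can be spotted directly in pg_stat_activity; a rough sketch (the 'end'/'commit' match is approximate, since that state is transient):

    SELECT procpid, usename, datname, xact_start, current_query
      FROM pg_stat_activity
     WHERE current_query = '<IDLE> in transaction'
        OR lower(current_query) IN ('end', 'end;', 'commit');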
Re: [PERFORM] Help: how to speed up query after db server reboot
Magnus Hagander writes: > On Wed, Sep 2, 2009 at 00:01, Wei Yan wrote: >> Looks like after postgres db server reboot, first query is very slow >> (10+mins). After the system cache built, query is pretty fast. >> Now the question is how to speed up the first query slow issue? > Schedule a run of a couple of representative queries right as the > database has started? That should pre-populate the cache before your > users get there, hopefully. I wonder if VACUUMing his key tables would be a good answer. I bet that a lot of the problem is swapping in indexes in a slow random-access fashion. In recent-model Postgres, VACUUM will do a sequential scan of the indexes (at least for btree) which should be a much more efficient way of bringing them into kernel cache. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
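A sketch of what that looks like in practice; the table names are placeholders for whichever relations the first slow query touches:

    -- run once after startup; VACUUM VERBOSE reports which indexes it scanned
    VACUUM VERBOSE orders;
    VACUUM VERBOSE order_items;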
Re: [PERFORM] partition query using Seq Scan even when index is present
On Wed, Sep 2, 2009 at 4:15 PM, Kenneth Cox wrote: > With postgresql-8.3.6, I have many partitions inheriting a table. SELECT > min() on the parent performs a Seq Scan, but SELECT min() on a child uses > the index. Is this another case where the planner is not aware enough to > come up with the best plan? I tried creating an index on the parent table > to no avail. Is there a way to formulate the query so that it uses the > index? Here is the general flavor: > > create table calls (caller text, ts timestamptz); > create table calls_partition_2009_08 (check (ts >= '2009-08-01' and ts < > '2009-09-01')) inherits (calls); > create index calls_partition_2009_08_ts on calls_partition_2009_08 (ts); > insert into calls_partition_2009_08 (ts) > select to_timestamp(unix_time) > from generate_series(extract(epoch from '2009-08-01'::timestamptz)::int, > extract(epoch from '2009-08-31 > 23:59'::timestamptz)::int, 60) as unix_time; > analyze calls_partition_2009_08; > explain select min(ts) from calls; ATM, constraint exclusion mainly only supports queries of the form: SELECT ... WHERE 'x', with x being an expression in the check constraint. Table partitioning unfortunately is not a free lunch, you have to be aware of it at all times when writing queries vs your partitioned tables. merlin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
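Concretely, using the example tables quoted above: constraint exclusion only helps once the query carries a WHERE clause on the column named in the CHECK constraint, and for min()/max() the reliable route on 8.3 is still to address the child that owns the index. A sketch:

    SET constraint_exclusion = on;   -- 8.3 spelling; later releases also accept 'partition'

    -- the WHERE clause lets the planner skip partitions whose CHECK range can't match
    SELECT count(*) FROM calls
     WHERE ts >= '2009-08-15' AND ts < '2009-08-16';

    -- index-backed min() by querying the child directly, as the original post observed
    SELECT min(ts) FROM calls_partition_2009_08;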
Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3
Scott Otis wrote:
> 2x Intel Xeon Quad Core (@2 Ghz - Clovertown,L5335)
> 4GB RAM
> 4x Seagate 73GB SAS HDD 10k RPM – in RAID ( striped and mirrored )
>
> Would love to get some advice on how to change my conf settings / setup to get better I/O performance.
>
> ~1500 databases w/ ~60 tables each

This tells us nothing - size and complexity of databases is more important than their number.

> Total I/O (these numbers are pretty constant throughout the day):
> Reads: ~ 100 / sec for about 2.6 Mb/sec
> Writes: ~ 400 /sec for about 46.1Mb/sec

Again, not enough information. How did you measure these? With iostat? Are those random reads or sequential? (i.e. what was the IO transaction size?) Caching can explain why you have 4x more writes than reads, but it's still unusual, especially with the high write transfer rate you claim. If random, you're doing ~~ 500 IOPS on a RAID10 array of 4 10 kRPM drives, which is much more than you should - you're lucky you have the performance you do.

By the way, why do you think your setup is slow? Is your application slow and you think your database is the reason?

> shared_buffers = 1GB
> work_mem = 5MB
> maintenance_work_mem = 256MB

Ok.

> synchronous_commit = off

Ok. Could be important if your IO is slow as yours is.

> checkpoint_segments = 6

You could try raising this to 20, but I doubt it will help you that much. OTOH it won't hurt.

> checkpoint_warning = 30s
> effective_cache_size = 1GB

Ok.

> Most of the SQL happening is selects – very little inserts, updates and deletes comparatively.

Are you sure? Your write rate is a bit big for there to be very little insert/update/delete activity.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
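If the checkpoint_segments suggestion is tried, it is a one-line postgresql.conf change plus a reload; a sketch (the value is just the one floated above):

    # postgresql.conf
    checkpoint_segments = 20      # up from 6; each WAL segment is 16MB on disk

    -- then, from psql as a superuser:
    SELECT pg_reload_conf();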
Re: [PERFORM] Slow select times on select with xpath
astro77 wrote: > I've got a table set up with an XML field that I would like to search on > with > 2.5 million records. The xml are serialized objects from my application > which are too complex to break out into separate tables. I'm trying to run a > query similar to this: > > SELECT serialized_object as outVal >from object where > ( > array_to_string(xpath('/a:root/a:Identification/b:ObjectId/text()', > serialized_object, > ARRAY > [ > ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'], > ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security'] > > ]), ' ') = 'fdc3da1f-060f-4c34-9c30-d9334d9272ae' > > ) > limit 1000; I would try to minimize how many XML values it had to read, parse, and search. The best approach that comes to mind would be to use tsearch2 techniques (with a GIN or GiST index on the tsvector) to identify which rows contain 'fdc3da1f-060f-4c34-9c30-d9334d9272ae', and use AND to combine that with your xpath search. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
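A sketch of the kind of prefilter Kevin describes, assuming the xml column can be cast to text and using the text search built into 8.3; the index name is made up, and the query has to repeat the indexed expression exactly:

    -- GIN index over a text-search vector built from the serialized XML
    CREATE INDEX idx_object_serialized_tsv ON object
        USING gin (to_tsvector('simple', serialized_object::text));

    -- let the GIN index narrow the candidates, then apply the exact xpath test
    SELECT serialized_object AS outVal
      FROM object
     WHERE to_tsvector('simple', serialized_object::text)
           @@ plainto_tsquery('simple', 'fdc3da1f-060f-4c34-9c30-d9334d9272ae')
       AND array_to_string(xpath('/a:root/a:Identification/b:ObjectId/text()',
                                 serialized_object,
                                 ARRAY[ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
                                       ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']]), ' ')
           = 'fdc3da1f-060f-4c34-9c30-d9334d9272ae'
     LIMIT 1000;

It is worth checking how the default parser tokenizes the GUID before relying on the @@ prefilter.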
Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3
Scott Otis wrote: Would love to get some advice on how to change my conf settings / setup to get better I/O performance. Total I/O (these number are pretty constant throughout the day): Reads: ~ 100 / sec for about 2.6 Mb/sec Writes: ~ 400 /sec for about 46.1Mb/sec Most of the SQL happening is selects – very little inserts, updates and deletes comparatively. Maybe I'm wrong, but those two don't seem to jive. You say its mostly selects, but you show higher writes per second. Does freebsd have a vmstat or iostat? How did you get the numbers above? How's the cpu's look? (are they pegged?) The io stats above seem low (reading 2 meg a second is a tiny fraction of what your system should be capable of). Have you tried a dd test? -Andy -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] partition query using Seq Scan even when index is present
Thank you, Greg! I tweaked your function to use recursion to search all inherited tables; my inheritance structure is two levels deep. This function is for integers only; I will copy/waste to create one for timestamps. Extra credit for anyone who can rewrite it to be polymorphic.

-- Same as max(_colname) from _relname but much faster for inherited
-- tables with an index on _colname. In postgresql-8.3.6 a naive query
-- on a parent table will not use the indexes on the child tables.
create or replace function partition_max_int(_relname text, _colname text) returns int AS $$
declare
    childtable RECORD;
    childres RECORD;
    maxval int;
    tmpval int;
    sql text;
begin
    -- find max in this table (only)
    sql := 'select max('||_colname||') from only '||quote_ident(_relname);
    execute sql into maxval;

    -- recurse to find max in descendants
    FOR childtable in
        select pc.relname as relname
          from pg_class pc
          join pg_inherits pi on pc.oid=pi.inhrelid
         where inhparent=(select oid from pg_class where relname=_relname)
    LOOP
        tmpval := partition_max_int(childtable.relname, _colname);
        IF tmpval is not NULL and (tmpval > maxval or maxval is null) THEN
            maxval := tmpval;
        END IF;
    END LOOP;

    return maxval;
end;
$$ language 'plpgsql' STABLE;

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
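Taking up the "extra credit": an untested sketch of a polymorphic variant, relying on plpgsql's ALIAS FOR $0 and %TYPE so one function serves integers and timestamps. The name partition_max and the usage lines are made up; call it with a NULL of the column's type to pick the flavour, and note the column type needs a ">" operator:

    create or replace function partition_max(_relname text, _colname text, _example anyelement)
    returns anyelement as $$
    declare
        maxval alias for $0;      -- takes the concrete type of _example at call time
        tmpval _example%TYPE;
        childtable record;
        sql text;
    begin
        -- max in this table only
        sql := 'select max(' || quote_ident(_colname) || ') from only ' || quote_ident(_relname);
        execute sql into maxval;

        -- recurse into direct children
        for childtable in
            select pc.relname
              from pg_class pc
              join pg_inherits pi on pc.oid = pi.inhrelid
             where pi.inhparent = (select oid from pg_class where relname = _relname)
        loop
            tmpval := partition_max(childtable.relname, _colname, _example);
            if tmpval is not null and (maxval is null or tmpval > maxval) then
                maxval := tmpval;
            end if;
        end loop;

        return maxval;
    end;
    $$ language 'plpgsql' stable;

    -- hypothetical usage: pass a NULL of the wanted type
    -- select partition_max('calls', 'ts', null::timestamptz);
    -- select partition_max('some_table', 'some_int_col', null::int);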
Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3
Scott Otis wrote:
> I agree that they don't make sense - part of the reason I am looking for help :)
>
> I am using iostat to get those numbers ( which I specify to average over 5 min then collect to display in Cacti ).
>
> 2 processes are taking up a good deal of CPU - the postgres stats collector and autovacuum ones. Both of those are using a lot of 1 core each.
>
> I am not familiar with a dd test - what is that?
>
> Thanks,
> Scott
>
> On Sep 3, 2009, at 8:03 AM, "Andy Colson" wrote:
>> Scott Otis wrote:
>>> Would love to get some advice on how to change my conf settings / setup to get better I/O performance.
>>> Total I/O (these number are pretty constant throughout the day):
>>> Reads: ~ 100 / sec for about 2.6 Mb/sec
>>> Writes: ~ 400 /sec for about 46.1Mb/sec
>>> Most of the SQL happening is selects – very little inserts, updates and deletes comparatively.
>>
>> Maybe I'm wrong, but those two don't seem to jive. You say its mostly selects, but you show higher writes per second. Does freebsd have a vmstat or iostat? How did you get the numbers above? How's the cpu's look? (are they pegged?) The io stats above seem low (reading 2 meg a second is a tiny fraction of what your system should be capable of). Have you tried a dd test?
>>
>> -Andy

Please keep the list included so others may help.

the dd test: http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm

I think Ivan is right, the 2 meg a second is probably because most of the reads are from cache. But he and I looked at the writes differently. If we ignore the 400/sec, and just read 46 meg a second (assuming you meant megabyte and not megabit) then, that's pretty slow (for sequential writing) -- which the dd test will measure your sequential read and write speed.

Ivan asked a good question: By the way, why do you think your setup is slow? Is your application slow and you think your database is the reason?

-Andy

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
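Roughly, the dd test on that page writes and then reads back a file bigger than RAM so the OS cache can't answer; the path and size below are placeholders (about 8GB of 8kB blocks, i.e. twice the 4GB of RAM on this box):

    # sequential write test
    dd if=/dev/zero of=/usr/local/pgsql/ddtest bs=8k count=1000000
    # sequential read test
    dd if=/usr/local/pgsql/ddtest of=/dev/null bs=8k
    rm /usr/local/pgsql/ddtest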
Re: [PERFORM] Slow select times on select with xpath
On Wed, Sep 2, 2009 at 11:04 AM, astro77 wrote: > > I've got a table set up with an XML field that I would like to search on with > 2.5 million records. The xml are serialized objects from my application > which are too complex to break out into separate tables. I'm trying to run a > query similar to this: > > SELECT serialized_object as outVal > from object where > ( > array_to_string(xpath('/a:root/a:Identification/b:ObjectId/text()', > serialized_object, > ARRAY > [ > ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'], > ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security'] > > ]), ' ') = 'fdc3da1f-060f-4c34-9c30-d9334d9272ae' > > ) > limit 1000; > > I've also set up an index on the xpath query like this... > > CREATE INDEX concurrently > idx_object_nodeid > ON > object > USING > btree( > > cast(xpath('/a:root/a:Identification/b:ObjectId/text()', serialized_object, > ARRAY > [ > ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'], > ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security'] > > ])as text[]) > ); > > The query takes around 30 minutes to complete with or without the index in > place and does not cache the query. Additionally the EXPLAIN say that the > index is not being used. I've looked everywhere but can't seem to find solid > info on how to achieve this. Any ideas would be greatly appreciated. Why do you have a cast in the index definition? ...Robert -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
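For context: xpath() returns xml[], which has no btree operator class, hence the cast to text[] in the index definition. But an expression index is only considered when the query repeats the identical expression, so a sketch of a query that could match the index exactly as defined above (whether the planner then picks it still depends on statistics and selectivity):

    SELECT serialized_object AS outVal
      FROM object
     WHERE cast(xpath('/a:root/a:Identification/b:ObjectId/text()',
                      serialized_object,
                      ARRAY[ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
                            ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']]) AS text[])
           = ARRAY['fdc3da1f-060f-4c34-9c30-d9334d9272ae']
     LIMIT 1000;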
Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3
Sorry about not responding to the whole list earlier - this is my first time posting to a mailing list. Would providing more information about the size and complexities of the databases help? I measure I/O stats with iostat - here is the command I use: iostat -d -x mfid0 -t 290 2 I tried looking at the man page for iostat but couldn't find anywhere how to determine what the stats are for sequential vs random - any help there? When using 'top -m io' the postgres stats collector process is constantly at 99% - 100%. When using 'top' the WCPU for the postgres stats collector and the autovacuum process are constantly at 20% - 21%. Is that normal? It seems to me that the stats collector is doing all the I/O (which would mean the stats collector is doing 46.1 megabytes /sec). Also, the I/O stats don't change hardly at all (except at night during backups which makes sense). They don't go up or down with user activity on the server - which makes me wonder a little bit. I have a feeling that if I just turned off Apache that the I/O stats wouldn't change. Which leads me to believe that the I/O is not query related - its stats collecting and autovacuuming related. Is that expected? It seems to me that the stats collector shouldn't be using that much I/O and CPU (and the autovacuum shouldn't be using that much CPU) - therefore something in my configuration must be messed up or could be changed somehow. But maybe I'm wrong - please let me know. I don't think my setup is necessarily slow. I just want to make it as efficient as possible and wanted to get some feedback to see if am setting things up right. I am also looking out into the future and seeing how much load I can put on this server before getting another one. If I can reduce the I/O and CPU that the stats collector and autovacuum are using without losing any functionality then I can put more load on the server. Again thanks for all the help. Scott Otis CIO / Lead Developer Intand www.intand.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3
Scott Otis wrote: Sorry about not responding to the whole list earlier - this is my first time posting to a mailing list. Would providing more information about the size and complexities of the databases help? I measure I/O stats with iostat - here is the command I use: iostat -d -x mfid0 -t 290 2 I tried looking at the man page for iostat but couldn't find anywhere how to determine what the stats are for sequential vs random - any help there? When using 'top -m io' the postgres stats collector process is constantly at 99% - 100%. When using 'top' the WCPU for the postgres stats collector and the autovacuum process are constantly at 20% - 21%. Is that normal? It seems to me that the stats collector is doing all the I/O (which would mean the stats collector is doing 46.1 megabytes /sec). Also, the I/O stats don't change hardly at all (except at night during backups which makes sense). They don't go up or down with user activity on the server - which makes me wonder a little bit. I have a feeling that if I just turned off Apache that the I/O stats wouldn't change. Which leads me to believe that the I/O is not query related - its stats collecting and autovacuuming related. Is that expected? It seems to me that the stats collector shouldn't be using that much I/O and CPU (and the autovacuum shouldn't be using that much CPU) - therefore something in my configuration must be messed up or could be changed somehow. But maybe I'm wrong - please let me know. I don't think my setup is necessarily slow. I just want to make it as efficient as possible and wanted to get some feedback to see if am setting things up right. I am also looking out into the future and seeing how much load I can put on this server before getting another one. If I can reduce the I/O and CPU that the stats collector and autovacuum are using without losing any functionality then I can put more load on the server. Again thanks for all the help. Scott Otis CIO / Lead Developer Intand www.intand.com When using 'top -m io' the postgres stats collector process is constantly at 99% - 100%. When using 'top' the WCPU for the postgres stats collector and the autovacuum process are constantly at 20% - 21%. Yeah, that sounds excessive. But my database gets 20 transactions a DAY, so, I have no experience with a busy box. You say its mostly selects, but do you have any triggers or anything that might update a table? Do you do inserts or updates to track traffic? What does: select * from pg_stat_activity look like? (I think vacuum will show up in there, right?) I'm curious if we can find the table autovacuum is working on, maybe that'll help pin it down. -Andy -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
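For what it's worth, on 8.3 the autovacuum workers do show up in that view; a quick sketch to see which table they are working on:

    SELECT procpid, datname, query_start, current_query
      FROM pg_stat_activity
     WHERE current_query LIKE 'autovacuum:%';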
[PERFORM] PostgreSQL not using index for statement with group by
I am new to PostgreSQL and I am evaluating it for use as a data warehouse. I am really struggling to get a simple query to perform well. I have put the appropriate indexes on the table (at least they are appropriate from my use with several other RDBMS's). However, the query doesn't perform well, and I'm not sure how to get it to return in a reasonable amount of time. Right now the query takes between 2 - 3 minutes to return. There are about 39 million rows in the table. Here is all of the information that I have. Please let me know if I have done anything wrong or what needs to change.

Thanks,
Mark

Table Definition:

CREATE TABLE temp_inventory_fact
(
  item_id integer NOT NULL,
  date_id timestamp with time zone NOT NULL,
  "CBL_Key" integer NOT NULL,
  product_group_id integer NOT NULL,
  supplier_id numeric(19) NOT NULL,
  "Cost" numeric(19,9) NOT NULL,
  qty_on_hand numeric(19,9) NOT NULL,
  qty_allocated numeric(19,9) NOT NULL,
  qty_backordered numeric(19,9) NOT NULL,
  qty_on_po numeric(19,9) NOT NULL,
  qty_in_transit numeric(19,9) NOT NULL,
  qty_reserved numeric(19,9) NOT NULL,
  nonstock_id boolean NOT NULL
) WITH (
  OIDS=FALSE
);

Query:

select product_group_id, SUM("Cost")
FROM temp_inventory_Fact
where product_group_id < 100
group by product_group_id
order by product_group_id
limit 50;

Indexes on table:

CREATE INDEX idx_temp_inventory_fact_product_cost
  ON temp_inventory_fact (product_group_id, "Cost");
CREATE INDEX idx_temp_inventory_fact_product
  ON temp_inventory_fact (product_group_id);
Re: [PERFORM] SAAS and MySQL
So as I understand, what you need is an online database program able to perform ETL tasks, that works in the cloud.

There are a few companies out there able to perform what you are asking. What I could propose is a company called Talend, with Talend On Demand. This solution is based on the open source Talend Open Studio. You are offered a collaborative platform to work on, meaning that all your teams in different countries will be working on the same database on a secured web service.

Go check it out on the website: http://www.talend.com/talend-on-demand/talend-on-demand.php . Hope this helps.

Rstat wrote:
> Hi all,
>
> We are a young, new company on the market. We are starting to open up new
> markets in other countries (Europe).
>
> It somewhat is a challenge for us: we can't share our data and mysql
> database between all our different services. So here is my question: do
> you think it would be possible to find an ETL program that could work in
> the cloud?
>
> It would not have to be too complex, but sturdy and working as a Software
> as a Service.
>
> Thanks a lot for your help.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow select times on select with xpath
I was receiving an error that an XML field does not support the various indexes available in postgresql. Is there an example of how to do this properly?

Robert Haas wrote:
> Why do you have a cast in the index definition?
>
> ...Robert

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3
On Thu, Sep 3, 2009 at 4:16 PM, Scott Otis wrote: > Sorry about not responding to the whole list earlier - this is my first time > posting to a mailing list. > > Would providing more information about the size and complexities of the > databases help? > > I measure I/O stats with iostat - here is the command I use: > > iostat -d -x mfid0 -t 290 2 > > I tried looking at the man page for iostat but couldn't find anywhere how to > determine what the stats are for sequential vs random - any help there? > > When using 'top -m io' the postgres stats collector process is constantly at > 99% - 100%. > > When using 'top' the WCPU for the postgres stats collector and the autovacuum > process are constantly at 20% - 21%. > > Is that normal? It seems to me that the stats collector is doing all the I/O > (which would mean the stats collector is doing 46.1 megabytes /sec). > > Also, the I/O stats don't change hardly at all (except at night during > backups which makes sense). They don't go up or down with user activity on > the server - which makes me wonder a little bit. I have a feeling that if I > just turned off Apache that the I/O stats wouldn't change. Which leads me to > believe that the I/O is not query related - its stats collecting and > autovacuuming related. Is that expected? > > It seems to me that the stats collector shouldn't be using that much I/O and > CPU (and the autovacuum shouldn't be using that much CPU) - therefore > something in my configuration must be messed up or could be changed somehow. > But maybe I'm wrong - please let me know. > > I don't think my setup is necessarily slow. I just want to make it as > efficient as possible and wanted to get some feedback to see if am setting > things up right. I am also looking out into the future and seeing how much > load I can put on this server before getting another one. If I can reduce > the I/O and CPU that the stats collector and autovacuum are using without > losing any functionality then I can put more load on the server. > > Again thanks for all the help. Can you post to the list all the uncommented lines from your postgresql.conf file and attach the results of "select * from pg_stat_all_tables" as an attachment? ...Robert -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow select times on select with xpath
On Thu, Sep 3, 2009 at 4:06 PM, astro77 wrote: > I was receiving an error that an XML field does not support the various > indexes available in postgresql. Please post what happens when you try. > Is there an example of how to do this > properly? Not sure. ...Robert -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3
Robert Haas wrote:
> On Thu, Sep 3, 2009 at 4:16 PM, Scott Otis wrote:
>> [...]
>
> Can you post to the list all the uncommented lines from your
> postgresql.conf file and attach the results of "select * from
> pg_stat_all_tables" as an attachment?
>
> ...Robert

The first message he posted had this, and other info... Which is funny, because I almost asked the exact same question :-)

FreeBSD 6.4
Apache 2.2
PostgreSQL 8.3.6
PHP 5.2.9

~1500 databases w/ ~60 tables each

Conf settings:

listen_addresses = '*'
max_connections = 600
ssl = on
password_encryption = on
shared_buffers = 1GB
work_mem = 5MB
maintenance_work_mem = 256MB
max_fsm_pages = 280
max_fsm_relations = 16
synchronous_commit = off
checkpoint_segments = 6
checkpoint_warning = 30s
effective_cache_size = 1GB

pg_stat_bgwriter:

checkpoints_timed: 16660
checkpoints_req: 1309
buffers_checkpoint: 656346
buffers_clean: 120922
maxwritten_clean: 1
buffers_backend: 167623
buffers_alloc: 472802349

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3
On Thu, Sep 3, 2009 at 5:27 PM, Andy Colson wrote:
> Robert Haas wrote:
>> Can you post to the list all the uncommented lines from your
>> postgresql.conf file and attach the results of "select * from
>> pg_stat_all_tables" as an attachment?
>>
>> ...Robert
>
> The first message he posted had this, and other info... Which is funny,
> because I almost asked the exact same question :-)
>
> [...]

You're right - I missed that. But I still want to see pg_stat_all_tables. I wonder if it would be worth attaching strace to the stats collector and trying to get some idea what it's doing (if FreeBSD has strace...).

...Robert

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
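FreeBSD ships truss and ktrace rather than strace; a sketch, with 12345 standing in for the stats collector's PID as shown by top:

    # follow the stats collector's system calls live
    truss -p 12345

    # or record a trace for a while and inspect it afterwards
    ktrace -p 12345
    sleep 30
    ktrace -C        # stop tracing
    kdump | less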
Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3
2009/9/3 Scott Otis : > Sorry about not responding to the whole list earlier - this is my first time > posting to a mailing list. > > Would providing more information about the size and complexities of the > databases help? > > I measure I/O stats with iostat - here is the command I use: > > iostat -d -x mfid0 -t 290 2 Simply do "iostat mfid0 1" and post 10 lines of its output. > When using 'top -m io' the postgres stats collector process is constantly at > 99% - 100%. In itself it doesn't mean much. The number of IOs is important. > I don't think my setup is necessarily slow. I just want to make it as > efficient as possible and wanted to get some feedback to see if am setting > things up right. I am also looking out into the future and seeing how much > load I can put on this server before getting another one. If I can reduce > the I/O and CPU that the stats collector and autovacuum are using without > losing any functionality then I can put more load on the server. In general it's tricky to optimize for unknown targets - if your performance is OK right now, you should leave it alone. On the other hand, your diagnosis of stats collector doing 46 MB/s points to something very abnormal. You should probably post your entire postgresql.conf. -- f+rEnSIBITAhITAhLR1nM9F4cIs5KJrhbcsVtUIt7K1MhWJy1A== -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL not using index for statement with group by
Mark Starkman wrote: > I'm not sure how to get it to return in > reasonable amount of time. Some more information could help. What version of PostgreSQL is this? Please give an overview of the hardware and OS. Please show your postgresql.conf file, excluding comments. Please run your query with EXPLAIN ANALYZE in front, so we can see the execution plan, with cost estimates compared to actual information. If the the plan indicates a sequential scan, and you think an indexed scan may be faster, you might be able to coerce it into the indexed plan for diagnostic purposes by running this on the connection before an EXPLAIN ANALYZE run: set enable_seqscan = off; You don't want to leave it off, or try to use that in production, but it might be useful in figuring out what's going on. That might be enough to diagnose the issue. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
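Using the query from the original post, the diagnostic run Kevin describes would look roughly like this:

    SET enable_seqscan = off;

    EXPLAIN ANALYZE
    SELECT product_group_id, SUM("Cost")
      FROM temp_inventory_fact
     WHERE product_group_id < 100
     GROUP BY product_group_id
     ORDER BY product_group_id
     LIMIT 50;

    RESET enable_seqscan;   -- don't leave it off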
Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3
> Simply do "iostat mfid0 1" and post 10 lines of its output. tty mfid0 cpu tin tout KB/t tps MB/s us ni sy in id 0 152 108.54 335 35.51 43 0 30 1 27 0 525 85.73 759 63.55 14 0 12 0 74 0 86 67.72 520 34.39 13 0 12 0 75 0 86 86.89 746 63.26 12 0 12 0 76 0 86 70.09 594 40.65 13 0 11 0 76 0 86 78.50 756 57.99 13 0 10 0 77 0 351 81.46 774 61.61 12 0 11 0 77 0 86 63.87 621 38.72 9 0 8 0 83 0 86 80.87 821 64.86 8 0 8 0 83 0 86 58.78 637 36.55 11 0 11 0 77 Scott -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3
> Can you post to the list all the uncommented lines from your postgresql.conf file and attach the results of "select * from pg_stat_all_tables" as an attachment?

I attached a CSV of "select * from pg_stat_all_tables" from one of our more heavily used databases. Note: I turned off stats collection and autovacuuming a couple days ago to see what it would do and then restarted postgres - I turned those back on this morning, so that is why there aren't more autovacuumed and autoanalyzed tables. Sorry if this is a little verbose - I didn't want to leave anything out.

Uncommented lines from postgresql.conf:

listen_addresses = '*'
max_connections = 600
ssl = on
password_encryption = on
shared_buffers = 1GB
work_mem = 5MB
maintenance_work_mem = 256MB
max_fsm_pages = 280
max_fsm_relations = 16
synchronous_commit = off
checkpoint_segments = 6
checkpoint_warning = 30s
effective_cache_size = 1GB
log_destination = 'stderr'
logging_collector = on
log_directory = '/var/log/pgsql'
log_filename = '%m%d%y_%H%M%S-pgsql.log'
log_rotation_age = 1d
log_rotation_size = 10MB
log_min_messages = warning
log_error_verbosity = default
log_min_error_statement = warning
silent_mode = on
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d '
log_temp_files = 0
track_activities = on
track_counts = on
update_process_title = off
log_parser_stats = off
log_planner_stats = off
log_executor_stats = off
log_statement_stats = off
autovacuum = on
datestyle = 'iso, mdy'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'

[Attachment: pg_stat_all_tables2.csv]

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SAAS and MySQL
On Thu, Sep 3, 2009 at 9:52 AM, Tguru wrote:
> So as I understand, what you need is an online database program able to
> perform ETL tasks, that works in the cloud.
>
> There are a few companies out there able to perform what you are asking.
> What I could propose is a company called Talend. With Talend On Demand.
>
> [...]

huh?

merlin

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL not using index for statement with group by
On Thu, Sep 3, 2009 at 7:33 AM, Mark Starkman wrote:
> I am new to PostgreSQL and I am evaluating it for use as a data warehouse.
> I am really struggling to get a simple query to perform well. I have put
> the appropriate indexes on the table (at least they are appropriate from my
> use with several other RDBMS's).

Ok, first things first. Pgsql isn't like most other dbms. Its indexes do not contain visibility info, which means that if the db were to use the indexes to look up entries in a table, it still has to go back to the table to look those values up to see if they are visible to the current transaction. So, if you're retrieving a decent percentage of the table, it's cheaper to just hit the table. Note that this makes PostgreSQL poorly suited for very wide tables.

Generally the trick to making large accesses run fast in pgsql is to increase work_mem. But some queries just aren't efficient in pgsql that can be efficient in other dbs. Possibly clustering on product_group_id would help.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
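A sketch of both suggestions against the table from the original post. CLUSTER takes an exclusive lock and rewrites the table, so treat it as a maintenance-window job; the USING spelling shown is the 8.3 syntax (older releases write CLUSTER indexname ON tablename), and the work_mem value is only illustrative:

    CLUSTER temp_inventory_fact USING idx_temp_inventory_fact_product;
    ANALYZE temp_inventory_fact;

    -- per the work_mem note, for the session running the big aggregate:
    SET work_mem = '64MB';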
Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3
On Thu, Sep 3, 2009 at 7:11 PM, Scott Otis wrote: >> Can you post to the list all the uncommented lines from your > postgresql.conf file and attach the results of "select * from > pg_stat_all_tables" as an attachment? > > I attached a CSV of "select * from pg_stat_all_tables" from one of our > more heavily used databases. Note: I turned off stats collection and > autvacuuming a couple days ago to see what it would do and then > restarted postgres - I turned those back on this morning to that is why > there aren't more autovacuumed and autoanalyzed tables. Do you by any chance have a bazillion databases in this cluster? Can you do these? select sum(1) from pg_database; select pg_relation_size('pg_database'); select sum(pg_column_size(d.*)) from pg_database; ...Robert -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3
Robert Haas : > Do you by any chance have a bazillion databases in this cluster? Can > you do these? > select sum(1) from pg_database; 1555 > select pg_relation_size('pg_database'); 221184 > select sum(pg_column_size(d.*)) from pg_database; That gave me: ERROR: missing FROM-clause entry for table "d" LINE 1: select sum(pg_column_size(d.*)) from pg_database; So I did this: select sum(pg_column_size(d.*)) from pg_database as d; and got: 192910 Also did this: select sum(pg_database_size(datname)) from pg_database; and got: 13329800428 (12.4GB) Scott
Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3
>>> max_fsm_pages = 280
>>> max_fsm_relations = 16
>
>> What does the last couple of lines from a 'vacuum analyze verbose'
>> say? I have max_fsm_pages = 400 and max_fsm_relations = 1500.
>>
>> You can also try to lower random_page_cost to a lower value like 1.2
>> but I doubt this will help in your case.
>
> last couple lines from 'vacuumdb -a -v -z':
>
> INFO: free space map contains 114754 pages in 42148 relations
> DETAIL: A total of 734736 page slots are in use (including overhead).
                                                               v---
> 734736 page slots are required to track all free space.
                                                               ^---
> Current limits are: 280 page slots, 16 relations, using 26810 kB.

You can lower your max_fsm_pages setting to a number above 'xyz page slots required ...' to 100 and fsm-relations to like 5.

--
regards
Claus

When lenity and cruelty play for a kingdom, the gentler gamester is
the soonest winner.

Shakespeare

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
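For the record, the VACUUM VERBOSE output quoted above is the sizing guide: both settings just need some headroom over the reported requirement, and both only take effect after a server restart. A sketch with made-up round numbers based on that output:

    # postgresql.conf
    max_fsm_pages = 800000        # > the 734736 page slots reported as required
    max_fsm_relations = 50000     # > the 42148 relations reported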
Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3
Claus Guttesen kome...@gmail.com:
>> Would love to get some advice on how to change my conf settings / setup to
>> get better I/O performance.
>>
>> Server Specs:
>>
>> 2x Intel Xeon Quad Core (@2 Ghz - Clovertown,L5335)
>> 4GB RAM
>> 4x Seagate 73GB SAS HDD 10k RPM - in RAID ( striped and mirrored )
>>
>> FreeBSD 6.4
>> Apache 2.2
>> PostgreSQL 8.3.6
>> PHP 5.2.9
>>
>> ~1500 databases w/ ~60 tables each
>>
>> max_connections = 600
>> shared_buffers = 1GB
>
> On a dual-core HP DL380 with 16 GB ram I have set shared_buffers at
> 512 MB for 900 max_connections. Far the largest table have approx. 120
> mill. records. You could try to lower shared_buffers.
>
>> max_fsm_pages = 280
>> max_fsm_relations = 16
>
> What does the last couple of lines from a 'vacuum analyze verbose'
> say? I have max_fsm_pages = 400 and max_fsm_relations = 1500.
>
> You can also try to lower random_page_cost to a lower value like 1.2
> but I doubt this will help in your case.

last couple lines from 'vacuumdb -a -v -z':

INFO: free space map contains 114754 pages in 42148 relations
DETAIL: A total of 734736 page slots are in use (including overhead).
734736 page slots are required to track all free space.
Current limits are: 280 page slots, 16 relations, using 26810 kB.

Scott