Re: [PERFORM] disable archiving
On Jul 23, 7:24 pm, [EMAIL PROTECTED] (Paul van den Bogaard) wrote:
> the manual somewhere states "... if archiving is enabled..." To me
> this implies that archiving can be disabled. However I cannot find
> the parameter to use to get this result. Or should I enable archiving
> and use a backup script like
>
> #!/usr/bin/bash
> exit 0
>
> Would appreciate a hint. And yes I know I put my database in danger
> etc. This is for some benchmarks where I do not want the overhead of
> archiving. Just a file system that will not fill with zillions of
> these 16MB WAL files ;^)
>
> Thanks
> Paul.
>
> ---(end of broadcast)---
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to [EMAIL PROTECTED] so that your
> message can get through to the mailing list cleanly

Is it normal to spoil other threads, or is it a bug? If it is not a bug,
please change the subject of the topic back to what it was!

With best regards,

Valentine Gogichashvili
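For what it's worth, in 8.2 and earlier there is no separate on/off switch for archiving: it is enabled precisely when archive_command is set, so leaving the parameter unset (the default) disables it. A minimal postgresql.conf sketch of the two options Paul is weighing (the no-op command is benchmark-only, since the server will then recycle WAL it believes was safely archived):

```
# postgresql.conf
# Default: archive_command unset -> archiving disabled; WAL segments are
# recycled as usual and the filesystem does not fill with 16MB files.
#archive_command = ''

# Benchmark-only alternative: claim success for every segment.
#archive_command = '/bin/true'
```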
Re: [PERFORM] multicolumn index column order
On 7/24/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> valgog <[EMAIL PROTECTED]> wrote ..
> > On Jul 23, 7:00 pm, [EMAIL PROTECTED] (Tom Lane) wrote:
> > > valgog <[EMAIL PROTECTED]> writes:
> > > > how to build a multicolumn index with one column order ASCENDING
> > > > and another column order DESCENDING?
> > >
> > > Use 8.3 ;-)
> > >
> > > In existing releases you could fake it with a custom reverse-sorting
> > > operator class, but it's a pain in the neck to create one.
>
> I've often gotten what I want by using a calculated index on (f1, -f2).
> ORDER BY will take an expression, e.g. ORDER BY f1, -f2. Simpler than a
> custom operator.

Yes, this is true, but I do not know how to make a text ordering
reversible. There is no - (minus) operator for text values. It is no
longer a problem for me, but in theory I see no other way to reverse the
sort order of text fields...
Re: [PERFORM] multicolumn index column order
valgog <[EMAIL PROTECTED]> wrote ..
> On Jul 23, 7:00 pm, [EMAIL PROTECTED] (Tom Lane) wrote:
> > valgog <[EMAIL PROTECTED]> writes:
> > > how to build a multicolumn index with one column order ASCENDING and
> > > another column order DESCENDING?
> >
> > Use 8.3 ;-)
> >
> > In existing releases you could fake it with a custom reverse-sorting
> > operator class, but it's a pain in the neck to create one.

I've often gotten what I want by using a calculated index on (f1, -f2).
ORDER BY will take an expression, e.g. ORDER BY f1, -f2. Simpler than a
custom operator.
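The negated-column trick above can be sketched as follows (table and column names are hypothetical; f2 must be numeric for unary minus to work, which is exactly why the workaround fails for text columns):

```sql
-- Pre-8.3 workaround: index an expression that sorts in reverse
CREATE INDEX t_f1_negf2_idx ON t (f1, (-f2));
SELECT * FROM t ORDER BY f1, -f2 LIMIT 10;   -- can use the index

-- 8.3 and later: per-column sort order is part of the index definition
CREATE INDEX t_f1_f2desc_idx ON t (f1 ASC, f2 DESC);
SELECT * FROM t ORDER BY f1 ASC, f2 DESC LIMIT 10;
```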
Re: [PERFORM] Query performance issue
Jonathan Gray wrote:
> We're experiencing a query performance problem related to the planner
> and its ability to perform a specific type of merge.
>
> We have created a test case (as attached, or here:
> http://www3.streamy.com/postgres/indextest.sql) which involves a
> hypothetical customer ordering system, with customers, orders, and
> customer groups.
>
> If we want to retrieve a single customer's 10 most recent orders,
> sorted by date, we can use a double index on (customer,date);
> Postgres's query planner will use the double index with a backwards
> index scan on the second indexed column (date).
>
> However, if we want to retrieve a "customer class's" 10 most recent
> orders, sorted by date, we are not able to get Postgres to use double
> indexes.

You don't have any indexes on the 'customerclass' table. Creating a
foreign key doesn't create an index, you need to do that separately.

Try

create index cc_customerid_class on indextest.customerclass(classid, customerid);

--
Postgresql & php tutorials
http://www.designmagick.com/
Re: [PERFORM] Query performance issue
Chris wrote:
> Jonathan Gray wrote:
> > We're experiencing a query performance problem related to the planner
> > and its ability to perform a specific type of merge.
> >
> > We have created a test case (as attached, or here:
> > http://www3.streamy.com/postgres/indextest.sql) which involves a
> > hypothetical customer ordering system, with customers, orders, and
> > customer groups.
> >
> > If we want to retrieve a single customer's 10 most recent orders,
> > sorted by date, we can use a double index on (customer,date);
> > Postgres's query planner will use the double index with a backwards
> > index scan on the second indexed column (date).
> >
> > However, if we want to retrieve a "customer class's" 10 most recent
> > orders, sorted by date, we are not able to get Postgres to use double
> > indexes.
>
> You don't have any indexes on the 'customerclass' table. Creating a
> foreign key doesn't create an index, you need to do that separately.
>
> Try
>
> create index cc_customerid_class on indextest.customerclass(classid, customerid);

It could also be that since you don't have very much data (10,000 rows),
postgres is ignoring the indexes because it'll be quicker to scan the
tables. If you bump it up to say 100k rows, what happens?

--
Postgresql & php tutorials
http://www.designmagick.com/
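Bumping the row count to test this is quick with generate_series; the column list below is a guess at the indextest.sql schema, so adjust it to the real table definition:

```sql
-- Hypothetical bulk load: 100k extra orders spread over ~1000 customers
INSERT INTO indextest.orders (customerid, orderstamp)
SELECT (random() * 1000)::int,
       now() - random() * interval '365 days'
FROM generate_series(1, 100000);

ANALYZE indextest.orders;  -- refresh statistics so the planner sees the new size
```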
Re: [PERFORM] Query performance issue
Chris,

Creating indexes on the customerclass table does speed up the queries but
still does not create the plan we are looking for (using the double index
with a backward index scan on the orders table).

The plans we now get, with times on par or slightly better than with the
plpgsql hack, are:

EXPLAIN ANALYZE
SELECT o.orderid,o.orderstamp FROM indextest.orders o
INNER JOIN indextest.customerclass cc ON (cc.classid = 2)
WHERE o.customerid = cc.customerid ORDER BY o.orderstamp DESC LIMIT 5;

                                      QUERY PLAN
---------------------------------------------------------------------------
 Limit  (cost=0.00..176.65 rows=5 width=12) (actual time=0.930..3.675 rows=5 loops=1)
   ->  Nested Loop  (cost=0.00..46388.80 rows=1313 width=12) (actual time=0.927..3.664 rows=5 loops=1)
         ->  Index Scan Backward using orders_orderstamp_idx on orders o  (cost=0.00..6225.26 rows=141001 width=16) (actual time=0.015..0.957 rows=433 loops=1)
         ->  Index Scan using customerclass_customerid_idx on customerclass cc  (cost=0.00..0.27 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=433)
               Index Cond: (o.customerid = cc.customerid)
               Filter: (classid = 2)

And

EXPLAIN ANALYZE
SELECT o.orderid,o.orderstamp FROM indextest.orders o
INNER JOIN indextest.customerclass cc ON (cc.classid = 2)
WHERE o.customerid = cc.customerid ORDER BY o.orderstamp DESC LIMIT 100;

                                      QUERY PLAN
---------------------------------------------------------------------------
 Limit  (cost=1978.80..1979.05 rows=100 width=12) (actual time=6.167..6.448 rows=100 loops=1)
   ->  Sort  (cost=1978.80..1982.09 rows=1313 width=12) (actual time=6.165..6.268 rows=100 loops=1)
         Sort Key: o.orderstamp
         ->  Nested Loop  (cost=3.99..1910.80 rows=1313 width=12) (actual time=0.059..4.576 rows=939 loops=1)
               ->  Bitmap Heap Scan on customerclass cc  (cost=3.99..55.16 rows=95 width=4) (actual time=0.045..0.194 rows=95 loops=1)
                     Recheck Cond: (classid = 2)
                     ->  Bitmap Index Scan on customerclass_classid_idx  (cost=0.00..3.96 rows=95 width=0) (actual time=0.032..0.032 rows=95 loops=1)
                           Index Cond: (classid = 2)
               ->  Index Scan using orders_customerid_idx on orders o  (cost=0.00..19.35 rows=15 width=16) (actual time=0.006..0.025 rows=10 loops=95)
                     Index Cond: (o.customerid = cc.customerid)

As I said, this is a hypothetical test case we have arrived at that
describes our situation as best as we can given a simple case. We're
interested in potential issues with the approach, why postgres would not
attempt something like it, and how we might go about implementing it
ourselves at a lower level than we currently have (in SPI, libpq, etc).

If it could be generalized then we could use it in cases where we aren't
pulling from just one table (the orders table) but rather trying to merge,
in sorted order, results from different conditions on different tables.
Right now we use something like the plpgsql or plpythonu functions in the
example and they outperform our regular SQL queries by a fairly
significant margin.

An example might be:

SELECT * FROM (
  (SELECT orderid,stamp FROM indextest.orders_usa
    WHERE customerid = ORDER BY stamp DESC LIMIT 5)
  UNION
  (SELECT orderid,stamp FROM indextest.orders_can
    WHERE customerid = ORDER BY stamp DESC LIMIT 5)
) as x ORDER BY x.stamp DESC

Again, that's a general example but some of my queries contain between 5
and 10 different sorted joins of this kind and it would be helpful to have
something internal in postgres to efficiently handle it (do something just
like the above query but not have to do the full LIMIT 5 for each set,
some kind of in-order merge/heap join?)

Jonathan Gray

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris
Sent: Tuesday, July 24, 2007 1:51 AM
To: Jonathan Gray
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query performance issue

Chris wrote:
> Jonathan Gray wrote:
>> We're experiencing a query performance problem related to the planner
>> and its ability to perform a specific type of merge.
>>
>> We have created a test case (as attached, or here:
>> http://www3.streamy.com/postgres/indextest.sql) which involves a
>> hypothetical customer ordering system, with customers, orders, and
>> customer groups.
>>
>> If we want to retrieve a single customers 10 most recent orders,
>> sorted by date, we can use a double index on (customer,date);
>> Postgres's query planner will use the double index with a backwards
>> index scan on the second indexed column (date).
>>
>> However, if we want to retrieve a "customer class's" 10 m
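The per-branch-LIMIT pattern Jonathan describes can be tightened slightly: UNION dedupes (forcing an extra sort), whereas UNION ALL just concatenates, and an outer LIMIT bounds the final sort. A sketch with a placeholder customer id (42 is hypothetical, as are the table layouts, each assumed indexed on (customerid, stamp)):

```sql
SELECT x.orderid, x.stamp
FROM (
  (SELECT orderid, stamp FROM indextest.orders_usa
    WHERE customerid = 42 ORDER BY stamp DESC LIMIT 5)
  UNION ALL
  (SELECT orderid, stamp FROM indextest.orders_can
    WHERE customerid = 42 ORDER BY stamp DESC LIMIT 5)
) AS x
ORDER BY x.stamp DESC
LIMIT 5;
```

Each branch can stay a backward index scan; the in-order heap merge across branches that Jonathan asks for is not something the planner of that era could produce on its own, hence the per-branch LIMITs.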
Re: [PERFORM] Query performance issue
Jonathan Gray wrote:
> Chris,
>
> Creating indexes on the customerclass table does speed up the queries
> but still does not create the plan we are looking for (using the double
> index with a backward index scan on the orders table).

Stupid question - why is that particular plan your "goal" plan?

> The plans we now get, with times on par or slightly better than with
> the plpgsql hack, are:
>
> EXPLAIN ANALYZE
> SELECT o.orderid,o.orderstamp FROM indextest.orders o
> INNER JOIN indextest.customerclass cc ON (cc.classid = 2)
> WHERE o.customerid = cc.customerid ORDER BY o.orderstamp DESC LIMIT 5;

Didn't notice this before... Shouldn't this be:

INNER JOIN indextest.customerclass cc ON (o.customerid = cc.customerid)
WHERE cc.classid = 2

ie join on the common field, not the classid one which doesn't appear in
the 2nd table?

> As I said, this is a hypothetical test case we have arrived at that
> describes our situation as best as we can given a simple case. We're
> interested in potential issues with the approach, why postgres would
> not attempt something like it, and how we might go about implementing
> it ourselves at a lower level than we currently have (in SPI, libpq,
> etc).
>
> If it could be generalized then we could use it in cases where we
> aren't pulling from just one table (the orders table) but rather trying
> to merge, in sorted order, results from different conditions on
> different tables. Right now we use something like the plpgsql or
> plpythonu functions in the example and they outperform our regular SQL
> queries by a fairly significant margin.

I'm sure if you posted the queries you are running with relevant info
you'd get some help ;)

--
Postgresql & php tutorials
http://www.designmagick.com/
Re: [PERFORM] Query performance issue
That particular plan is our goal because we've "hacked" it together to
perform better than the normal SQL plans. Analytically it makes sense to
approach this particular problem in this way because it is relatively
invariant to the distributions and sizes of the tables (only having to
deal with increased index size).

Also, changing the query around doesn't change the query plan at all.
The planner is intelligent enough to figure out what it really needs to
join on despite my poor query writing. I originally had it this way to
ensure my (customerid,orderstamp) conditions were in the correct order,
but again it appears not to matter.

I will try to get a more complex/sophisticated test case running. I'm
not able to post my actual structure or queries, but I'll try to produce
a better example of the other (multiple table) case tomorrow.

Thanks.

Jonathan Gray

-----Original Message-----
From: Chris [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 24, 2007 2:36 AM
To: Jonathan Gray
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query performance issue

Jonathan Gray wrote:
> Chris,
>
> Creating indexes on the customerclass table does speed up the queries but
> still does not create the plan we are looking for (using the double index
> with a backward index scan on the orders table).

Stupid question - why is that particular plan your "goal" plan?

> The plans we now get, with times on par or slightly better than with the
> plpgsql hack, are:
>
> EXPLAIN ANALYZE
> SELECT o.orderid,o.orderstamp FROM indextest.orders o
> INNER JOIN indextest.customerclass cc ON (cc.classid = 2)
> WHERE o.customerid = cc.customerid ORDER BY o.orderstamp DESC LIMIT 5;

Didn't notice this before... Shouldn't this be:

INNER JOIN indextest.customerclass cc ON (o.customerid = cc.customerid)
WHERE cc.classid = 2

ie join on the common field, not the classid one which doesn't appear in
the 2nd table?

> As I said, this is a hypothetical test case we have arrived at that
> describes our situation as best as we can given a simple case. We're
> interested in potential issues with the approach, why postgres would not
> attempt something like it, and how we might go about implementing it
> ourselves at a lower level than we currently have (in SPI, libpq, etc).
>
> If it could be generalized then we could use it in cases where we aren't
> pulling from just one table (the orders table) but rather trying to merge,
> in sorted order, results from different conditions on different tables.
> Right now we use something like the plpgsql or plpythonu functions in the
> example and they outperform our regular SQL queries by a fairly significant
> margin.

I'm sure if you posted the queries you are running with relevant info
you'd get some help ;)

--
Postgresql & php tutorials
http://www.designmagick.com/
Re: [PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...
Hello,

Thank you for all your comments and recommendations.

I'm aware that the conditions for this benchmark are not ideal, mostly
due to the lack of time to prepare it. We will also need an additional
benchmark on a less powerful - more realistic - server to better
understand the scalability of our application.

Our application is based on Java and generates dynamic reports from log
file content. Dynamic means here that a report will be calculated from
the Postgres data the first time it is requested (it will then be
cached). Java is used to drive the data preparation and to
handle/generate the report requests. This is much more an OLAP system
than an OLTP, at least as far as our performance concern goes.

Data preparation:

1) Parsing the log files, with heavy use of perl (regular expressions),
to generate csv files. Prepared statements also maintain reference tables
in the DB. Postgres performance is not an issue for this first step.

2) Loading the csv files with COPY. As around 70% of the data to load
come in a single daily table, we don't allow concurrent jobs for this
step. We have between a few and a few hundred files to load into a single
table; they are processed one after the other. A primary key is always
defined; for the case when the required indexes are already built and
when the new data are above a given size, we use a "shadow" table instead
(without the indexes), build the indexes after the import, and then
replace the live table with the shadow one. For example, we have a table
of 13 GB + 11 GB indexes (5 pieces).

Performance questions:
a) is there an "ideal" size to consider for our csv files
   (100 x 10 MB, or better 1 x 1 GB)?
b) maintenance_work_mem: I'll use around 1 GB as recommended by Stefan.

3) Data aggregation. This is the heaviest part for Postgres. On our
current system some queries need above one hour, with phases of around
100% cpu use alternating with times of heavy i/o load when temporary
results are written/read to disk (pgsql_tmp). During the aggregation,
other postgres activities are low (at least they should be) as this
should take place at night. Currently we have a locking mechanism to
avoid having more than one such query running concurrently. This may be
too strict for the benchmark server, but it better reflects our current
hardware capabilities.

Performance questions: here we should favor a single huge transaction and
consider a low probability of another transaction requiring large sort
space. Considering this, is it reasonable to set work_mem to 3 GB (I
guess I should raise this parameter dynamically before running the
aggregation queries)?

4) Queries (report generation)

We have only a few requests which are not satisfying while requiring
large sort operations. The data are structured in different aggregation
levels (minutes, hours, days) with logical time-based partitions in order
to limit the data size to compute for a given report. Moreover, we can
scale our infrastructure by using different or dedicated Postgres servers
for different customers. Smaller customers may share the same instance,
each of them having its own schema (the lock mechanism for large
aggregations applies to a whole Postgres instance, not to a single
customer). The benchmark will help us to plan such distribution.

During the benchmark, we will probably not have more than 50 non-idle
connections simultaneously. It is a bit too early for us to fine tune
this part. The benchmark will mainly focus on steps 1 to 3.

During the benchmark, the DB will reach a size of about 400 GB,
simulating 3 different customers, with data quite equally split across 3
schemas.

I will post our configuration(s) later on.

Thanks again for all your valuable input.

Marc Mamin
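Raising work_mem dynamically for just the aggregation session, as Marc proposes, avoids making a huge value the server-wide default; a sketch (the 3 GB figure is his, not a general recommendation, and 32-bit builds cap work_mem near 2 GB):

```sql
-- In the aggregation session only; the setting dies with the session
SET work_mem = '3GB';

-- ... run the large aggregation queries here ...

RESET work_mem;  -- back to the postgresql.conf value
```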
[PERFORM] Table Statistics with pgAdmin III
I have installed pgAdmin III 1.6. In the tool when you click on a
particular table you can select a tab called "Statistics". This tab has
all kinds of info on your table. For some reason the only info I see is
for table size, toast table size and indexes size. Is there a reason that
the other 15 fields have zeros in them? I was thinking that maybe I
needed to turn on a setting within my database in order to get statistics
reported.

Thanks,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
Re: [PERFORM] Table Statistics with pgAdmin III
Campbell, Lance wrote:
> I have installed pgAdmin III 1.6. In the tool when you click on a
> particular table you can select a tab called "Statistics". This tab has
> all kinds of info on your table. For some reason the only info I see is
> for table size, toast table size and indexes size. Is there a reason
> that the other 15 fields have zeros in them? I was thinking that maybe
> I needed to turn on a setting within my database in order to get
> statistics reported.

It seems that the module pgstattuple is needed.

--
Jean-Max Reymond
CKR Solutions
http://www.ckr-solutions.com
Re: [PERFORM] Table Statistics with pgAdmin III
> --- Original Message ---
> From: Jean-Max Reymond <[EMAIL PROTECTED]>
> To: pgsql-performance@postgresql.org
> Sent: 24/07/07, 18:23:53
> Subject: Re: [PERFORM] Table Statistics with pgAdmin III
>
> Campbell, Lance wrote:
> > I have installed pgAdmin III 1.6. In the tool when you click on a
> > particular table you can select a tab called "Statistics". This tab has
> > all kinds of info on your table. For some reason the only info I see is
> > for table size, toast table size and indexes size. Is there a reason
> > that the other 15 fields have zeros in them? I was thinking that maybe
> > I needed to turn on a setting within my database in order to get
> > statistics reported.
>
> it seems that the module pgstattuple is needed

That'll allow you to see extra stats in 1.8, but won't alter what you
already see; in fact 1.6 won't use it at all. What values are at zero?

Regards, Dave
[PERFORM] index over timestamp not being used
Hi all,

I've got the following two tables running on postgresql 8.1.4

transactions
     Column     |            Type             |   Modifiers
----------------+-----------------------------+---------------
 transaction_id | character varying(32)       | not null
 user_id        | bigint                      | not null
 timestamp_in   | timestamp without time zone | default now()
 type_id        | integer                     |
 technology_id  | integer                     |
Indexes:
    "pk_phusrtrans_transid" PRIMARY KEY, btree (transaction_id)
    "idx_phusrtrans_paytyptech" btree (type_id, technology_id)
    "idx_putrnsctns_tstampin" btree (timestamp_in)

statistics
     Column     |         Type          | Modifiers
----------------+-----------------------+-----------
 statistic_id   | bigint                | not null
 duration       | bigint                |
 transaction_id | character varying(32) |
Indexes:
    "pk_phstat_statid" PRIMARY KEY, btree (statistic_id)
    "idx_phstat_transid" btree (transaction_id)

The idea is to have a summary of how many transactions, duration, and
type for every date. To do so, I've done the following query:

SELECT
  count(t.transaction_id) AS num_transactions,
  SUM(s.duration) AS duration,
  date(t.timestamp_in) as date,
  t.type_id
FROM
  transactions t
  LEFT OUTER JOIN statistics s ON t.transaction_id = s.transaction_id
WHERE
  t.timestamp_in >= to_timestamp('20070101', 'YYYYMMDD')
GROUP BY date, t.type_id;

I think this could be sped up if the index idx_putrnsctns_tstampin (index
over the timestamp) could be used, but I haven't been able to do it. Any
suggestion?

Thanks all
--
Arnau
Re: [PERFORM] Table Statistics with pgAdmin III
All of the fields are zero except for the three I listed in my posting.

Thanks,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dave Page
Sent: Tuesday, July 24, 2007 12:50 PM
To: Jean-Max Reymond
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Table Statistics with pgAdmin III

> --- Original Message ---
> From: Jean-Max Reymond <[EMAIL PROTECTED]>
> To: pgsql-performance@postgresql.org
> Sent: 24/07/07, 18:23:53
> Subject: Re: [PERFORM] Table Statistics with pgAdmin III
>
> Campbell, Lance wrote:
> > I have installed pgAdmin III 1.6. In the tool when you click on a
> > particular table you can select a tab called "Statistics". This tab has
> > all kinds of info on your table. For some reason the only info I see is
> > for table size, toast table size and indexes size. Is there a reason
> > that the other 15 fields have zeros in them? I was thinking that maybe
> > I needed to turn on a setting within my database in order to get
> > statistics reported.
>
> it seems that the module pgstattuple is needed

That'll allow you to see extra stats in 1.8, but won't alter what you
already see; in fact 1.6 won't use it at all. What values are at zero?

Regards, Dave
Re: [PERFORM] index over timestamp not being used
Arnau <[EMAIL PROTECTED]> writes:
> timestamp_in | timestamp without time zone | default now()

> SELECT ...
> FROM
>   transactions t
>   LEFT OUTER JOIN statistics s ON t.transaction_id = s.transaction_id
> WHERE
>   t.timestamp_in >= to_timestamp('20070101', 'YYYYMMDD')
> GROUP BY date, t.type_id;

to_timestamp() produces timestamp *with* timezone, so your WHERE clause
is effectively

  t.timestamp_in::timestamptz >= to_timestamp('20070101', 'YYYYMMDD')

which doesn't match the index.

The first question you should ask yourself is whether you picked the
right datatype for the column. IMHO timestamp with tz is the more
appropriate choice in the majority of cases. If you do want to stick with
timestamp without tz, you'll need to cast the result of to_timestamp to
that.

Alternatively, do you really need to_timestamp at all? The standard
timestamp input routine won't have any problem with that format:

  t.timestamp_in >= '20070101'

regards, tom lane
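Tom's two fixes can be sketched against Arnau's schema (timestamp_in being timestamp without time zone); either form lets the planner consider the btree index on timestamp_in:

```sql
-- Option 1: cast to_timestamp()'s timestamptz result back to plain timestamp
SELECT count(*) FROM transactions t
WHERE t.timestamp_in >= to_timestamp('20070101', 'YYYYMMDD')::timestamp;

-- Option 2: skip to_timestamp(); the standard input routine parses this
SELECT count(*) FROM transactions t
WHERE t.timestamp_in >= '20070101';
```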
Re: [PERFORM] index over timestamp not being used
Hi Tom,

> Alternatively, do you really need to_timestamp at all? The standard
> timestamp input routine won't have any problem with that format:
>   t.timestamp_in >= '20070101'

This is what always worries me: what happens if one day the internal
format in which the DB stores the dates/timestamps changes? I mean, if
instead of being stored as YYYYMMDD it is stored as DDMMYYYY, would we
have to change all the queries? I thought the
to_char/to_date/to_timestamp functions were intended for this purpose.

--
Arnau
Re: [PERFORM] index over timestamp not being used
Arnau <[EMAIL PROTECTED]> writes:
>> Alternatively, do you really need to_timestamp at all? The standard
>> timestamp input routine won't have any problem with that format:
>> t.timestamp_in >= '20070101'

> This is what always worries me: what happens if one day the internal
> format in which the DB stores the dates/timestamps changes? I mean, if
> instead of being stored as YYYYMMDD it is stored as DDMMYYYY, would we
> have to change all the queries?

You are confusing internal storage format with the external
representation.

> I thought the to_char/to_date/to_timestamp functions were intended for
> this purpose

No, they're intended for dealing with wacky formats that the regular
input/output routines can't understand or produce.

regards, tom lane
Re: [PERFORM] 8.2 -> 8.3 performance numbers
On 7/20/07, Josh Berkus <[EMAIL PROTECTED]> wrote:
> Jim,
>
> > Has anyone benchmarked HEAD against 8.2? I'd like some numbers to use
> > in my OSCon lightning talk. Numbers for both with and without HOT
> > would be even better (I know we've got HOT-specific benchmarks, but I
> > want complete 8.2 -> 8.3 numbers).
>
> We've done it on TPCE, which is a hard benchmark for PostgreSQL. On
> that it's +9% without HOT and +13% with HOT. I think SpecJ would show a
> greater difference, but we're still focussed on benchmarks we can
> publish (i.e. 8.2.4) right now.

Are there any industry standard benchmarks that you know of which
PostgreSQL excels at?

merlin
[PERFORM] Performance issue with 8.2.3 - "C" application
I've got an interesting issue here that I'm running into with 8.2.3.

This is an application that has run quite well for a long time, and has
been operating without significant changes (other than recompilation)
since back in the early 7.x Postgres days. But now we're seeing a LOT
more load than we used to with it, and suddenly, we're seeing odd
performance issues.

It APPEARS that the problem isn't query performance per-se. That is,
while I can find a few processes here and there in a run state when I
look with a ps, I don't see them consistently churning.

But here's the query that has a habit of taking the most time:

select forum, * from post where toppost = 1 and (replied > (select
lastview from forumlog where login='theuser' and forum=post.forum and
number is null)) is not false AND (replied > (select lastview from
forumlog where login='theuser' and forum=post.forum and
number=post.number)) is not false order by pinned desc, replied desc
offset 0 limit 20

Now the numeric and "login" fields may change; when I plug it into
explain what I get back is:

                                QUERY PLAN
---------------------------------------------------------------------------
 Limit  (cost=57270.22..57270.27 rows=20 width=757)
   ->  Sort  (cost=57270.22..57270.50 rows=113 width=757)
         Sort Key: pinned, replied
         ->  Index Scan using post_top on post  (cost=0.00..57266.37 rows=113 width=757)
               Index Cond: (toppost = 1)
               Filter: (((replied > (subplan)) IS NOT FALSE) AND ((replied > (subplan)) IS NOT FALSE))
               SubPlan
                 ->  Index Scan using forumlog_composite on forumlog  (cost=0.00..8.29 rows=1 width
                       Index Cond: (("login" = 'theuser'::text) AND (forum = $0) AND (number = $1))
                 ->  Bitmap Heap Scan on forumlog  (cost=4.39..47.61 rows=1 width=8)
                       Recheck Cond: (("login" = 'theuser'::text) AND (forum = $0))
                       Filter: (number IS NULL)
                       ->  Bitmap Index Scan on forumlog_composite  (cost=0.00..4.39 rows=12 width=0)
                             Index Cond: (("login" = 'theuser'::text) AND (forum = $0))
(14 rows)

And indeed, it returns a fairly reasonable number of rows. This takes a
second or two to return - not all that bad - although this is one that
people hit a LOT.

One thing that DOES bother me is this line from the EXPLAIN output:

   ->  Index Scan using post_top on post  (cost=0.00..57266.53 rows=113 width=757)

This is indexed using:

    "post_top" btree (toppost)

Ain't nothing fancy there. So how come the planner thinks this is going
to take that long?!? More interesting, if I do a simple query on that
line, I get:

ticker=> explain select forum from post where toppost='1';
                                QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using post_top on post  (cost=0.00..632.03 rows=1013 width=11)
   Index Cond: (toppost = 1)

Hmm; that's a bit more reasonable. So what's up with the above line?

What I'm seeing is that as concurrency increases, I see the CPU load
spike. Disk I/O is low to moderate at less than 10% of maximum according
to systat -vm, no swap in use, 300mb dedicated to shared memory buffers
for Postgres (machine has 1GB of RAM and is a P4/3G/HT running FreeBSD
6.2-STABLE). It does not swap at all, so it does not appear I've got a
problem with running out of physical memory. shmem is pinned to physical
memory via the sysctl tuning parameter to prevent page table thrashing.

However, load goes up substantially and under moderate to high
concurrency gets into the 4-5 range with response going somewhat to crap.
The application is still usable, but it's not "crisp". If I do a "ps"
during times that performance is particularly bad, I don't see any
particular overrepresentation of this query vs. others (I have the
application doing a "setproctitle" so I know what command - and thus what
sets of queries - it is executing.)

Not sure where to start here. It appears that I'm CPU limited and the
problem may be that this is a web-served application that must connect to
the Postgres backend for each transaction, perform its queries, and then
close the connection down - in other words the load may be coming not
from Postgres but rather from places I can't fix at the application layer
(e.g. fork() overhead, etc). The DBMS and Apache server are on the same
machine, so there's no actual network overhead involved.

If that's the case the only solution is to throw more hardware at it. I
can do that, but before I go tossing more CPU at the problem I'd like to
know I'm not just wasting money.

The application uses the "C" language interface and just calls
"Connectdb" - the only parameter is
Re: [PERFORM] Performance issue with 8.2.3 - "C" application
Karl Denninger <[EMAIL PROTECTED]> writes:
> But here's the query that has a habit of taking the most time
>
> select forum, * from post where toppost = 1 and (replied > (select
> lastview from forumlog where login='theuser' and forum=post.forum and
> number is null)) is not false AND (replied > (select lastview from
> forumlog where login='theuser' and forum=post.forum and
> number=post.number)) is not false order by pinned desc, replied desc
> offset 0 limit 20

Did that ever perform well for you? It's the sub-selects that are likely
to hurt ... in particular,

>   ->  Index Scan using post_top on post  (cost=0.00..57266.37 rows=113 width=757)
>         Index Cond: (toppost = 1)
>         Filter: (((replied > (subplan)) IS NOT FALSE) AND ((replied > (subplan)) IS NOT FALSE))

versus

>   Index Scan using post_top on post  (cost=0.00..632.03 rows=1013 width=11)
>     Index Cond: (toppost = 1)

The planner thinks that the two subplan filter conditions will eliminate
about 90% of the rows returned by the bare indexscan (IIRC this is purely
a rule of thumb, not based on any statistics) and that testing them 1013
times will add over 5 cost units to the basic indexscan. That part I
believe --- correlated subqueries are expensive.

regards, tom lane
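One conventional way to dodge per-row subplan execution is to rewrite the correlated sub-selects as outer joins. A sketch only, not a drop-in replacement: it assumes at most one forumlog row matches each lookup (otherwise the joins duplicate post rows where the original subselects would have errored), and the IS NOT FALSE null-handling should be re-verified, since a missing forumlog row yields a NULL lastview in both forms:

```sql
SELECT p.forum, p.*
FROM post p
LEFT JOIN forumlog fl_forum
       ON fl_forum.login = 'theuser'
      AND fl_forum.forum = p.forum
      AND fl_forum.number IS NULL
LEFT JOIN forumlog fl_post
       ON fl_post.login = 'theuser'
      AND fl_post.forum = p.forum
      AND fl_post.number = p.number
WHERE p.toppost = 1
  AND (p.replied > fl_forum.lastview) IS NOT FALSE
  AND (p.replied > fl_post.lastview) IS NOT FALSE
ORDER BY p.pinned DESC, p.replied DESC
OFFSET 0 LIMIT 20;
```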
Re: [PERFORM] Performance issue with 8.2.3 - "C" application
Yeah, the problem doesn't appear to be there. As I said, if I look at the PS of the system when it's bogging, there aren't a whole bunch of processes stuck doing these, so while this does take a second or two to come back, that's not that bad. It's GENERAL performance that just bites - the system is obviously out of CPU, but what I can't get a handle on is WHY. It does not appear to be accumulating large amounts of runtime in processes I can catch, but the load average is quite high. This is why I'm wondering if what I'm taking here is a hit on the fork/exec inside the postmaster, in the setup internally in there, in the IPC between my process via libPQ, etc - and how I can profile what's going on. Karl Denninger ([EMAIL PROTECTED]) http://www.denninger.net Tom Lane wrote: Karl Denninger <[EMAIL PROTECTED]> writes: But here's the query that has a habit of taking the most time select forum, * from post where toppost = 1 and (replied > (select lastview from forumlog where login='theuser' and forum=post.forum and number is null)) is not false AND (replied > (select lastview from forumlog where login='theuser' and forum=post.forum and number=post.number)) is not false order by pinned desc, replied desc offset 0 limit 20 Did that ever perform well for you? It's the sub-selects that are likely to hurt ... in particular, -> Index Scan using post_top on post (cost=0.00..57266.37 rows=113 width=757) Index Cond: (toppost = 1) Filter: (((replied > (subplan)) IS NOT FALSE) AND ((replied > (subplan)) IS NOT FALSE)) versus Index Scan using post_top on post (cost=0.00..632.03 rows=1013 width=11) Index Cond: (toppost = 1) The planner thinks that the two subplan filter conditions will eliminate about 90% of the rows returned by the bare indexscan (IIRC this is purely a rule of thumb, not based on any statistics) and that testing them 1013 times will add over 5 cost units to the basic indexscan. That part I believe --- correlated subqueries are expensive. 
regards, tom lane
Re: [PERFORM] Performance issue with 8.2.3 - "C" application
On 7/25/07, Karl Denninger <[EMAIL PROTECTED]> wrote: Yeah, the problem doesn't appear to be there. As I said, if I look at the PS of the system when its bogging, there aren't a whole bunch of processes stuck doing these, so while this does take a second or two to come back, that's not that bad. Its GENERAL performance that just bites - the system is obviously out of CPU, but what I can't get a handle on is WHY. It does not appear to be accumulating large amounts of runtime in processes I can catch, but the load average is quite high. 8.2.3 has the 'stats collector bug' (fixed in 8.2.4) which increased load in high concurrency conditions. on a client's machine, after patching, the postmaster load dropped from the 4-5 range to the 1-2 range on a 500 tps server. maybe this is biting you? symptoms are high load avg and high cpu usage of the stats collector process. merlin
Re: [PERFORM] Performance issue with 8.2.3 - "C" application
Hm. now that's interesting. Stats collector IS accumulating quite a bit of runtime. me thinks its time to go grab 8.2.4. Karl Denninger ([EMAIL PROTECTED]) http://www.denninger.net Merlin Moncure wrote: On 7/25/07, Karl Denninger <[EMAIL PROTECTED]> wrote: Yeah, the problem doesn't appear to be there. As I said, if I look at the PS of the system when its bogging, there aren't a whole bunch of processes stuck doing these, so while this does take a second or two to come back, that's not that bad. Its GENERAL performance that just bites - the system is obviously out of CPU, but what I can't get a handle on is WHY. It does not appear to be accumulating large amounts of runtime in processes I can catch, but the load average is quite high. 8.2.3 has the 'stats collector bug' (fixed in 8.2.4) which increased load in high concurrency conditions. on a client's machine after patching the postmaster load drop from the 4-5 range to 1-2 range on a 500 tps server. maybe this is biting you? symptoms are high load avg and high cpu usage of stats collector process. merlin
Re: [PERFORM] Performance issue with 8.2.3 - "C" application
Karl Denninger <[EMAIL PROTECTED]> writes: > Hm. now that's interesting. Stats collector IS accumulating > quite a bit of runtime. me thinks its time to go grab 8.2.4. I think Merlin might have nailed it --- the "stats collector bug" is that it tries to write out the stats file way more often than it should. So any excessive userland CPU time you see is just the tip of the iceberg compared to the system and I/O costs incurred. regards, tom lane
Re: [PERFORM] Performance issue with 8.2.3 - "C" application
Aha! BIG difference. I won't know for sure until the biz day tomorrow but the "first blush" look is that it makes a HUGE difference in system load, and I no longer have the stats collector process on the top of the "top" list. Karl Denninger ([EMAIL PROTECTED]) http://www.denninger.net Tom Lane wrote: Karl Denninger <[EMAIL PROTECTED]> writes: Hm. now that's interesting. Stats collector IS accumulating quite a bit of runtime. me thinks its time to go grab 8.2.4. I think Merlin might have nailed it --- the "stats collector bug" is that it tries to write out the stats file way more often than it should. So any excessive userland CPU time you see is just the tip of the iceberg compared to the system and I/O costs incurred. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Performance issue with 8.2.3 - "C" application
Karl Denninger wrote: > I've got an interesting issue here that I'm running into with 8.2.3 > > This is an application that has run quite well for a long time, and has > been operating without significant changes (other than recompilation) > since back in the early 7.x Postgres days. But now we're seeing a LOT > more load than we used to with it, and suddenly, we're seeing odd > performance issues. > > It APPEARS that the problem isn't query performance per-se. That is, > while I can find a few processes here and there in a run state when I > look with a PS, I don't see them consistently churning. > > But here's the query that has a habit of taking the most time > > select forum, * from post where toppost = 1 and (replied > (select > lastview from forumlog where login='theuser' and forum=post.forum and > number is null)) is not false AND (replied > (select lastview from > forumlog where login='theuser' and forum=post.forum and > number=post.number)) is not false order by pinned desc, replied desc offset 0 > limit 20 Since I can do little to help you with anything else, here is a little help from a guy with a hammer. It seems you may be able to convert the subqueries into a left join. Not sure whether this helps, nor whether I got some bits of the logic wrong, but something like this might help the planner find a better plan: SELECT forum, * FROM post LEFT JOIN forumlog ON post.forum = forumlog.forum AND forumlog.login = 'theuser' AND (post.number = forumlog.number OR forumlog.number IS NULL) AND post.replied <= lastview WHERE forumlog.forum IS NULL AND post.toppost = 1 ORDER BY pinned DESC, replied DESC OFFSET 0 LIMIT 20 ; Nis ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match