Re: [PERFORM] Need Some Suggestions
Lane Van Ingen wrote: I have an application that is prone to sudden, unscheduled high bursts of activity, and I am finding that the application design permits me to detect the activity bursts within an existing function. The bursts only affect 3 tables, but degradation becomes apparent after 2,000 updates, and significant after 8,000 updates. Hmm - assuming your free-space settings are large enough, it might be adequate to just run a vacuum on the 3 tables every 5 minutes or so. It sounds like these are quite small tables with a lot of activity, so if there's not much for vacuum to do it won't place too much load on your system. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[PERFORM] count(*) using index scan in "query often, update rarely" environment
Hello all First of all, I do understand why pgsql with it's MVCC design has to examine tuples to evaluate "count(*)" and "count(*) where (...)" queries in environment with heavy concurrent updates. This kind of usage IMHO isn't the average one. There are many circumstances with rather "query often, update rarely" character. Isn't it possible (and reasonable) for these environments to keep track of whether there is a transaction in progress with update to given table and if not, use an index scan (count(*) where) or cached value (count(*)) to perform this kind of query? (sorry for disturbing if this was already discussed) Regards, Cestmir Hybl
Re: [PERFORM] count(*) using index scan in "query often, update rarely" environment
On 10/7/05, Cestmir Hybl <[EMAIL PROTECTED]> wrote: Isn't it possible (and reasonable) for these environments to keep track of whether there is a transaction in progress with update to given table and if not, use an index scan (count(*) where) or cached value (count(*)) to perform this kind of query? if i understand your problem correctly, then simple usage of triggers will do the job just fine. hubert
Re: [PERFORM] count(*) using index scan in "query often, update rarely" environment
Yes, I can possibly use triggers to maintanin counts of several fixed groups of records or total recordcount (but it's unpractical). No, I can't speed-up evaluation of generic "count(*) where ()" queries this way. My question was rather about general performance of count() queries in environment with infrequent updates. Cestmir - Original Message - From: hubert depesz lubaczewski To: Cestmir Hybl Cc: pgsql-performance@postgresql.org Sent: Friday, October 07, 2005 11:54 AM Subject: Re: [PERFORM] count(*) using index scan in "query often, update rarely" environment On 10/7/05, Cestmir Hybl <[EMAIL PROTECTED]> wrote: Isn't it possible (and reasonable) for these environments to keep track of whether there is a transaction in progress with update to given table and if not, use an index scan (count(*) where) or cached value (count(*)) to perform this kind of query? if i understand your problem correctly, then simple usage of triggers will do the job just fine.hubert
Re: [PERFORM] count(*) using index scan in "query often, update rarely" environment
On Fri, Oct 07, 2005 at 11:24:05AM +0200, Cestmir Hybl wrote: > Isn't it possible (and reasonable) for these environments to keep track of > whether there is a transaction in progress with update to given table and > if not, use an index scan (count(*) where) or cached value (count(*)) to > perform this kind of query? Even if there is no running update, there might still be dead rows in the table. In any case, of course, a new update could always be occurring while your counting query was still running. /* Steinar */ -- Homepage: http://www.sesse.net/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] count(*) using index scan in "query often, update rarely" environment
collision: it's possible to either block updating transaction until index scan ends or discard index scan imediately and finish query using MVCC compliant scan dead rows: this sounds like more serious counter-argument, I don't know much about dead records management and whether it would be possible/worth to make indexes matching live records when there's no transaction in progress on that table - Original Message - From: "Steinar H. Gunderson" <[EMAIL PROTECTED]> To: Sent: Friday, October 07, 2005 12:48 PM Subject: Re: [PERFORM] count(*) using index scan in "query often, update rarely" environment On Fri, Oct 07, 2005 at 11:24:05AM +0200, Cestmir Hybl wrote: Isn't it possible (and reasonable) for these environments to keep track of whether there is a transaction in progress with update to given table and if not, use an index scan (count(*) where) or cached value (count(*)) to perform this kind of query? Even if there is no running update, there might still be dead rows in the table. In any case, of course, a new update could always be occurring while your counting query was still running. /* Steinar */ -- Homepage: http://www.sesse.net/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Status of Opteron vs Xeon
> What's the current status of how much faster the Opteron is compared to > the > Xeons? I know the Opterons used to be close to 2x faster, but is that > still > the case? I understand much work has been done to reduce the contect > switching storms on the Xeon architecture, is this correct? Up until two days ago (Oct 5) Intel has had no answer for AMD's dual core offerings...unfortunately this has allowed AMD to charge top dollar for dual core Opterons. The Intel dual core solution on the P4 side hasn't been very impressive particularly with regard to thermals. My 90nm athlon 3000 at home runs very cool...if I underclock it a bit I can actually turn off the cooling fan :). IMO, right now it's AMD all the way, but if you are planning a big purchase, it might be smart to wait a couple of months for the big price realignment as Intel's dual xeons hit the retail channel. Merlin ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] count(*) using index scan in "query often, update rarely" environment
On Fri, Oct 07, 2005 at 01:14:20PM +0200, Cestmir Hybl wrote: > collision: it's possible to either block updating transaction until > index scan ends or discard index scan imediately and finish query using > MVCC compliant scan You can't change from one scan method to a different one on the fly. There's no way to know which tuples have alreaady been returned. Our index access methods are designed to be very concurrent, and it works extremely well. One index scan being able to block an update would destroy that advantage. > dead rows: this sounds like more serious counter-argument, I don't know > much about dead records management and whether it would be > possible/worth to make indexes matching live records when there's no > transaction in progress on that table It's not possible, because a finishing transaction would have to clean up every index it has used, and also any index it hasn't used but has been modified by another transaction which couldn't clean up by itself but didn't do the work because the first one was looking at the index. It's easy to see that it's possible to create an unbounded number of transactions, each forcing the other to do some index cleanup. This is not acceptable. Plus, it would be very hard to implement, and a very wide door to bugs. -- Alvaro Herrerahttp://www.advogato.org/person/alvherre "Et put se mouve" (Galileo Galilei) ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Need Some Suggestions
You are correct, in that these tables are not large (50,000 records), but their effect on performance is noticeable. Plain VACUUM (no freeze, full, etc) does the trick well, but I am unable to figure a way to call the 'plain vanilla version' of VACUUM via a PostgreSQL trigger function (does not allow it). Using the Windows scheduler (schtask, somewhat like Unix cron) is an option, but not a good one, as it takes too much out of the platform to run. My client does not use strong platforms, so I have to be concerned about that. VACUUM is a minimum impact on performance when running. I believe it would be much better to be able to call VACUUM out of a function, the same way in which other SQL commands are used. -Original Message- From: Richard Huxton [mailto:[EMAIL PROTECTED] Sent: Friday, October 07, 2005 3:53 AM To: Lane Van Ingen Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Need Some Suggestions Lane Van Ingen wrote: > I have an application that is prone to sudden, unscheduled high bursts of > activity, and I am finding that the application design permits me to detect > the activity bursts within an existing function. The bursts only affect 3 > tables, but degradation becomes apparent after 2,000 updates, and quite > significant after 8,000 updates. Hmm - assuming your free-space settings are large enough, it might be adequate to just run a vacuum on the 3 tables every 5 minutes or so. It sounds like these are quite small tables with a lot of activity, so if there's not much for vacuum to do it won't place too much load on your system. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] count(*) using index scan in "query often, update rarely" environment
"Cestmir Hybl" <[EMAIL PROTECTED]> writes: > Isn't it possible (and reasonable) for these environments to keep track = > of whether there is a transaction in progress with update to given table = > and if not, use an index scan (count(*) where) or cached value = > (count(*)) to perform this kind of query? Please read the archives before bringing up such well-discussed issues. There's a workable-looking design in the archives (pghackers probably) for maintaining overall table counts in a separate table, with each transaction adding one row of "delta" information just before it commits. I haven't seen anything else that looks remotely attractive. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] count(*) using index scan in "query often, update rarely" environment
On 10/7/05, Cestmir Hybl <[EMAIL PROTECTED]> wrote: Isn't it possible (and reasonable) for these environments to keep track of whether there is a transaction in progress with update to given table and if not, use an index scan (count(*) where) or cached value (count(*)) to perform this kind of query? The answer to the first question is subtle. Basically, the PostgreSQL engine is designed for high concurrency. We are definitely on the right side of the cost/benefit tradeoff here. SQL server does not have MVCC (or at least until 2005 appears) so they are on the other side of the tradeoff. You can of course serialize the access yourself by materializing the count in a small table and use triggers or cleverly designed transactions. This is trickier than it might look however so check the archives for a thorough treatment of the topic. One interesting thing is that making count(*) over large swaths of data is frequently an indicator of a poorly normalized database. Is it possible to optimize the counting by laying out your data in a different way? Merlin ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] count(*) using index scan in "query often, update rarely"
Tom Lane wrote: There's a workable-looking design in the archives (pghackers probably) for maintaining overall table counts in a separate table, with each transaction adding one row of "delta" information just before it commits. I haven't seen anything else that looks remotely attractive. It might be useful if there was a way to trap certain queries and rewrite/replace them. That way more complex queries could be transparently redirected to a summary table etc. I'm guessing that the overhead to check every query would quickly destroy any gains though. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Status of Opteron vs Xeon
[EMAIL PROTECTED] (Jeff Frost) writes: > What's the current status of how much faster the Opteron is compared > to the Xeons? I know the Opterons used to be close to 2x faster, > but is that still the case? I understand much work has been done to > reduce the contect switching storms on the Xeon architecture, is > this correct? Work has gone into 8.1 to try to help with the context switch storms; that doesn't affect previous versions. Furthermore, it does not do anything to address the consideration that memory access on Opterons seem to be intrinsically faster than on Xeon due to differences in the memory bus architecture. The only evident ways to address that are: a) For Intel to deploy chips with better memory buses; b) For Intel to convince people to deploy compilers that optimize badly on AMD to make Intel chips look better... -- (format nil "[EMAIL PROTECTED]" "cbbrowne" "ntlug.org") http://cbbrowne.com/info/lsf.html A mathematician is a machine for converting caffeine into theorems. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Status of Opteron vs Xeon
Chris Browne <[EMAIL PROTECTED]> writes: > [EMAIL PROTECTED] (Jeff Frost) writes: >> What's the current status of how much faster the Opteron is compared >> to the Xeons? I know the Opterons used to be close to 2x faster, >> but is that still the case? I understand much work has been done to >> reduce the contect switching storms on the Xeon architecture, is >> this correct? > Work has gone into 8.1 to try to help with the context switch storms; > that doesn't affect previous versions. Also note that we've found that the current coding of the TAS macro seems to be very bad for at least some Opterons --- they do much better if the "pre-test" cmpb is removed. But this is not true for all x86_64 chips. We still have an open issue about what to do about this. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] wal_buffers
Ian, Thomas, > Thanks; I'd seen the documentation, but not Josh Berkus' > testing. BTW, that's still an open question for me. I'm now theorizing that it's best to set wal_buffers to the expected maximum number of concurrent write connections. However, I don't have enough test systems to test that meaningfully. Your test results will help. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Status of Opteron vs Xeon
> > Furthermore, it does not do anything to address the consideration that > memory access on Opterons seem to be intrinsically faster than on Xeon > due to differences in the memory bus architecture. > I have been running some tests using different numa policies on a quad Opteron server and have found some significant performance differences depending on the type of load the system is under. It's not clear to me yet if I can draw any general conclusions from the results though. Emil ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [PERFORM] A Better External Sort?
Steinar, On 10/5/05 5:12 PM, "Steinar H. Gunderson" <[EMAIL PROTECTED]> wrote: > What? strlen is definitely not in the kernel, and thus won't count as system > time. System time on Linux includes time spent in glibc routines. - Luke ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [PERFORM] A Better External Sort?
On Fri, Oct 07, 2005 at 04:55:28PM -0700, Luke Lonergan wrote: > On 10/5/05 5:12 PM, "Steinar H. Gunderson" <[EMAIL PROTECTED]> wrote: > > What? strlen is definitely not in the kernel, and thus won't count as > > system time. > System time on Linux includes time spent in glibc routines. Do you have a reference for this? I believe this statement to be 100% false. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [PERFORM] A Better External Sort?
Mark, On 10/7/05 5:17 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote: > On Fri, Oct 07, 2005 at 04:55:28PM -0700, Luke Lonergan wrote: >> On 10/5/05 5:12 PM, "Steinar H. Gunderson" <[EMAIL PROTECTED]> wrote: >>> What? strlen is definitely not in the kernel, and thus won't count as >>> system time. >> System time on Linux includes time spent in glibc routines. > > Do you have a reference for this? > > I believe this statement to be 100% false. How about 99%? OK, you're right, I had this confused with the profiling problem where glibc routines aren't included in dynamic linked profiles. Back to the statements earlier - the output of time had much of time for a dd spent in system, which means kernel, so where in the kernel would that be exactly? - Luke ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings