Re: [PERFORM] Need Some Suggestions

2005-10-07 Thread Richard Huxton

Lane Van Ingen wrote:

I have an application that is prone to sudden, unscheduled high bursts of
activity, and
I am finding that the application design permits me to detect the activity
bursts within
an existing function. The bursts only affect 3 tables, but degradation
becomes apparent
after 2,000 updates, and significant after 8,000 updates.


Hmm - assuming your free-space settings are large enough, it might be 
adequate to just run a vacuum on the 3 tables every 5 minutes or so. It 
sounds like these are quite small tables with a lot of activity, so if 
there's not much for vacuum to do it won't place too much load on your 
system.


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] count(*) using index scan in "query often, update rarely" environment

2005-10-07 Thread Cestmir Hybl



Hello all
 
First of all, I do understand why pgsql with it's 
MVCC design has to examine tuples to evaluate "count(*)" and "count(*) 
where (...)" queries in environment with heavy concurrent updates.
 
This kind of usage IMHO isn't the average one. 
There are many circumstances with rather "query often, update 
rarely" character.
 
Isn't it possible (and reasonable) for these 
environments to keep track of whether there is a transaction in progress with 
update to given table and if not, use an index scan (count(*) where) or cached 
value (count(*)) to perform this kind of query?
 
(sorry for disturbing if this was already 
discussed)
 
Regards,
 
Cestmir Hybl


Re: [PERFORM] count(*) using index scan in "query often, update rarely" environment

2005-10-07 Thread hubert depesz lubaczewski
On 10/7/05, Cestmir Hybl <[EMAIL PROTECTED]> wrote:
Isn't it possible (and reasonable) for these 
environments to keep track of whether there is a transaction in progress with 
update to given table and if not, use an index scan (count(*) where) or cached 
value (count(*)) to perform this kind of query?

if i understand your problem correctly, then simple usage of triggers will do the job just fine.

hubert


Re: [PERFORM] count(*) using index scan in "query often, update rarely" environment

2005-10-07 Thread Cestmir Hybl



Yes, I can possibly use triggers to maintanin 
counts of several fixed groups of records or total recordcount (but it's 
unpractical).
 
No, I can't speed-up evaluation of generic 
"count(*) where ()" queries this way.
 
My question was rather about general performance of 
count() queries in environment with infrequent updates.
 
Cestmir

  - Original Message - 
  From: 
  hubert depesz 
  lubaczewski 
  To: Cestmir Hybl 
  Cc: pgsql-performance@postgresql.org 
  
  Sent: Friday, October 07, 2005 11:54 
  AM
  Subject: Re: [PERFORM] count(*) using 
  index scan in "query often, update rarely" environment
  On 10/7/05, Cestmir Hybl <[EMAIL PROTECTED]> wrote:
  
  
Isn't it possible (and reasonable) for these 
environments to keep track of whether there is a transaction in progress 
with update to given table and if not, use an index scan (count(*) where) or 
cached value (count(*)) to perform this kind of query?
if i understand your problem correctly, then 
  simple usage of triggers will do the job just 
fine.hubert


Re: [PERFORM] count(*) using index scan in "query often, update rarely" environment

2005-10-07 Thread Steinar H. Gunderson
On Fri, Oct 07, 2005 at 11:24:05AM +0200, Cestmir Hybl wrote:
> Isn't it possible (and reasonable) for these environments to keep track of
> whether there is a transaction in progress with update to given table and
> if not, use an index scan (count(*) where) or cached value (count(*)) to
> perform this kind of query?

Even if there is no running update, there might still be dead rows in the
table. In any case, of course, a new update could always be occurring while
your counting query was still running.

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] count(*) using index scan in "query often, update rarely" environment

2005-10-07 Thread Cestmir Hybl
collision: it's possible to either block updating transaction until index 
scan ends or discard index scan imediately and finish query using MVCC 
compliant scan


dead rows: this sounds like more serious counter-argument, I don't know much 
about dead records management and whether it would be  possible/worth to 
make indexes matching live records when there's no transaction in progress 
on that table


- Original Message - 
From: "Steinar H. Gunderson" <[EMAIL PROTECTED]>

To: 
Sent: Friday, October 07, 2005 12:48 PM
Subject: Re: [PERFORM] count(*) using index scan in "query often, update 
rarely" environment




On Fri, Oct 07, 2005 at 11:24:05AM +0200, Cestmir Hybl wrote:
Isn't it possible (and reasonable) for these environments to keep track 
of

whether there is a transaction in progress with update to given table and
if not, use an index scan (count(*) where) or cached value (count(*)) to
perform this kind of query?


Even if there is no running update, there might still be dead rows in the
table. In any case, of course, a new update could always be occurring 
while

your counting query was still running.

/* Steinar */
--
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster 



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Status of Opteron vs Xeon

2005-10-07 Thread Merlin Moncure
> What's the current status of how much faster the Opteron is compared
to
> the
> Xeons?  I know the Opterons used to be close to 2x faster, but is that
> still
> the case?  I understand much work has been done to reduce the contect
> switching storms on the Xeon architecture, is this correct?

Up until two days ago (Oct 5) Intel has had no answer for AMD's dual
core offerings...unfortunately this has allowed AMD to charge top dollar
for dual core Opterons.  The Intel dual core solution on the P4 side
hasn't been very impressive particularly with regard to thermals.

My 90nm athlon 3000 at home runs very cool...if I underclock it a bit I
can actually turn off the cooling fan :).

IMO, right now it's AMD all the way, but if you are planning a big
purchase, it might be smart to wait a couple of months for the big price
realignment as Intel's dual xeons hit the retail channel.

Merlin


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] count(*) using index scan in "query often, update rarely" environment

2005-10-07 Thread Alvaro Herrera
On Fri, Oct 07, 2005 at 01:14:20PM +0200, Cestmir Hybl wrote:
> collision: it's possible to either block updating transaction until 
> index scan ends or discard index scan imediately and finish query using 
> MVCC compliant scan

You can't change from one scan method to a different one on the fly.
There's no way to know which tuples have alreaady been returned.

Our index access methods are designed to be very concurrent, and it
works extremely well.  One index scan being able to block an update
would destroy that advantage.

> dead rows: this sounds like more serious counter-argument, I don't know 
> much about dead records management and whether it would be  
> possible/worth to make indexes matching live records when there's no 
> transaction in progress on that table

It's not possible, because a finishing transaction would have to clean
up every index it has used, and also any index it hasn't used but has
been modified by another transaction which couldn't clean up by itself
but didn't do the work because the first one was looking at the index.
It's easy to see that it's possible to create an unbounded number of
transactions, each forcing the other to do some index cleanup.  This is
not acceptable.

Plus, it would be very hard to implement, and a very wide door to bugs.

-- 
Alvaro Herrerahttp://www.advogato.org/person/alvherre
"Et put se mouve" (Galileo Galilei)

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Need Some Suggestions

2005-10-07 Thread Lane Van Ingen
You are correct, in that these tables are not large (50,000 records), but
their effect on performance is noticeable. Plain VACUUM (no freeze, full,
etc)
does the trick well, but I am unable to figure a way to call the 'plain
vanilla
version' of VACUUM via a PostgreSQL trigger function (does not allow it).

Using the Windows scheduler (schtask, somewhat like Unix cron) is an option,
but not a good one, as it takes too much out of the platform to run. My
client
does not use strong platforms, so I have to be concerned about that. VACUUM
is
a minimum impact on performance when running. I believe it would be much
better
to be able to call VACUUM out of a function, the same way in which other SQL
commands are used.

-Original Message-
From: Richard Huxton [mailto:[EMAIL PROTECTED]
Sent: Friday, October 07, 2005 3:53 AM
To: Lane Van Ingen
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Need Some Suggestions

Lane Van Ingen wrote:
> I have an application that is prone to sudden, unscheduled high bursts of
> activity, and I am finding that the application design permits me to
detect
> the activity bursts within an existing function. The bursts only affect 3
> tables, but degradation becomes apparent after 2,000 updates, and quite
> significant after 8,000 updates.

Hmm - assuming your free-space settings are large enough, it might be
adequate to just run a vacuum on the 3 tables every 5 minutes or so. It
sounds like these are quite small tables with a lot of activity, so if
there's not much for vacuum to do it won't place too much load on your
system.

--
   Richard Huxton
   Archonet Ltd



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] count(*) using index scan in "query often, update rarely" environment

2005-10-07 Thread Tom Lane
"Cestmir Hybl" <[EMAIL PROTECTED]> writes:
> Isn't it possible (and reasonable) for these environments to keep track =
> of whether there is a transaction in progress with update to given table =
> and if not, use an index scan (count(*) where) or cached value =
> (count(*)) to perform this kind of query?

Please read the archives before bringing up such well-discussed issues.

There's a workable-looking design in the archives (pghackers probably)
for maintaining overall table counts in a separate table, with each
transaction adding one row of "delta" information just before it
commits.  I haven't seen anything else that looks remotely attractive.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] count(*) using index scan in "query often, update rarely" environment

2005-10-07 Thread Merlin Moncure
On 10/7/05, Cestmir Hybl <[EMAIL PROTECTED]> wrote:
Isn't it possible (and reasonable) for these environments to keep track
of whether there is a transaction in progress with update to given table
and if not, use an index scan (count(*) where) or cached value
(count(*)) to perform this kind of query?


The answer to the first question is subtle.  Basically, the PostgreSQL
engine is designed for high concurrency.  We are definitely on the right
side of the cost/benefit tradeoff here.  SQL server does not have MVCC
(or at least until 2005 appears) so they are on the other side of the
tradeoff.

You can of course serialize the access yourself by materializing the
count in a small table and use triggers or cleverly designed
transactions.  This is trickier than it might look however so check the
archives for a thorough treatment of the topic.

One interesting thing is that making count(*) over large swaths of data
is frequently an indicator of a poorly normalized database.  Is it
possible to optimize the counting by laying out your data in a different
way?

Merlin



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] count(*) using index scan in "query often, update rarely"

2005-10-07 Thread Richard Huxton

Tom Lane wrote:


There's a workable-looking design in the archives (pghackers probably)
for maintaining overall table counts in a separate table, with each
transaction adding one row of "delta" information just before it
commits.  I haven't seen anything else that looks remotely attractive.


It might be useful if there was a way to trap certain queries and 
rewrite/replace them. That way more complex queries could be 
transparently redirected to a summary table etc. I'm guessing that the 
overhead to check every query would quickly destroy any gains though.


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Status of Opteron vs Xeon

2005-10-07 Thread Chris Browne
[EMAIL PROTECTED] (Jeff Frost) writes:
> What's the current status of how much faster the Opteron is compared
> to the Xeons?  I know the Opterons used to be close to 2x faster,
> but is that still the case?  I understand much work has been done to
> reduce the contect switching storms on the Xeon architecture, is
> this correct?

Work has gone into 8.1 to try to help with the context switch storms;
that doesn't affect previous versions.

Furthermore, it does not do anything to address the consideration that
memory access on Opterons seem to be intrinsically faster than on Xeon
due to differences in the memory bus architecture.  

The only evident ways to address that are:
 a) For Intel to deploy chips with better memory buses;
 b) For Intel to convince people to deploy compilers that 
optimize badly on AMD to make Intel chips look better...
-- 
(format nil "[EMAIL PROTECTED]" "cbbrowne" "ntlug.org")
http://cbbrowne.com/info/lsf.html
A mathematician is a machine for converting caffeine into theorems.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Status of Opteron vs Xeon

2005-10-07 Thread Tom Lane
Chris Browne <[EMAIL PROTECTED]> writes:
> [EMAIL PROTECTED] (Jeff Frost) writes:
>> What's the current status of how much faster the Opteron is compared
>> to the Xeons?  I know the Opterons used to be close to 2x faster,
>> but is that still the case?  I understand much work has been done to
>> reduce the contect switching storms on the Xeon architecture, is
>> this correct?

> Work has gone into 8.1 to try to help with the context switch storms;
> that doesn't affect previous versions.

Also note that we've found that the current coding of the TAS macro
seems to be very bad for at least some Opterons --- they do much better
if the "pre-test" cmpb is removed.  But this is not true for all x86_64
chips.  We still have an open issue about what to do about this.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] wal_buffers

2005-10-07 Thread Josh Berkus
Ian, Thomas,

> Thanks; I'd seen the documentation, but not Josh Berkus'
> testing.

BTW, that's still an open question for me.  I'm now theorizing that it's 
best to set wal_buffers to the expected maximum number of concurrent write 
connections.   However, I don't have enough test systems to test that 
meaningfully.

Your test results will help.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Status of Opteron vs Xeon

2005-10-07 Thread Emil Briggs
>
> Furthermore, it does not do anything to address the consideration that
> memory access on Opterons seem to be intrinsically faster than on Xeon
> due to differences in the memory bus architecture.
>

I have been running some tests using different numa policies on a quad Opteron 
server and have found some significant performance differences depending on 
the type of load the system is under. It's not clear to me yet if I can draw 
any general conclusions from the results though.

Emil

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-07 Thread Luke Lonergan
Steinar,

On 10/5/05 5:12 PM, "Steinar H. Gunderson" <[EMAIL PROTECTED]> wrote:

> What? strlen is definitely not in the kernel, and thus won't count as system
> time.

System time on Linux includes time spent in glibc routines.

- Luke




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-07 Thread mark
On Fri, Oct 07, 2005 at 04:55:28PM -0700, Luke Lonergan wrote:
> On 10/5/05 5:12 PM, "Steinar H. Gunderson" <[EMAIL PROTECTED]> wrote:
> > What? strlen is definitely not in the kernel, and thus won't count as
> > system time.
> System time on Linux includes time spent in glibc routines.

Do you have a reference for this?

I believe this statement to be 100% false.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-07 Thread Luke Lonergan
Mark,

On 10/7/05 5:17 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:

> On Fri, Oct 07, 2005 at 04:55:28PM -0700, Luke Lonergan wrote:
>> On 10/5/05 5:12 PM, "Steinar H. Gunderson" <[EMAIL PROTECTED]> wrote:
>>> What? strlen is definitely not in the kernel, and thus won't count as
>>> system time.
>> System time on Linux includes time spent in glibc routines.
> 
> Do you have a reference for this?
> 
> I believe this statement to be 100% false.

How about 99%? OK, you're right, I had this confused with the profiling
problem where glibc routines aren't included in dynamic linked profiles.

Back to the statements earlier - the output of time had much of time for a
dd spent in system, which means kernel, so where in the kernel would that be
exactly?

- Luke



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings