Re: [PERFORM] Restricting Postgres

2004-11-03 Thread Andrew Sullivan
On Tue, Nov 02, 2004 at 11:52:12PM +, Martin Foster wrote:
> Is there a way to restrict how much load a PostgreSQL server can take 
> before dropping queries in order to safeguard the server?  I was 

Well, you could limit the number of concurrent connections, and set
the query timeout to a relatively low level.  What that ought to mean
is that, under heavy load, some queries will abort.
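
A minimal sketch of those two knobs (the numbers here are purely
illustrative, not recommendations):

  # postgresql.conf: cap how many backends can exist at once
  max_connections = 50

  -- per session (or per database), abort anything that runs too long
  SET statement_timeout = 5000;   -- milliseconds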

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
When my information changes, I alter my conclusions.  What do you do sir?
--attr. John Maynard Keynes



[PERFORM] preloading indexes

2004-11-03 Thread stuff
I am working with some pretty convoluted queries that work
very slowly the first time they’re called but perform fine on the second
call. I am fairly certain that these differences are due to the caching. Can
someone point me in a direction that would allow me to pre-cache the critical
indexes?

Re: [PERFORM] preloading indexes

2004-11-03 Thread Matt Clark
The best way to get all the stuff needed by a query into RAM is to run the
query.  Is it more that you want to 'pin' the data in RAM so it doesn't get
overwritten by other queries?

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of [EMAIL PROTECTED]
Sent: 03 November 2004 17:31
To: [EMAIL PROTECTED]
Subject: [PERFORM] preloading indexes

I am working with some pretty convoluted queries that work very slowly the
first time they're called but perform fine on the second call. I am fairly
certain that these differences are due to the caching. Can someone point me
in a direction that would allow me to pre-cache the critical indexes?


[PERFORM] vacuum analyze slows sql query

2004-11-03 Thread patrick ~
Greetings pgsql-performance :)

Yesterday I posted to the pgsql-sql list about an issue with VACUUM
while trying to track down an issue with the performance of a SQL SELECT
statement involving a stored function.  It was suggested that I bring
the discussion over to -performance.

Instead of reposting the message, here is a link to the original
message followed by a brief summary:

  http://marc.theaimsgroup.com/?l=postgresql-sql&m=109945118928530&w=2


Summary:

Our customer complains about web/php-based UI sluggishness accessing
the data in the db.  I created a "stripped down" version of the tables
in question to be able to post to the pgsql-sql list asking for hints
as to how I can improve the SQL query.  While doing this I noticed
that if I 'createdb' and populate it with the "sanitized" data the
query in question is quite fast: 618 rows returned in 864.522 ms.
This was puzzling.  Next I noticed that after a VACUUM the very same
query would slow down to a crawl: 618 rows returned in 1080688.921 ms.

This was reproduced on PostgreSQL 7.4.2 running on an Intel PIII 700MHz
with 512MB of RAM.  This system is my /personal/ test system/sandbox,
i.e., it isn't being stressed by any other processes.


Thanks for reading,
--patrick







Re: [PERFORM] preloading indexes

2004-11-03 Thread stuff
That's correct - I'd like to be able to keep particular indexes in RAM
available all the time.

> The best way to get all the stuff needed by a query into RAM is to run
> the query.  Is it more that you want to 'pin' the data in RAM so it
> doesn't get overwritten by other queries?

>> I am working with some pretty convoluted queries that work very slowly
>> the first time they're called but perform fine on the second call. I am
>> fairly certain that these differences are due to the caching. Can someone
>> point me in a direction that would allow me to pre-cache the critical
>> indexes?

Re: [PERFORM] vacuum analyze slows sql query

2004-11-03 Thread Doug Y
Given that the plan doesn't change after an analyze, my guess would be that
the first query is hitting cached data; then you vacuum, and that chews
through all the cache with its own data, pushing the good data out of the
cache so it has to be re-fetched from disk.

If you run the select a 2nd time after the vacuum, what is the time?

Not sure what your pkk_offer_has_pending_purch function does; that might be
something to look at as well.

I could be wrong, but that's the only thing that makes sense to me. ARC is
supposed to help with that type of behavior in 8.0.
patrick ~ wrote:
> Greetings pgsql-performance :)
>
> Yesterday I posted to the pgsql-sql list about an issue with VACUUM
> while trying to track down an issue with the performance of a SQL SELECT
> statement involving a stored function.  It was suggested that I bring
> the discussion over to -performance.
>
> Instead of reposting the message, here is a link to the original
> message followed by a brief summary:
>
>   http://marc.theaimsgroup.com/?l=postgresql-sql&m=109945118928530&w=2
>
> Summary:
>
> Our customer complains about web/php-based UI sluggishness accessing
> the data in the db.  I created a "stripped down" version of the tables
> in question to be able to post to the pgsql-sql list asking for hints
> as to how I can improve the SQL query.  While doing this I noticed
> that if I 'createdb' and populate it with the "sanitized" data the
> query in question is quite fast: 618 rows returned in 864.522 ms.
> This was puzzling.  Next I noticed that after a VACUUM the very same
> query would slow down to a crawl: 618 rows returned in 1080688.921 ms.
>
> This was reproduced on PostgreSQL 7.4.2 running on an Intel PIII 700MHz
> with 512MB of RAM.  This system is my /personal/ test system/sandbox,
> i.e., it isn't being stressed by any other processes.
>
> Thanks for reading,
> --patrick



Re: [PERFORM] Restricting Postgres

2004-11-03 Thread Simon Riggs
On Tue, 2004-11-02 at 23:52, Martin Foster wrote:
> Is there a way to restrict how much load a PostgreSQL server can take 
> before dropping queries in order to safeguard the server?  I was 
> looking at the login.conf (5) man page and while it allows me to limit 
> by processor time this seems to not fit my specific needs.
> 
> Essentially, I am looking for a sort of functionality similar to what 
> Sendmail and Apache have.   Once the load of the system reaches a 
> certain defined limit the daemon drops tasks until such a time that it 
> can resume normal operation.

Sounds great... could you give more shape to the idea, so people can
comment on it?

What limit? Measured how? Normal operation is what?

Drop what? How to tell?

> 
> While not necessarily common on my servers I have witnessed some fairly 
> high load averages which may have led to the machine dropping outright. 
> Any help on this matter would be appreciated.

You can limit the number of connections overall?

-- 
Best Regards, Simon Riggs




Re: [PERFORM] preloading indexes

2004-11-03 Thread Andrew Sullivan
On Wed, Nov 03, 2004 at 12:12:43PM -0700, [EMAIL PROTECTED] wrote:
> That's correct - I'd like to be able to keep particular indexes in RAM
> available all the time

If these are queries that run frequently, then the relevant cache
will probably remain populated[1].  If they _don't_ run frequently, why
do you want to force the memory to be used to optimise something that
is uncommon?  But in any case, there's no mechanism to do this.

A

[1] there are in fact limits on the caching: if your data set is
larger than memory, for instance, there's no way it will all stay
cached.  Also, VACUUM does nasty things to the cache.  It is hoped
that nastiness is fixed in 8.0.

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
The plural of anecdote is not data.
--Roger Brinner



Re: [PERFORM] preloading indexes

2004-11-03 Thread Pierre-Frédéric Caillaud
	Uh, you can always load a table in cache by doing a seq scan on it...
like "select count(1) from table" or something... this doesn't work for
indexes of course, but you can always look in the system catalogs, find
the filename for the index, then just open() it from an external program
and read it without caring for the data... it'll save you the seeks in the
index... of course you'll have problems with file permissions etc, not
mentioning security, locking, etc, etc, etc.  Is that worth the trouble?
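
A minimal sketch of both tricks (the table and index names are made up for
illustration; the file path is derived from pg_database.oid and
pg_class.relfilenode under $PGDATA, so verify it before trusting it):

  -- warm a table's heap pages by forcing a seq scan
  SELECT count(*) FROM pkk_purchase;

  -- locate the file behind an index
  SELECT oid FROM pg_database WHERE datname = current_database();
  SELECT relfilenode FROM pg_class WHERE relname = 'pkk_purchase_offer_id_idx';

  # then, from a shell account with read access to the data directory,
  # pull the index file through the OS cache
  dd if=$PGDATA/base/<database_oid>/<relfilenode> of=/dev/null bs=8k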

On Wed, 3 Nov 2004 14:35:28 -0500, Andrew Sullivan <[EMAIL PROTECTED]> wrote:

> On Wed, Nov 03, 2004 at 12:12:43PM -0700, [EMAIL PROTECTED] wrote:
>> That's correct - I'd like to be able to keep particular indexes in RAM
>> available all the time
>
> If these are queries that run frequently, then the relevant cache
> will probably remain populated[1].  If they _don't_ run frequently, why
> do you want to force the memory to be used to optimise something that
> is uncommon?  But in any case, there's no mechanism to do this.
>
> A
>
> [1] there are in fact limits on the caching: if your data set is
> larger than memory, for instance, there's no way it will all stay
> cached.  Also, VACUUM does nasty things to the cache.  It is hoped
> that nastiness is fixed in 8.0.



Re: [PERFORM] preloading indexes

2004-11-03 Thread Tom Lane
<[EMAIL PROTECTED]> writes:
> I am working with some pretty convoluted queries that work very slowly the
> first time they're called but perform fine on the second call. I am fairly
> certain that these differences are due to the caching. Can someone point me
> in a direction that would allow me to pre-cache the critical indexes?

Buy more RAM.  Also check your shared_buffers setting (but realize that
more is not necessarily better).
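
For reference, a quick way to check the current setting from psql (on 7.x
the value reported is a count of 8 kB buffers), together with a purely
illustrative postgresql.conf line:

  SHOW shared_buffers;

  # postgresql.conf
  shared_buffers = 10000   # 10000 x 8 kB pages, roughly 80 MB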

regards, tom lane



Re: [PERFORM] vacuum analyze slows sql query

2004-11-03 Thread Tom Lane
patrick ~ <[EMAIL PROTECTED]> writes:
> that if I 'createdb' and populate it with the "sanitized" data the
> query in question is quite fast: 618 rows returned in 864.522 ms.
> This was puzzling.  Next I noticed that after a VACUUM the very same
> query would slow down to a crawl: 618 rows returned in 1080688.921 ms.

The outer query is too simple to have more than one possible plan,
so the issue is certainly a change in query plans inside the function.
You need to be investigating what's happening inside that function.
7.1 doesn't have adequate tools for this, but in 7.4 you can use
PREPARE and EXPLAIN ANALYZE EXECUTE to examine the query plans used
for parameterized statements, which is what you've got here.

My bet is that with ANALYZE stats present, the planner guesses wrong
about which index to use; but without looking at EXPLAIN ANALYZE output
there's no way to be sure.
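
A sketch of that kind of probe, using the body of the function as a
prepared statement (the table and column names are the ones from the
function quoted below; the statement name and the value passed to EXECUTE
are arbitrary, for illustration only):

  prepare pending_probe( integer ) as
  select  p0.purchase_id is not null
    from  pkk_purchase p0
   where  p0.offer_id = $1
     and ( p0.pending = true
           or ( ( p0.expire_time > now()
                  or p0.expire_time isnull )
                and p0.cancel_date isnull ) )
   limit 1;

  explain analyze execute pending_probe( 1 );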

BTW, why the bizarrely complicated substitute for a NOT NULL test?
ISTM you only need

create function
pkk_offer_has_pending_purch( integer )
returns bool
as  '
select  p0.purchase_id is not null
  from  pkk_purchase p0
 where  p0.offer_id = $1
and ( p0.pending = true
or ( ( p0.expire_time > now()
 or p0.expire_time isnull )
   and p0.cancel_date isnull ) )
  limit 1
' language 'sql' ;

(Actually, seeing that pkk_purchase.purchase_id is defined as NOT NULL,
I wonder why the function exists at all ... but I suppose you've
"stripped" the function to the point of being nonsense.)

regards, tom lane



Re: [PERFORM] preloading indexes

2004-11-03 Thread stuff
The caching appears to disappear overnight. The environment is not in
production yet so I'm the only one on it. 

Is there a time limit on the length of time in cache? I believe there is
sufficient RAM, but maybe I need to look again.

s 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Andrew Sullivan
Sent: Wednesday, November 03, 2004 12:35 PM
To: [EMAIL PROTECTED]
Subject: Re: [PERFORM] preloading indexes

On Wed, Nov 03, 2004 at 12:12:43PM -0700, [EMAIL PROTECTED] wrote:
> That's correct - I'd like to be able to keep particular indexes in RAM
> available all the time

If these are queries that run frequently, then the relevant cache
will probably remain populated[1].  If they _don't_ run frequently, why
do you want to force the memory to be used to optimise something that
is uncommon?  But in any case, there's no mechanism to do this.

A

[1] there are in fact limits on the caching: if your data set is
larger than memory, for instance, there's no way it will all stay
cached.  Also, VACUUM does nasty things to the cache.  It is hoped
that nastiness is fixed in 8.0.

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
The plural of anecdote is not data.
--Roger Brinner



Re: [PERFORM] preloading indexes

2004-11-03 Thread Tom Lane
<[EMAIL PROTECTED]> writes:
> The caching appears to disappear overnight.

You've probably got cron jobs that run late at night and blow out your
kernel disk cache by accessing a whole lot of non-Postgres stuff.
(A nightly disk backup is one obvious candidate.)  The most likely
solution is to run some cron job a little later to exercise your
database and thereby repopulate the cache with Postgres files before
you get to work ;-)
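
A minimal sketch of that sort of cron entry (the time, database name, and
warming query are placeholders; use whatever touches the tables and indexes
your own queries depend on):

  # crontab: re-warm the Postgres cache after the nightly jobs have finished
  30 6 * * *  psql -d mydb -c "SELECT count(*) FROM my_big_table" > /dev/null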

regards, tom lane



Re: [PERFORM] preloading indexes

2004-11-03 Thread Andrew Sullivan
On Wed, Nov 03, 2004 at 01:19:43PM -0700, [EMAIL PROTECTED] wrote:
> The caching appears to disappear overnight. The environment is not in
> production yet so I'm the only one on it. 

Are you vacuuming at night?  It grovels through the entire database,
and may bust your query out of the cache.  Also, we'd need some more
info about how you've tuned this thing.  Maybe check out the archives
first for some tuning pointers to help you.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
In the future this spectacle of the middle classes shocking the avant-
garde will probably become the textbook definition of Postmodernism. 
--Brad Holland



Re: [PERFORM] preloading indexes

2004-11-03 Thread stuff

Thanks - this is what I was afraid of, but I may have to do this

Is there a good way to monitor what's in the cache?

j

<[EMAIL PROTECTED]> writes:
> The caching appears to disappear overnight.

You've probably got cron jobs that run late at night and blow out your
kernel disk cache by accessing a whole lot of non-Postgres stuff.
(A nightly disk backup is one obvious candidate.)  The most likely
solution is to run some cron job a little later to exercise your
database and thereby repopulate the cache with Postgres files before
you get to work ;-)

regards, tom lane





Re: [PERFORM] Restricting Postgres

2004-11-03 Thread Martin Foster
Simon Riggs wrote:
> On Tue, 2004-11-02 at 23:52, Martin Foster wrote:
>> Is there a way to restrict how much load a PostgreSQL server can take
>> before dropping queries in order to safeguard the server?  I was
>> looking at the login.conf (5) man page and while it allows me to limit
>> by processor time this seems to not fit my specific needs.
>>
>> Essentially, I am looking for a sort of functionality similar to what
>> Sendmail and Apache have.  Once the load of the system reaches a
>> certain defined limit the daemon drops tasks until such a time that it
>> can resume normal operation.
>
> Sounds great... could you give more shape to the idea, so people can
> comment on it?
>
> What limit? Measured how? Normal operation is what?
>
> Drop what? How to tell?

Let's use the example in Apache: there is the Apache::LoadAvgLimit 
mod_perl module which allows one to limit based on the system load 
averages.   Here is an example of the configuration one would find:

  PerlInitHandler Apache::LoadAvgLimit
  PerlSetVar LoadAvgLimit_1 3.00
  PerlSetVar LoadAvgLimit_5 2.00
  PerlSetVar LoadAvgLimit_15 1.50
  PerlSetVar LoadAvgRetryAfter 120

The end state is simple: once the load average moves above 3.00 for the 
1-minute average, the web server will not process the CGI scripts or 
mod_perl applications under that directory.  Instead it will return a 
503 error and save the system from being crushed by ever-increasing load 
averages.

Only once the load average is below the defined limits will the server 
process requests as normal.   This is not necessarily the nicest or 
cleanest way of doing things, but it does allow the Apache web server to 
prevent a collapse.

There are ways of restricting the size of files, number of concurrent 
processes and even memory being used by a daemon.  This can be done 
through ulimit or the login.conf file if your system supports it. 
However, there is no way to restrict based on load averages, only 
processor time, which is ineffective for a perpetually running daemon 
like PostgreSQL.

>> While not necessarily common on my servers I have witnessed some fairly
>> high load averages which may have led to the machine dropping outright.
>> Any help on this matter would be appreciated.
>
> You can limit the number of connections overall?
Limiting concurrent connections is not always the solution to the 
problem.   Problems can occur when there is a major spike in activity 
that would be considered abnormal, due to outside conditions.

For example, using Apache::DBI or pgpool the DBMS may be required to 
spawn a great deal of child processes in a short period of time.   This 
in turn can cause a major spike in processor load and, if left unchecked 
while high-demand queries keep running, the system can literally increase 
in load until the server buckles.

I've seen this behavior before when restarting the web server during 
heavy loads.   Apache goes from zero connections to a solid 120, 
causing PostgreSQL to spawn that many children in a short order of time 
just to keep up with the demand.

PostgreSQL pays a penalty when spawning a new child process and accepting 
a connection; this takes resources at every level to accomplish. 
However, clients on the web server are hitting the server at an 
accelerated rate because of the slowed response, leading to even more 
demand being placed on both machines.

In most cases the processor will be taxed and the load average high 
enough to cause a noticeable delay even when using a console; however it 
will generally recover... slowly, or in rare cases crash outright.   In 
such a circumstance, having the database server refuse queries when the 
sanity of the system is at stake might come in handy.

Of course, I am not blaming PostgreSQL; there are probably some 
instabilities in the AMD64 port of FreeBSD 5.2.1 for dual processor 
systems that lead to an increased chance of failure instead of recovery. 
However, if there were a way to prevent the process from reaching 
those limits, it might avoid the problem altogether.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


Re: [PERFORM] vacuum analyze slows sql query

2004-11-03 Thread patrick ~
Here is a fresh run with 'explain analyze' run before and after the
VACUUM statement:

-- begin
% dropdb pkk
DROP DATABASE
% createdb pkk
CREATE DATABASE
% psql pkk < pkk_db.sql
ERROR:  function pkk_offer_has_pending_purch(integer) does not exist
ERROR:  function pkk_offer_has_pending_purch2(integer) does not exist
ERROR:  table "pkk_billing" does not exist
ERROR:  table "pkk_purchase" does not exist
ERROR:  table "pkk_offer" does not exist
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "pkk_offer_pkey"
for table "pkk_offer"
CREATE TABLE
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"pkk_purchase_pkey" for table "pkk_purchase"
CREATE TABLE
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE TABLE
CREATE INDEX
CREATE FUNCTION
CREATE FUNCTION
% zcat pkk.20041028_00.sql.gz | psql pkk 
SET
SET
SET
SET
% psql pkk
pkk=# select offer_id, pkk_offer_has_pending_purch( offer_id ) from pkk_offer ;

(618 rows)

Time: 877.348 ms
pkk=# explain analyze select offer_id, pkk_offer_has_pending_purch( offer_id )
from pkk_offer ;
QUERY PLAN
---
 Seq Scan on pkk_offer  (cost=0.00..22.50 rows=1000 width=4) (actual
time=1.291..845.485 rows=618 loops=1)
 Total runtime: 849.475 ms
(2 rows)

Time: 866.613 ms
pkk=# vacuum analyze ;
VACUUM
Time: 99344.399 ms
pkk=# explain analyze select offer_id, pkk_offer_has_pending_purch( offer_id )
from pkk_offer ;
   QUERY PLAN
-
 Seq Scan on pkk_offer  (cost=0.00..13.72 rows=618 width=4) (actual
time=3636.401..1047412.851 rows=618 loops=1)
 Total runtime: 1047415.525 ms
(2 rows)

Time: 1047489.477 ms
-- end



Tom,

The reason for the extra "case" part in the function is to ensure non-null
fields in the result.  I tried your version as well and I get similar
performance results:

-- begin
pkk=# create function toms_pending_purch( integer ) returns bool as 'select 
p0.purchase_id is not null from  pkk_purchase p0 where  p0.offer_id = $1 and (
p0.pending = true or ( ( p0.expire_time > now() or p0.expire_time isnull ) and
p0.cancel_date isnull ) ) limit 1 ' language 'sql' ;
CREATE FUNCTION
Time: 2.496 ms
pkk=# select offer_id, toms_pending_purch( offer_id ) from pkk_offer ;
(618 rows)

Time: 1052339.506 ms
-- end


Right now, I'm studying the documentation on PREPARE and will
attempt to play around with it.


I was asked (in a prior post) whether running the statement a second
time after the VACUUM improves in performance.  It does not.  After
the VACUUM the statement remains slow.


Thanks for your help,
--patrick



--- Tom Lane <[EMAIL PROTECTED]> wrote:

> patrick ~ <[EMAIL PROTECTED]> writes:
> > that if I 'createdb' and populate it with the "sanitized" data the
> > query in question is quite fast: 618 rows returned in 864.522 ms.
> > This was puzzling.  Next I noticed that after a VACUUM the very same
> > query would slow down to a crawl: 618 rows returned in 1080688.921 ms.
> 
> The outer query is too simple to have more than one possible plan,
> so the issue is certainly a change in query plans inside the function.
> You need to be investigating what's happening inside that function.
> 7.1 doesn't have adequate tools for this, but in 7.4 you can use
> PREPARE and EXPLAIN ANALYZE EXECUTE to examine the query plans used
> for parameterized statements, which is what you've got here.
> 
> My bet is that with ANALYZE stats present, the planner guesses wrong
> about which index to use; but without looking at EXPLAIN ANALYZE output
> there's no way to be sure.
> 
> BTW, why the bizarrely complicated substitute for a NOT NULL test?
> ISTM you only need
> 
> create function
> pkk_offer_has_pending_purch( integer )
> returns bool
> as  '
> select  p0.purchase_id is not null
>   from  pkk_purchase p0
>  where  p0.offer_id = $1
> and ( p0.pending = true
> or ( ( p0.expire_time > now()
>  or p0.expire_time isnull )
>and p0.cancel_date isnull ) )
>   limit 1
> ' language 'sql' ;
> 
> (Actually, seeing that pkk_purchase.purchase_id is defined as NOT NULL,
> I wonder why the function exists at all ... but I suppose you've
> "stripped" the function to the point of being nonsense.)
> 
>   regards, tom lane







Re: [PERFORM] Restricting Postgres

2004-11-03 Thread John A Meinel
Martin Foster wrote:
> Simon Riggs wrote:
>> On Tue, 2004-11-02 at 23:52, Martin Foster wrote:
> [...]
> I've seen this behavior before when restarting the web server during
> heavy loads.  Apache goes from zero connections to a solid 120,
> causing PostgreSQL to spawn that many children in a short order of time
> just to keep up with the demand.

But wouldn't limiting the number of concurrent connections do this at 
the source?  If you tell it that "You can at most have 20 connections" 
you would never have postgres spawn 120 children.

I'm not sure what Apache does if it can't get a DB connection, but it 
seems exactly like what you want.

Now, if you expected to have 50 clients that all like to just sit on 
open connections, you could leave the number of concurrent connections high.

But if your only connection is from the webserver, where all of them are 
designed to be short connections, then leave the max low.

The other possibility is having the webserver use connection pooling, so 
it uses a few long-lived connections.  But even then, you could limit it 
to something like 10-20, not 120.
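
As a sketch of that pairing, with purely illustrative numbers (the idea
being that Apache's worker cap, rather than PostgreSQL, is what bounds the
number of connections):

  # httpd.conf: cap the number of Apache children, and therefore DB connections
  MaxClients 20

  # postgresql.conf: a little headroom above MaxClients for admin connections
  max_connections = 25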

John
=:->




Re: [PERFORM] Restricting Postgres

2004-11-03 Thread Martin Foster
John A Meinel wrote:
> Martin Foster wrote:
>> Simon Riggs wrote:
>>> On Tue, 2004-11-02 at 23:52, Martin Foster wrote:
>> [...]
>> I've seen this behavior before when restarting the web server during
>> heavy loads.  Apache goes from zero connections to a solid 120,
>> causing PostgreSQL to spawn that many children in a short order of
>> time just to keep up with the demand.
>
> But wouldn't limiting the number of concurrent connections do this at
> the source?  If you tell it that "You can at most have 20 connections"
> you would never have postgres spawn 120 children.
>
> I'm not sure what Apache does if it can't get a DB connection, but it
> seems exactly like what you want.
>
> Now, if you expected to have 50 clients that all like to just sit on
> open connections, you could leave the number of concurrent connections
> high.
>
> But if your only connection is from the webserver, where all of them are
> designed to be short connections, then leave the max low.
>
> The other possibility is having the webserver use connection pooling, so
> it uses a few long-lived connections.  But even then, you could limit it
> to something like 10-20, not 120.
>
> John
> =:->
I have a dual processor system that can support over 150 concurrent 
connections handling normal traffic and load.   Now suppose I set up 
Apache to spawn all of its children instantly; what will happen is that 
as this happens the PostgreSQL server will also receive 150 connection 
attempts.

This will spawn 150 children in a short order of time, and as this takes 
place clients can connect and start requesting information, not allowing 
the machine to settle down to normal traffic.   That spike, when 
initiated, can cripple the machine or even the webserver if a deadlocked 
transaction is introduced.

This is because on the webserver side a slowdown in the database means 
that it will just get that many more connection attempts pooled from the 
clients.  As they keep clicking and hitting reload over and over to get 
a page load, that server starts to buckle, hitting unbelievably high load 
averages.

When the above happened once, I lost the ability to type on a console 
because of a 60+ (OpenBSD) load average on a single processor system. 
That is the reason why Apache now drops a 503 Service Unavailable when 
loads get too high.

It's that spike I worry about, and it can happen for whatever reason.  It 
could just as easily be triggered by a massive concurrent request for 
processing of an expensive query done in DDoS fashion.   This may not 
affect the webserver at all, at least immediately, but the same problem 
can come into effect on the database server.

Limiting connections helps, but it's not a silver bullet, and it limits 
your ability to support more connections because of that initial spike. 
The penalty for forking a new child is hardly unexpected; even 
Apache will show the same effect when restarted at a high-traffic time.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]



Re: [PERFORM] preloading indexes

2004-11-03 Thread Mike Benoit
If you're running Linux, and kernel 2.6.x, you can try playing with the:

/proc/sys/vm/swappiness

setting.

My understanding is that:

echo "0" > /proc/sys/vm/swappiness

Will try to keep all in-use application memory from being swapped out
when other processes query the disk a lot.

Although, since PostgreSQL utilizes the disk cache quite a bit, this may
not help you. 


On Wed, 2004-11-03 at 15:53 -0500, Tom Lane wrote:
> <[EMAIL PROTECTED]> writes:
> > The caching appears to disappear overnight.
> 
> You've probably got cron jobs that run late at night and blow out your
> kernel disk cache by accessing a whole lot of non-Postgres stuff.
> (A nightly disk backup is one obvious candidate.)  The most likely
> solution is to run some cron job a little later to exercise your
> database and thereby repopulate the cache with Postgres files before
> you get to work ;-)
> 
>   regards, tom lane
-- 
Mike Benoit <[EMAIL PROTECTED]>




[PERFORM] index not used if using IN or OR

2004-11-03 Thread Mario Ivankovits
Hello !
Sorry if this has been discussed before; it is just hard to find in the 
archives using the words "or" or "in" :-o

I use postgres-8.0 beta4 for Windows.
I broke down my problem to a very simple table with two columns, 
"primary_key" and "secondary_key".  The CREATEs and INSERTs you will find below.

If I query the _empty_ freshly created table I get the following explain 
result:

select * from tt where seckey = 1;
Index Scan using seckey_key on tt  (cost=0.00..17.07 rows=5 width=12)
 Index Cond: (seckey = 1)
If I use "OR" (or IN) things get worse:
select * from tt where seckey = 1 or seckey = 2
Seq Scan on tt  (cost=0.00..0.00 rows=1 width=12)
 Filter: ((seckey = 1) OR (seckey = 2))
Note the "Seq Scan" instead of using the index.
After populating the table with 8920 records and running "analyze", the 
scenario gets even worse:

select * from tt where seckey = 1;
Seq Scan on tt  (cost=0.00..168.50 rows=1669 width=12) (actual 
time=0.000..15.000 rows=1784 loops=1)
 Filter: (seckey = 1)
Total runtime: 31.000 ms

Now also this simple query uses a "Seq Scan".
Now the questions are:
a) Why is the index not used if I use "OR" or "IN"?
b) Why is the index not used after "analyze"?
Any help is very appreciated!
Thanks,
Mario
// The table and data
CREATE TABLE tt (
 pkey int4 NOT NULL DEFAULT nextval('public."tt_PKEY_seq"'::text),
 seckey int8,
 CONSTRAINT pkey_key PRIMARY KEY (pkey)
)
WITHOUT OIDS;
CREATE INDEX seckey_key  ON tt  USING btree  (seckey);
// inserted many-many times
insert into tt values (default, 1);
insert into tt values (default, 2);
insert into tt values (default, 3);
insert into tt values (default, 4);
insert into tt values (default, 5);