Re: [PERFORM] Insert performance (OT?)
nobody ?

On 18 Jul 2005, at 21:29, Yves Vindevogel wrote:

> Hi,
>
> Suppose I have a table with 4 fields (f1, f2, f3, f4).
> I define 2 unique indexes: u1 (f1, f2, f3) and u2 (f1, f2, f4).
>
> I have 3 records:
> A, B, C, D (this will be inserted)
> A, B, C, E (this will pass u2, but not u1, thus not inserted)
> A, B, F, D (this will pass u1, but not u2, thus not inserted)
>
> Now, for performance ... I have tables like this with 500,000 records where
> there's a new upload of approx. 20,000 records. It is only now that we
> consider index u2 to be necessary.
>
> So, until now, I did something like
>     insert into ... select f1, f2, f2, max(f4) group by f1, f2, f3
> That is ok ... and also logically ok because of the data definition.
>
> I cannot do this with 2 group by's. I tried this on paper and I'm not
> succeeding.
>
> So, I must use a function that will check against u1 and u2, and then
> insert if it is ok. I know that such a function is way slower than my
> insert query.
>
> So, my question ... How can I keep the same performance, but also with the
> new index in mind ???

Met vriendelijke groeten,
Bien à vous,
Kind regards,

Yves Vindevogel
Implements
Mail: [EMAIL PROTECTED] - Mobile: +32 (478) 80 82 91
Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76
Web: http://www.implements.be

First they ignore you. Then they laugh at you. Then they fight you. Then you win.
Mahatma Gandhi.
Re: [PERFORM] Insert performance (OT?)
Yves Vindevogel wrote:
> Hi,
> Suppose I have a table with 4 fields (f1, f2, f3, f4)
> I define 2 unique indexes u1 (f1, f2, f3) and u2 (f1, f2, f4)
> I have 3 records
> A, B, C, D (this will be inserted)
> A, B, C, E (this will pass u2, but not u1, thus not inserted)
> A, B, F, D (this will pass u1, but not u2, thus not inserted)

Are you saying you want to know whether they will be inserted before you try to do so?

> Now, for performance ... I have tables like this with 500,000 records where
> there's a new upload of approx. 20,000 records. It is only now that we
> consider index u2 to be necessary.
> So, until now, I did something like
>     insert into ... select f1, f2, f2, max(f4) group by f1, f2, f3
> That is ok ... and also logically ok because of the data definition

I'm confused here - assuming you meant "select f1, f2, f3", then I don't see how you guarantee the row doesn't already exist.

> I cannot do this with 2 group by's. I tried this on paper and I'm not succeeding.

I don't see how you can have two group-by's, or what that would mean if you did.

> So, I must use a function that will check against u1 and u2, and then insert
> if it is ok. I know that such a function is way slower than my insert query.

So - you have a table, called something like "upload" with 20,000 rows, and you'd like to know whether it is safe to insert them. Well, it's easy enough to identify which ones are duplicates.

SELECT * FROM upload JOIN main_table ON u1=f1 AND u2=f2 AND u3=f3;
SELECT * FROM upload JOIN main_table ON u1=f1 AND u2=f2 AND u3=f4;

Are you saying that deleting these rows and then inserting takes too long?

--
  Richard Huxton
  Archonet Ltd
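To make that concrete, here is a minimal sketch of the "delete the would-be duplicates, then insert" approach. It assumes the target table is called main_table with columns f1..f4 (as in the example above) and that the staging table upload uses the same four column names - the real names will differ, so treat this purely as an illustration:

    -- Drop staging rows that would violate unique index u1 (f1, f2, f3):
    DELETE FROM upload
    WHERE EXISTS (SELECT 1 FROM main_table m
                  WHERE m.f1 = upload.f1 AND m.f2 = upload.f2 AND m.f3 = upload.f3);

    -- Drop staging rows that would violate unique index u2 (f1, f2, f4):
    DELETE FROM upload
    WHERE EXISTS (SELECT 1 FROM main_table m
                  WHERE m.f1 = upload.f1 AND m.f2 = upload.f2 AND m.f4 = upload.f4);

    -- What remains cannot collide with existing rows; duplicates *within* the
    -- upload itself still have to be collapsed (e.g. with the GROUP BY trick
    -- discussed above) before or during this insert:
    INSERT INTO main_table (f1, f2, f3, f4)
    SELECT f1, f2, f3, f4 FROM upload;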
Fwd: [PERFORM] Insert performance (OT?)
BTW: thank you for the idea

Begin forwarded message:

From: Yves Vindevogel <[EMAIL PROTECTED]>
Date: Tue 19 Jul 2005 12:20:34 CEST
To: Richard Huxton
Subject: Re: [PERFORM] Insert performance (OT?)

On 19 Jul 2005, at 11:39, Richard Huxton wrote:

> Yves Vindevogel wrote:
>> Hi,
>> Suppose I have a table with 4 fields (f1, f2, f3, f4)
>> I define 2 unique indexes u1 (f1, f2, f3) and u2 (f1, f2, f4)
>> I have 3 records
>> A, B, C, D (this will be inserted)
>> A, B, C, E (this will pass u2, but not u1, thus not inserted)
>> A, B, F, D (this will pass u1, but not u2, thus not inserted)
>
> Are you saying you want to know whether they will be inserted before you try to do so?

No, that is not an issue. The problem is that when I use a big query with "insert into .. select" and one record is wrong (like above), the complete insert query is abandoned. Therefore, I must do it another way. Or I must be able to say: insert them and dump the rest.

>> Now, for performance ... I have tables like this with 500,000 records where
>> there's a new upload of approx. 20,000 records. It is only now that we
>> consider index u2 to be necessary.
>> So, until now, I did something like
>>     insert into ... select f1, f2, f2, max(f4) group by f1, f2, f3
>> That is ok ... and also logically ok because of the data definition
>
> I'm confused here - assuming you meant "select f1, f2, f3", then I don't see how you guarantee the row doesn't already exist.

No, I meant it with max(f4) because my table has 4 fields. And no, I can't guarantee that - that is exactly my problem. But with the unique indexes, I'm certain that it will not get into my database.

>> I cannot do this with 2 group by's. I tried this on paper and I'm not succeeding.
>
> I don't see how you can have two group-by's, or what that would mean if you did.

select ... from (select ... from ... group by ...) as foo group by ...

>> So, I must use a function that will check against u1 and u2, and then insert
>> if it is ok. I know that such a function is way slower than my insert query.
>
> So - you have a table, called something like "upload" with 20,000 rows, and
> you'd like to know whether it is safe to insert them.
> Well, it's easy enough to identify which ones are duplicates.
>
> SELECT * FROM upload JOIN main_table ON u1=f1 AND u2=f2 AND u3=f3;
> SELECT * FROM upload JOIN main_table ON u1=f1 AND u2=f2 AND u3=f4;

That is a good idea. I can delete the ones that would fail my first unique index this way, then delete the ones that would fail my second unique index, and then upload them. Hmm, why did I not think of that myself.

> Are you saying that deleting these rows and then inserting takes too long?

This goes very fast, but not with a function that checks each record one by one.

> --
>   Richard Huxton
>   Archonet Ltd

Met vriendelijke groeten,
Bien à vous,
Kind regards,

Yves Vindevogel
Implements
Mail: [EMAIL PROTECTED] - Mobile: +32 (478) 80 82 91
Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76
Web: http://www.implements.be

First they ignore you. Then they laugh at you. Then they fight you. Then you win.
Mahatma Gandhi.
Re: [PERFORM] Insert performance (OT?)
Yves Vindevogel wrote:
>>> So, I must use a function that will check against u1 and u2, and then
>>> insert if it is ok. I know that such a function is way slower than my
>>> insert query.
>>
>> So - you have a table, called something like "upload" with 20,000 rows, and
>> you'd like to know whether it is safe to insert them.
>> Well, it's easy enough to identify which ones are duplicates.
>>
>> SELECT * FROM upload JOIN main_table ON u1=f1 AND u2=f2 AND u3=f3;
>> SELECT * FROM upload JOIN main_table ON u1=f1 AND u2=f2 AND u3=f4;
>
> That is a good idea. I can delete the ones that would fail my first unique
> index this way, then delete the ones that would fail my second unique index,
> and then upload them. Hmm, why did I not think of that myself.

I've spent a lot of time moving data from one system to another, usually having to clean it in the process. At 9pm on a Friday, you decide that on the next job you'll find an efficient way to do it :-)

>> Are you saying that deleting these rows and then inserting takes too long?
>
> This goes very fast, but not with a function that checks each record one by one.

You could get away with one query if you converted them to left-joins:

INSERT INTO ...
SELECT * FROM upload LEFT JOIN ... WHERE f3 IS NULL
UNION
SELECT * FROM upload LEFT JOIN ... WHERE f4 IS NULL

The UNION will remove duplicates for you, but this might turn out to be slower than two separate queries.

--
  Richard Huxton
  Archonet Ltd
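For reference, here is one way that single-statement idea could look when written out in full. The names again assume a main_table and an upload staging table that share the columns f1..f4, which is an assumption rather than something stated in the thread. A row is only safe when it matches *neither* unique key, so in this sketch the two left-join conditions are combined with AND:

    INSERT INTO main_table (f1, f2, f3, f4)
    SELECT u.f1, u.f2, u.f3, u.f4
    FROM upload u
    LEFT JOIN main_table m1 ON m1.f1 = u.f1 AND m1.f2 = u.f2 AND m1.f3 = u.f3
    LEFT JOIN main_table m2 ON m2.f1 = u.f1 AND m2.f2 = u.f2 AND m2.f4 = u.f4
    WHERE m1.f1 IS NULL      -- no existing row under index u1 (f1, f2, f3)
      AND m2.f1 IS NULL;     -- no existing row under index u2 (f1, f2, f4)

    -- Caveat: rows that duplicate each other *inside* the upload file can
    -- still make this statement fail, so they need to be collapsed first.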
Re: [PERFORM] Insert performance (OT?)
I will use 2 queries. They run within a function fnUpload(), so I'm going to keep it simple.

On 19 Jul 2005, at 12:51, Richard Huxton wrote:

> Yves Vindevogel wrote:
> >>> So, I must use a function that will check against u1 and u2, and then
> >>> insert if it is ok. I know that such a function is way slower than my
> >>> insert query.
> >>
> >> So - you have a table, called something like "upload" with 20,000 rows, and
> >> you'd like to know whether it is safe to insert them.
> >> Well, it's easy enough to identify which ones are duplicates.
> >>
> >> SELECT * FROM upload JOIN main_table ON u1=f1 AND u2=f2 AND u3=f3;
> >> SELECT * FROM upload JOIN main_table ON u1=f1 AND u2=f2 AND u3=f4;
> >
> > That is a good idea. I can delete the ones that would fail my first unique
> > index this way, then delete the ones that would fail my second unique index,
> > and then upload them. Hmm, why did I not think of that myself.
>
> I've spent a lot of time moving data from one system to another, usually
> having to clean it in the process. At 9pm on a Friday, you decide that on
> the next job you'll find an efficient way to do it :-)
>
> >> Are you saying that deleting these rows and then inserting takes too long?
> >
> > This goes very fast, but not with a function that checks each record one by one.
>
> You could get away with one query if you converted them to left-joins:
>
> INSERT INTO ...
> SELECT * FROM upload LEFT JOIN ... WHERE f3 IS NULL
> UNION
> SELECT * FROM upload LEFT JOIN ... WHERE f4 IS NULL
>
> The UNION will remove duplicates for you, but this might turn out to be
> slower than two separate queries.
>
> --
>   Richard Huxton
>   Archonet Ltd

Met vriendelijke groeten,
Bien à vous,
Kind regards,

Yves Vindevogel
Implements
Mail: [EMAIL PROTECTED] - Mobile: +32 (478) 80 82 91
Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76
Web: http://www.implements.be

First they ignore you. Then they laugh at you. Then they fight you. Then you win.
Mahatma Gandhi.
Re: [PERFORM] Impact of checkpoint_segments under continual load conditions
On 7/18/05, Tom Lane <[EMAIL PROTECTED]> wrote:
> > The table has 15 columns, 5 indexes (character, inet and timestamp).
> > No foreign keys. The only other thing running on the machine was the
> > application actually DOING the benchmarking, written in Python
> > (psycopg), but it was, according to top, using less than 1% of the
> > CPU. It was just talking through a pipe to a psql prompt to do the
> > COPY.
>
> Sounds pretty plain-vanilla all right.
>
> Are you in a position to try the same benchmark against CVS tip?
> (The nightly snapshot tarball would be plenty close enough.) I'm
> just wondering if the old bgwriter behavior of locking down the
> bufmgr while it examined the ARC/2Q data structures is causing this...

Tom,

It looks like the CVS HEAD is definitely "better," but not by a huge amount. The only difference is I wasn't running autovacuum in the background (default settings), but I don't think this explains it. Here's a graph of the differences and density of behavior:

http://blog.amber.org/diagrams/pgsql_copy_803_cvs.png

I can provide the raw data. Each COPY was 500 rows. Note that fsync is turned off here. Maybe it'd be more stable with it turned on?

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
Re: [PERFORM] Impact of checkpoint_segments under continual load conditions
Christopher Petrilli <[EMAIL PROTECTED]> writes:
> Here's a graph of the differences and density of behavior:
> http://blog.amber.org/diagrams/pgsql_copy_803_cvs.png
> I can provide the raw data.

How about the complete test case? There's something awfully odd going on there, and I'd like to find out what.

> Note that fsync is turned off here. Maybe it'd be more stable with it
> turned on?

Hard to say. I was about to ask if you'd experimented with altering configuration parameters such as shared_buffers or checkpoint_segments to see if you can move the point of onset of slowdown. I'm thinking the behavioral change might be associated with running out of free buffers or some such. (Are you running these tests under a freshly-started postmaster, or one that's been busy for awhile?)

regards, tom lane
Re: [PERFORM] Impact of checkpoint_segments under continual load conditions
What happens if, say at iteration 6000 (a bit after the mess starts), you pause it for a few minutes and resume. Will it restart with a plateau like at the beginning of the test ? or not ?

What if, during this pause, you disconnect and reconnect, or restart the postmaster, or vacuum, or analyze ?

> On 7/18/05, Tom Lane <[EMAIL PROTECTED]> wrote:
> > > The table has 15 columns, 5 indexes (character, inet and timestamp).
> > > No foreign keys. The only other thing running on the machine was the
> > > application actually DOING the benchmarking, written in Python
> > > (psycopg), but it was, according to top, using less than 1% of the
> > > CPU. It was just talking through a pipe to a psql prompt to do the
> > > COPY.
> >
> > Sounds pretty plain-vanilla all right.
> >
> > Are you in a position to try the same benchmark against CVS tip?
> > (The nightly snapshot tarball would be plenty close enough.) I'm
> > just wondering if the old bgwriter behavior of locking down the
> > bufmgr while it examined the ARC/2Q data structures is causing this...
>
> Tom,
>
> It looks like the CVS HEAD is definitely "better," but not by a huge amount.
> The only difference is I wasn't running autovacuum in the background
> (default settings), but I don't think this explains it. Here's a graph of
> the differences and density of behavior:
>
> http://blog.amber.org/diagrams/pgsql_copy_803_cvs.png
>
> I can provide the raw data. Each COPY was 500 rows. Note that fsync is
> turned off here. Maybe it'd be more stable with it turned on?
>
> Chris
Re: [PERFORM] Impact of checkpoint_segments under continual load conditions
On 7/19/05, PFC <[EMAIL PROTECTED]> wrote:
> What happens if, say at iteration 6000 (a bit after the mess starts), you
> pause it for a few minutes and resume. Will it restart with a plateau like
> at the beginning of the test ? or not ?

Not sure... my benchmark is designed to represent what the database will do under "typical" circumstances, and unfortunately these are typical for the application. However, I can see about adding some delays, though multiple minutes would be absurd in the application. Perhaps a 5-10 second delay? Would that still be interesting?

> What if, during this pause, you disconnect and reconnect, or restart the
> postmaster, or vacuum, or analyze ?

Well, I don't have the numbers any more, but restarting the postmaster has no effect, other than the first few hundred COPYs are worse than anything (3-4x slower), but then it goes back to following the trend line. The data in the chart for v8.0.3 includes running pg_autovacuum (5 minutes).

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
Re: [PERFORM] Impact of checkpoint_segments under continual load conditions
Christopher Petrilli <[EMAIL PROTECTED]> writes:
> On 7/19/05, PFC <[EMAIL PROTECTED]> wrote:
>> What happens if, say at iteration 6000 (a bit after the mess starts), you
>> pause it for a few minutes and resume. Will it restart with a plateau like
>> at the beginning of the test ? or not ?

> Not sure... my benchmark is designed to represent what the database
> will do under "typical" circumstances, and unfortunately these are
> typical for the application. However, I can see about adding some
> delays, though multiple minutes would be absurd in the application.
> Perhaps a 5-10 second delay? Would that still be interesting?

I think PFC's question was not directed towards modeling your application, but about helping us understand what is going wrong (so we can fix it). It seemed like a good idea to me.

> Well, I don't have the numbers any more, but restarting the postmaster
> has no effect, other than the first few hundred COPYs are worse than
> anything (3-4x slower), but then it goes back to following the trend
> line. The data in the chart for v8.0.3 includes running pg_autovacuum
> (5 minutes).

The startup transient probably corresponds to the extra I/O needed to repopulate shared buffers with a useful subset of your indexes. But just to be perfectly clear: you tried this, and after the startup transient it returned to the *original* trend line? In particular, the performance goes into the tank after about 5000 total iterations, and not 5000 iterations after the postmaster restart?

I'm suddenly wondering if the performance dropoff corresponds to the point where the indexes have grown large enough to not fit in shared buffers anymore. If I understand correctly, the 5000-iterations mark corresponds to 2.5 million total rows in the table; with 5 indexes you'd have 12.5 million index entries or probably a couple hundred MB total. If the insertion pattern is sufficiently random that the entire index ranges are "hot" then you might not have enough RAM.

Again, experimenting with different values of shared_buffers seems like a very worthwhile thing to do.

regards, tom lane
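One way to sanity-check that size estimate is to look at the on-disk size of the table and its indexes in the system catalogs. A rough sketch, assuming the benchmark table is the "test" table mentioned later in this thread, the default 8 kB block size, and a reasonably recent VACUUM or ANALYZE so that relpages is up to date:

    -- Approximate size of the table and each of its indexes, in MB
    SELECT c.relname, c.relpages * 8 / 1024 AS approx_mb
    FROM pg_class c
    JOIN pg_index i ON i.indexrelid = c.oid
    WHERE i.indrelid = 'test'::regclass
    UNION ALL
    SELECT c.relname, c.relpages * 8 / 1024
    FROM pg_class c
    WHERE c.oid = 'test'::regclass
    ORDER BY 2 DESC;

If the combined index size comfortably exceeds what shared_buffers can hold, the "indexes no longer fit" theory becomes quite plausible.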
Re: [PERFORM] Impact of checkpoint_segments under continual load conditions
> total. If the insertion pattern is sufficiently random that the entire
> index ranges are "hot" then you might not have enough RAM.

Try doing the test dropping some of your indexes and see if it moves the number of iterations after which it becomes slow.
Re: [PERFORM] Impact of checkpoint_segments under continual load conditions
> I think PFC's question was not directed towards modeling your
> application, but about helping us understand what is going wrong
> (so we can fix it).

Exactly, I was wondering if this delay would allow things to get flushed, for instance, which would give information about the problem (if giving it a few minutes of rest resumed normal operation, it would mean that some buffer somewhere is getting filled faster than it can be flushed).

So, go ahead with a few minutes even if it's unrealistic; that is not the point. You have to tweak it in various possible manners to understand the causes.

And instead of a pause, why not just set the duration of your test to 6000 iterations and run it two times without dropping the test table ?

I'm going into wild guesses, but first you should want to know if the problem is because the table is big, or if it's something else. So you run the complete test, stopping a bit after it starts to make a mess, then instead of dumping the table and restarting the test anew, you leave it as it is, do something, then run a new test, but on this table which already has data.

'something' could be one of those :
- disconnect, reconnect (well you'll have to do that if you run the test twice anyway)
- just wait
- restart postgres
- unmount and remount the volume with the logs/data on it
- reboot the machine
- analyze
- vacuum
- vacuum analyze
- cluster
- vacuum full
- reindex
- defrag your files on disk (stopping postgres and copying the database from your disk to another one and back will do)
- or even dump'n'reload the whole database

I think useful information can be extracted that way. If one of these fixes your problem it'll give hints.
Re: [PERFORM] Impact of checkpoint_segments under continual load conditions
On 7/19/05, Tom Lane <[EMAIL PROTECTED]> wrote:
> Christopher Petrilli <[EMAIL PROTECTED]> writes:
> > Not sure... my benchmark is designed to represent what the database
> > will do under "typical" circumstances, and unfortunately these are
> > typical for the application. However, I can see about adding some
> > delays, though multiple minutes would be absurd in the application.
> > Perhaps a 5-10 second delay? Would that still be interesting?
>
> I think PFC's question was not directed towards modeling your
> application, but about helping us understand what is going wrong
> (so we can fix it). It seemed like a good idea to me.

OK, I can modify the code to do that, and I will post it on the web.

> The startup transient probably corresponds to the extra I/O needed to
> repopulate shared buffers with a useful subset of your indexes. But
> just to be perfectly clear: you tried this, and after the startup
> transient it returned to the *original* trend line? In particular,
> the performance goes into the tank after about 5000 total iterations,
> and not 5000 iterations after the postmaster restart?

This is correct: the TOTAL is what matters, not the specific instance count. I did an earlier run with larger batch sizes, and it hit at a similar row count, so it's definitely row-count/size related.

> I'm suddenly wondering if the performance dropoff corresponds to the
> point where the indexes have grown large enough to not fit in shared
> buffers anymore. If I understand correctly, the 5000-iterations mark
> corresponds to 2.5 million total rows in the table; with 5 indexes
> you'd have 12.5 million index entries or probably a couple hundred MB
> total. If the insertion pattern is sufficiently random that the entire
> index ranges are "hot" then you might not have enough RAM.

This is entirely possible; currently:

shared_buffers = 1000
work_mem = 65535
maintenance_work_mem = 16384
max_stack_depth = 2048

> Again, experimenting with different values of shared_buffers seems like
> a very worthwhile thing to do.

I misunderstood shared_buffers then, as I thought work_mem was where indexes were kept. If this is where index manipulations happen, then I can up it quite a bit. The machine this is running on has 2GB of RAM.

My concern isn't absolute performance, as this is not representative hardware, but instead is the evenness of behavior.

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
Re: [PERFORM] Impact of checkpoint_segments under continual load conditions
On 7/19/05, PFC <[EMAIL PROTECTED]> wrote:
>> I think PFC's question was not directed towards modeling your
>> application, but about helping us understand what is going wrong
>> (so we can fix it).
>
> Exactly, I was wondering if this delay would allow things to get flushed,
> for instance, which would give information about the problem (if giving it
> a few minutes of rest resumed normal operation, it would mean that some
> buffer somewhere is getting filled faster than it can be flushed).
>
> So, go ahead with a few minutes even if it's unrealistic; that is not the
> point. You have to tweak it in various possible manners to understand the
> causes.

Totally understand, and apologize if I sounded dismissive. I definitely appreciate the insight and input.

> And instead of a pause, why not just set the duration of your test to
> 6000 iterations and run it two times without dropping the test table ?

This I can do. I'll probably set it for 5,000 for the first, and then start the second. In non-benchmark experience, however, this didn't seem to make much difference.

> I'm going into wild guesses, but first you should want to know if the
> problem is because the table is big, or if it's something else. So you run
> the complete test, stopping a bit after it starts to make a mess, then
> instead of dumping the table and restarting the test anew, you leave it as
> it is, do something, then run a new test, but on this table which already
> has data.
>
> 'something' could be one of those :
> - disconnect, reconnect (well you'll have to do that if you run the test
>   twice anyway)
> - just wait
> - restart postgres
> - unmount and remount the volume with the logs/data on it
> - reboot the machine
> - analyze
> - vacuum
> - vacuum analyze
> - cluster
> - vacuum full
> - reindex
> - defrag your files on disk (stopping postgres and copying the database
>   from your disk to another one and back will do)
> - or even dump'n'reload the whole database
>
> I think useful information can be extracted that way. If one of these
> fixes your problem it'll give hints.

This could take a while :-)

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
Re: [PERFORM] Impact of checkpoint_segments under continual load conditions
Christopher Petrilli <[EMAIL PROTECTED]> writes:
> On 7/19/05, Tom Lane <[EMAIL PROTECTED]> wrote:
>> I'm suddenly wondering if the performance dropoff corresponds to the
>> point where the indexes have grown large enough to not fit in shared
>> buffers anymore. If I understand correctly, the 5000-iterations mark
>> corresponds to 2.5 million total rows in the table; with 5 indexes
>> you'd have 12.5 million index entries or probably a couple hundred MB
>> total. If the insertion pattern is sufficiently random that the entire
>> index ranges are "hot" then you might not have enough RAM.

> This is entirely possible, currently:
> shared_buffers = 1000

Ah-hah --- with that setting, you could be seeing shared-buffer thrashing even if only a fraction of the total index ranges need to be touched. I'd try some runs with shared_buffers at 10000, 50000, 100000. You might also try strace'ing the backend and see if behavior changes noticeably when the performance tanks.

FWIW I have seen similar behavior while playing with MySQL's sql-bench test --- the default 1000 shared_buffers is not large enough to hold the "hot" part of the indexes in some of their insertion tests, and so performance tanks --- you can see this happening in strace because the kernel request mix goes from almost all writes to a significant part reads. On a pure data insertion benchmark you'd like to see nothing but writes.

regards, tom lane
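For readers translating those buffer counts into memory: each shared buffer is one database page - 8 kB with the default block size - so 10000, 50000 and 100000 buffers correspond to roughly 80 MB, 400 MB and 800 MB respectively. On these versions shared_buffers can only be changed in postgresql.conf and takes effect after a server restart. A quick way to check what a running server is actually using:

    -- Current shared_buffers, in buffers; multiply by 8 kB per buffer
    -- (assuming the default block size) to get the memory footprint.
    SELECT name, setting FROM pg_settings WHERE name = 'shared_buffers';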
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS issue?
On Mon, 18 Jul 2005 13:52:53 -0400 Tom Lane <[EMAIL PROTECTED]> wrote:
> Start a fresh psql session and "SHOW vacuum_cost_delay" to verify what
> the active setting is.

Alright. Restarted the 803 database. Cron-based vacuum analyze is running every 5 minutes. vacuum_cost_delay is 0. The problem showed up after about 1/2 hour of running. I've got vacuum jobs stacked from the last 35 minutes, with 2 vacuums running at the same time. CS is around 73k.

What do I do now? I can bring the db back to normal and not run any cron-based vacuum to see if it still happens, but I suspect nothing will happen without the vacuum. I'll leave it in its current semi-catatonic state as long as possible in case there is something to look at?

Cheers,
Rob
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS issue?
Robert Creager <[EMAIL PROTECTED]> writes:
> Alright. Restarted the 803 database. Cron-based vacuum analyze is
> running every 5 minutes. vacuum_cost_delay is 0. The problem showed
> up after about 1/2 hour of running. I've got vacuum jobs stacked from
> the last 35 minutes, with 2 vacuums running at the same time. CS is
> around 73k.

Hmm, I hadn't thought about the possible impact of multiple concurrent vacuums. Is the problem caused by that, or has performance already gone into the tank by the time the cron-driven vacuums are taking long enough to overlap?

regards, tom lane
Re: [PERFORM] Impact of checkpoint_segments under continual load conditions
As I'm doing this, I'm noticing something *VERY* disturbing to me:

postmaster backend: 20.3% CPU
psql frontend:      61.2% CPU

WTF? The only thing going through the front end is the COPY command, and it's sent to the backend to read from a file?

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
Re: [PERFORM] Impact of checkpoint_segments under continual load conditions
Christopher Petrilli <[EMAIL PROTECTED]> writes:
> As I'm doing this, I'm noticing something *VERY* disturbing to me:
> postmaster backend: 20.3% CPU
> psql frontend: 61.2% CPU
> WTF? The only thing going through the front end is the COPY command,
> and it's sent to the backend to read from a file?

Are you sure the backend is reading directly from the file, and not through psql? (\copy, or COPY FROM STDIN, would go through psql.) But even so that seems awfully high, considering how little work psql has to do compared to the backend. Has anyone ever profiled psql doing this sort of thing? I know I've spent all my time looking at the backend ...

regards, tom lane
Re: [PERFORM] Impact of checkpoint_segments under continual load conditions
On 7/19/05, Tom Lane <[EMAIL PROTECTED]> wrote:
> Christopher Petrilli <[EMAIL PROTECTED]> writes:
> > As I'm doing this, I'm noticing something *VERY* disturbing to me:
> > postmaster backend: 20.3% CPU
> > psql frontend: 61.2% CPU
> > WTF? The only thing going through the front end is the COPY command,
> > and it's sent to the backend to read from a file?
>
> Are you sure the backend is reading directly from the file, and not
> through psql? (\copy, or COPY FROM STDIN, would go through psql.)

The exact command is:

COPY test (columnlist...) FROM '/tmp/loadfile';

> But even so that seems awfully high, considering how little work psql
> has to do compared to the backend. Has anyone ever profiled psql doing
> this sort of thing? I know I've spent all my time looking at the
> backend ...

Linux 2.6, ext3, data=writeback

It's flipped now (still running), and it's 48% postmaster, 36% psql, but anything more than 1-2% seems absurd.

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
[PERFORM] Looking for tips
Hi,
I'm running Postgres 7.4.6 on a dedicated server with about 1.5 gigs of RAM.

Running scripts locally, it takes about 1.5x longer than MySQL, and the load on the server is only about 21%. I upped the sort_mem to 8192 (kB), and shared_buffers and effective_cache_size to 65536 (512MB), but neither the timing nor the server load have changed at all. FYI, I'm going to be working on data sets in the order of GB.

I think I've gone about as far as I can with Google... can anybody give me some advice on how to improve the raw performance before I start looking at code changes?

Thanks in advance.
Re: [PERFORM] Looking for tips
Oliver Crosby wrote:
> Hi,
> I'm running Postgres 7.4.6 on a dedicated server with about 1.5 gigs of RAM.
> Running scripts locally, it takes about 1.5x longer than MySQL, and the load
> on the server is only about 21%.

What queries?
What is your structure?
Have you tried explain analyze?
How many rows in the table?
Which OS?
How are you testing the speed?
What type of RAID?

--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
Re: [PERFORM] Looking for tips
Oliver Crosby wrote:
> Hi,
> I'm running Postgres 7.4.6 on a dedicated server with about 1.5 gigs of RAM.
> Running scripts locally, it takes about 1.5x longer than MySQL, and the load
> on the server is only about 21%. I upped the sort_mem to 8192 (kB), and
> shared_buffers and effective_cache_size to 65536 (512MB), but neither the
> timing nor the server load have changed at all. FYI, I'm going to be working
> on data sets in the order of GB.
> I think I've gone about as far as I can with Google... can anybody give me
> some advice on how to improve the raw performance before I start looking at
> code changes?
> Thanks in advance.

First, try to post in plain-text rather than html, it is easier to read. :)

Second, if you can determine what queries are running slow, post the result of EXPLAIN ANALYZE on them, and we can try to help you tune them/postgres to better effect. Just a blanket question like this is hard to answer.

Your new shared_buffers are probably *way* too high. They should be at most around 10% of ram. Since this is a dedicated server effective_cache_size should be probably ~75% of ram, or close to 1.2GB.

There are quite a few things that you can tweak, so the more information you can give, the more we can help. For instance, if you are loading a lot of data into a table, if possible, you want to use COPY not INSERT. If you have a lot of indexes and are loading a significant portion, it is sometimes faster to drop the indexes, COPY the data in, and then rebuild the indexes.

For tables with a lot of inserts/updates, you need to watch out for foreign key constraints. (Generally, you will want an index on both sides of the foreign key. One is required, the other is recommended for faster update/deletes).

John
=:->
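To illustrate the foreign-key point with a hypothetical pair of tables (the names here are invented purely for the example): the unique index on the referenced column comes for free with the PRIMARY KEY constraint, but the index on the referencing column has to be added by hand, and without it every UPDATE or DELETE on the parent table has to scan the child table:

    CREATE TABLE customers (
        id   integer PRIMARY KEY,   -- gives the required index on the referenced side
        name text
    );

    CREATE TABLE orders (
        id          integer PRIMARY KEY,
        customer_id integer REFERENCES customers (id),
        total       numeric
    );

    -- The recommended index on the referencing side:
    CREATE INDEX orders_customer_id_idx ON orders (customer_id);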
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS issue?
On Tue, 19 Jul 2005 12:54:22 -0400 Tom Lane <[EMAIL PROTECTED]> wrote:
> Hmm, I hadn't thought about the possible impact of multiple concurrent
> vacuums. Is the problem caused by that, or has performance already gone
> into the tank by the time the cron-driven vacuums are taking long enough
> to overlap?

Don't know just yet. When I run the vacuums manually on a healthy system on 741, they take less than 30 seconds. I've stopped the cron vacuum and canceled all the outstanding vacuum processes, but the 803 is still struggling (1/2 hour later).

I'll re-start the database, vacuum full analyze and restart the runs without the cron vacuum running.

Cheers,
Rob
Re: [PERFORM] Looking for tips
Oliver Crosby wrote:
> Hi,
> I'm running Postgres 7.4.6 on a dedicated server with about 1.5 gigs of RAM.
> Running scripts locally, it takes about 1.5x longer than MySQL, and the load
> on the server is only about 21%.

What scripts? What do they do?

Oh, and 7.4.8 is the latest release - worth upgrading for the fixes.

> I upped the sort_mem to 8192 (kB), and shared_buffers and
> effective_cache_size to 65536 (512MB), but neither the timing nor the
> server load have changed at all.

Well, effective_cache_size is the amount of RAM being used by the OS to cache your files, so take a look at top/free and set it based on that (pick a steady load). What sort_mem should be will obviously depend how much sorting you do.

Drop shared_buffers down to about 10000 - 20000 (at a guess).

You may find the following useful:
http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
Read the Performance Tuning article; there is an updated one for version 8 at:
http://www.powerpostgresql.com/PerfList

> FYI, I'm going to be working on data sets in the order of GB.

Fair enough.

> I think I've gone about as far as I can with Google... can anybody give me
> some advice on how to improve the raw performance before I start looking at
> code changes?

Identify what the problem is first of all. Some things to consider:
- Are there particular queries giving you trouble?
- Is your load mostly reads or mostly writes?
- Do you have one user or 100?
- Are you block-loading data efficiently where necessary?
- Have you indexed both sides of your foreign-keys where sensible?
- Are your disks being used effectively?
- Are your statistics accurate/up to date?

Bear in mind that MySQL will probably be quicker for simple queries for one user and always will be. If you have multiple users running a mix of multi-table joins and updates then PG will have a chance to stretch its legs.

--
  Richard Huxton
  Archonet Ltd
Re: [PERFORM] Impact of checkpoint_segments under continual load conditions
Christopher Petrilli <[EMAIL PROTECTED]> writes:
>> Are you sure the backend is reading directly from the file, and not
>> through psql? (\copy, or COPY FROM STDIN, would go through psql.)

> The exact command is:
> COPY test (columnlist...) FROM '/tmp/loadfile';

I tried to replicate this by putting a ton of COPY commands like that into a file and doing "psql -f file ...". I don't see more than about 0.3% CPU going to psql. So there's something funny about your test conditions. How *exactly* are you invoking psql?

regards, tom lane
Re: [PERFORM] Looking for tips
I was hoping to start with tuning postgres to match the hardware, but in any case...

The queries are all simple insert or select statements on single tables.
Eg. select x from table where y=?; or insert into table (a, b, c) values (?, ?, ?);

In the case of selects where it's a large table, there's an index on the column being searched, so in terms of the example above, x is either a pkey column or other related field, and y is a non-pkey column.

I'm not sure what you mean by structure.

I tried explain analyse on the individual queries, but I'm not sure what can be done to manipulate them when they don't do much.

My test environment has about 100k - 300k rows in each table, and for production I'm expecting this to be in the order of 1M+.

The OS is Redhat Enterprise 3.

I'm using a time command when I call the scripts to get a total running time from start to finish.

I don't know what we have for RAID, but I suspect it's just a single 10k or 15k rpm hdd.

I'll try your recommendations for shared_buffers and effective_cache_size. Thanks John!

We're trying to improve performance on a log processing script to the point where it can be run as close as possible to realtime. A lot of what gets inserted depends on what's already in the db, and it runs item-by-item... so unfortunately I can't take advantage of copy.

We tried dropping indices, copying data in, then rebuilding. It works great for a bulk import, but the processing script went a lot slower without them. (Each insert is preceded by a local cache check and then a db search to see if an ID already exists for an item.)

We have no foreign keys at the moment. Would they help?

On 7/19/05, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
> Oliver Crosby wrote:
> > Hi,
> > I'm running Postgres 7.4.6 on a dedicated server with about 1.5 gigs of RAM.
> > Running scripts locally, it takes about 1.5x longer than MySQL, and the
> > load on the server is only about 21%.
>
> What queries?
> What is your structure?
> Have you tried explain analyze?
> How many rows in the table?
> Which OS?
> How are you testing the speed?
> What type of RAID?
[PERFORM] context-switching issue on Xeon
The thread below has the test case that we were able to use to reproduce the issue.

http://archives.postgresql.org/pgsql-performance/2004-04/msg00280.php

The last messages on this subject are from April of 2005. Have there been any successful ways to significantly reduce the impact this has on multi-processing? I haven’t been able to find anything showing a resolution of some kind.

We are seeing this on two of our machines:
- Quad 3.0 GHz XEON with 3GB of memory running PG 7.4.3 with SuSE kernel 2.4
- Dual 2.8 GHz XEON with 2GB of memory running PG 8.0.0 with SuSE kernel 2.4
Re: [PERFORM] Looking for tips
> Identify what the problem is first of all. Some things to consider:
> - Are there particular queries giving you trouble?
> - Is your load mostly reads or mostly writes?
> - Do you have one user or 100?
> - Are you block-loading data efficiently where necessary?
> - Have you indexed both sides of your foreign-keys where sensible?
> - Are your disks being used effectively?
> - Are your statistics accurate/up to date?

No queries in particular appear to be a problem. I think it's just the overall speed. If any of the configuration settings will help make the simple select queries go faster, that would be ideal.

The load is about 50/50 read/write.

At the moment it's just one user, but the goal is to have a cluster of servers (probably less than a dozen) updating to a central db.

Indices exist for the fields being searched, but we don't have any foreign keys.

I'm not too familiar with effective disk usage or statistics...
Re: [PERFORM] context-switching issue on Xeon
FWIW, I'm seeing this with a client at the moment. 40-60k CS per second on Dual 3.2GHz. There are plenty of other issues we're dealing with, but this is obviously disconcerting...

On 19 Jul 2005, at 19:23, Sailer, Denis (YBUSA-CDR) wrote:

> The thread below has the test case that we were able to use to reproduce the issue.
>
> http://archives.postgresql.org/pgsql-performance/2004-04/msg00280.php
>
> The last messages on this subject are from April of 2005. Have there been
> any successful ways to significantly reduce the impact this has on
> multi-processing? I haven’t been able to find anything showing a resolution
> of some kind.
>
> We are seeing this on two of our machines:
> - Quad 3.0 GHz XEON with 3GB of memory running PG 7.4.3 with SuSE kernel 2.4
> - Dual 2.8 GHz XEON with 2GB of memory running PG 8.0.0 with SuSE kernel 2.4
Re: [PERFORM] Looking for tips
> What programming language are these scripts written in ?

perl, using the DBD::Pg interface instead of command-lining it through psql
Re: [PERFORM] context-switching issue on Xeon
"Sailer, Denis (YBUSA-CDR)" <[EMAIL PROTECTED]> writes: > http://archives.postgresql.org/pgsql-performance/2004-04/msg00280.php > The last messages on this subject are from April of 2005. Has there > been any successful ways to significantly reduce the impact this has to > multi-processing? CVS tip should largely fix the problem as far as buffer manager contention goes. > I haven't been able to find anything showing a > resolution of some kind. Look at the Feb/March threads concerning buffer manager rewrite, clock sweep, etc ... eg http://archives.postgresql.org/pgsql-patches/2005-03/msg00015.php regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Impact of checkpoint_segments under continual load conditions
On 7/19/05, Tom Lane <[EMAIL PROTECTED]> wrote:
> Christopher Petrilli <[EMAIL PROTECTED]> writes:
> >> Are you sure the backend is reading directly from the file, and not
> >> through psql? (\copy, or COPY FROM STDIN, would go through psql.)
>
> > The exact command is:
> > COPY test (columnlist...) FROM '/tmp/loadfile';
>
> I tried to replicate this by putting a ton of COPY commands like that
> into a file and doing "psql -f file ...". I don't see more than about
> 0.3% CPU going to psql. So there's something funny about your test
> conditions. How *exactly* are you invoking psql?

It is a subprocess of a Python process, driven using a pexpect interchange. I send the COPY command, then wait for the '=#' to come back.

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
Re: [PERFORM] Looking for tips
What programming language are these scripts written in ?
Re: [PERFORM] Impact of checkpoint_segments under continual load conditions
Christopher Petrilli <[EMAIL PROTECTED]> writes:
> On 7/19/05, Tom Lane <[EMAIL PROTECTED]> wrote:
>> How *exactly* are you invoking psql?

> It is a subprocess of a Python process, driven using a pexpect
> interchange. I send the COPY command, then wait for the '=#' to come
> back.

Some weird interaction with pexpect maybe? Try adding "-n" (disable readline) to the psql command switches.

regards, tom lane
Re: [PERFORM] Looking for tips
Hi Oliver,

We had low resource utilization and poor throughput on inserts of thousands of rows within a single database transaction. There were a lot of configuration parameters we changed, but the one which helped the most was wal_buffers -- we wound up setting it to 1000. This may be higher than it needs to be, but when we got to something which ran well, we stopped tinkering. The default value clearly caused a bottleneck.

You might find this page useful:

http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html

-Kevin

>>> Oliver Crosby <[EMAIL PROTECTED]> 07/19/05 1:21 PM >>>
I was hoping to start with tuning postgres to match the hardware, but in any case...

The queries are all simple insert or select statements on single tables.
Eg. select x from table where y=?; or insert into table (a, b, c) values (?, ?, ?);

In the case of selects where it's a large table, there's an index on the column being searched, so in terms of the example above, x is either a pkey column or other related field, and y is a non-pkey column.

I'm not sure what you mean by structure.

I tried explain analyse on the individual queries, but I'm not sure what can be done to manipulate them when they don't do much.

My test environment has about 100k - 300k rows in each table, and for production I'm expecting this to be in the order of 1M+.

The OS is Redhat Enterprise 3.

I'm using a time command when I call the scripts to get a total running time from start to finish.

I don't know what we have for RAID, but I suspect it's just a single 10k or 15k rpm hdd.

I'll try your recommendations for shared_buffers and effective_cache_size. Thanks John!

We're trying to improve performance on a log processing script to the point where it can be run as close as possible to realtime. A lot of what gets inserted depends on what's already in the db, and it runs item-by-item... so unfortunately I can't take advantage of copy.

We tried dropping indices, copying data in, then rebuilding. It works great for a bulk import, but the processing script went a lot slower without them. (Each insert is preceded by a local cache check and then a db search to see if an ID already exists for an item.)

We have no foreign keys at the moment. Would they help?

On 7/19/05, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
> Oliver Crosby wrote:
> > Hi,
> > I'm running Postgres 7.4.6 on a dedicated server with about 1.5 gigs of RAM.
> > Running scripts locally, it takes about 1.5x longer than MySQL, and the
> > load on the server is only about 21%.
>
> What queries?
> What is your structure?
> Have you tried explain analyze?
> How many rows in the table?
> Which OS?
> How are you testing the speed?
> What type of RAID?
Re: [PERFORM] Looking for tips
Oliver Crosby <[EMAIL PROTECTED]> writes:
> The queries are all simple insert or select statements on single tables.
> Eg. select x from table where y=?; or insert into table (a, b, c)
> values (?, ?, ?);
> In the case of selects where it's a large table, there's an index on
> the column being searched, so in terms of the example above, x is
> either a pkey column or other related field, and y is a non-pkey
> column.

If you're running only a single query at a time (no multiple clients), then this is pretty much the definition of a MySQL-friendly workload; I'd have to say we are doing really well if we are only 50% slower. Postgres doesn't have any performance advantages until you get into complex queries or a significant amount of concurrency.

You could possibly get some improvement if you can re-use prepared plans for the queries; but this will require some fooling with the client code (I'm not sure if DBD::Pg even has support for it at all).

regards, tom lane
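As a point of reference, plan re-use can also be exercised at the SQL level with PREPARE/EXECUTE (available since PostgreSQL 7.3), independently of whether the driver exposes server-side prepares. A minimal sketch, with hypothetical table and column names mirroring the "select x from table where y=?" shape above:

    -- Plan the query once per session...
    PREPARE find_x (text) AS
        SELECT x FROM some_table WHERE y = $1;

    -- ...then reuse the plan for each lookup:
    EXECUTE find_x('value 1');
    EXECUTE find_x('value 2');

    -- Release it when no longer needed:
    DEALLOCATE find_x;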
Re: [PERFORM] Looking for tips
On Tue, Jul 19, 2005 at 03:01:00PM -0400, Tom Lane wrote:
> You could possibly get some improvement if you can re-use prepared plans
> for the queries; but this will require some fooling with the client code
> (I'm not sure if DBD::Pg even has support for it at all).

Newer versions have, when compiled against the 8.0 client libraries and using an 8.0 server (AFAIK).

/* Steinar */
--
Homepage: http://www.sesse.net/
Re: [PERFORM] Impact of checkpoint_segments under continual load conditions
> It is a subprocess of a Python process, driven using a pexpect
> interchange. I send the COPY command, then wait for the '=#' to come
> back.

Did you try sending the COPY as a normal query through psycopg ?
Re: [PERFORM] Looking for tips
I can't say whether MySQL is faster for very small queries (like SELECT'ing one row based on an indexed field). That's why I was asking you about the language... I assume you're using a persistent connection.

For simple queries like this, PG 8.x seemed to be a lot faster than PG 7.x. Have you tried 8 ?

I was asking you which language, because for such really small queries you have to take into account the library overhead. For instance, in PHP a simple query can be 10 times slower in Postgres than in MySQL and I believe it is because php's MySQL driver has seen a lot of optimization whereas the postgres driver has not. Interestingly, the situation is reversed with Python : its best postgres driver (psycopg 2) is a lot faster than the MySQL adapter, and faster than both php adapters (a lot faster).

The same query can get (this is from the back of my head):

PHP + Postgres      3-5 ms
Python + MySQL      1 ms
PHP + MySQL         0.5 ms
Python + Postgres   0.15 ms

And yes, I had queries executing in 150 microseconds or so; this includes time to convert the results to native python objects ! This was on a loop running the same query over and over. But psycopg2 is fast. The overhead for parsing a simple query and fetching just a row is really small.

This is on my Centrino 1.6G laptop.
Re: [PERFORM] Looking for tips
Oliver Crosby <[EMAIL PROTECTED]> writes:
>> You could possibly get some improvement if you can re-use prepared plans
>> for the queries; but this will require some fooling with the client code
>> (I'm not sure if DBD::Pg even has support for it at all).

> Aye. We have prepared statements.

Ah, but are they really prepared, or is DBD::Pg faking it by inserting parameter values into the query text and then sending the assembled string as a fresh query? It wasn't until about 7.4 that we had adequate backend support to let client libraries support prepared queries properly, and I'm unsure that DBD::Pg has been updated to take advantage of that support.

regards, tom lane
Re: [PERFORM] Looking for tips
> If you're running only a single query at a time (no multiple clients),
> then this is pretty much the definition of a MySQL-friendly workload;
> I'd have to say we are doing really well if we are only 50% slower.
> Postgres doesn't have any performance advantages until you get into
> complex queries or a significant amount of concurrency.

The original port was actually twice as slow. It improved quite a bit after I added transactions and trimmed the schema a bit.

> You could possibly get some improvement if you can re-use prepared plans
> for the queries; but this will require some fooling with the client code
> (I'm not sure if DBD::Pg even has support for it at all).

Aye. We have prepared statements.
Re: [PERFORM] Impact of checkpoint_segments under continual load conditions
On 7/19/05, Tom Lane <[EMAIL PROTECTED]> wrote:
> Christopher Petrilli <[EMAIL PROTECTED]> writes:
> > On 7/19/05, Tom Lane <[EMAIL PROTECTED]> wrote:
> >> How *exactly* are you invoking psql?
>
> > It is a subprocess of a Python process, driven using a pexpect
> > interchange. I send the COPY command, then wait for the '=#' to come
> > back.
>
> Some weird interaction with pexpect maybe? Try adding "-n" (disable
> readline) to the psql command switches.

Um... WOW!

==> pgsql_benchmark_803_bigbuffers1_noreadline.txt <==
0 0.0319459438324 0.0263829231262
1 0.0303978919983 0.0263390541077
2 0.0306499004364 0.0273139476776
3 0.030695915 0.0270659923553
4 0.0307791233063 0.0278429985046
5 0.0306351184845 0.0278820991516
6 0.0307800769806 0.0335869789124
7 0.0408310890198 0.0370559692383
8 0.0371310710907 0.0344209671021
9 0.0372560024261 0.0334041118622

==> pgsql_benchmark_803_bigbuffers1.txt <==
0 0.0352520942688 0.149132013321
1 0.0320160388947 0.146126031876
2 0.0307128429413 0.139330863953
3 0.0306718349457 0.139590978622
4 0.0307030677795 0.140225172043
5 0.0306420326233 0.140012979507
6 0.0307261943817 0.139672994614
7 0.0307750701904 0.140661001205
8 0.0307800769806 0.141661167145
9 0.0306720733643 0.141198158264

First column is iteration, second is "gen time" to generate the load file, and 3rd is "load time".

It doesn't stay QUITE that low, but it stays lower... quite a bit. We'll see what happens over time.

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
Re: [PERFORM] Looking for tips
On Tue, Jul 19, 2005 at 03:16:31PM -0400, Tom Lane wrote:
> Ah, but are they really prepared, or is DBD::Pg faking it by inserting
> parameter values into the query text and then sending the assembled
> string as a fresh query?

They are really prepared.

/* Steinar */
--
Homepage: http://www.sesse.net/
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS issue?
On Tue, 19 Jul 2005 12:54:22 -0400 Tom Lane <[EMAIL PROTECTED]> wrote:
> Robert Creager <[EMAIL PROTECTED]> writes:
>
> Hmm, I hadn't thought about the possible impact of multiple concurrent
> vacuums. Is the problem caused by that, or has performance already gone
> into the tank by the time the cron-driven vacuums are taking long enough
> to overlap?

All statements over 5 seconds are logged. Vacuums are running on the 5 minute mark. The log file shows the first query starts going bad at 9:32:15 (7 seconds), although the second query starts before the first. The first vacuum statement logged shows 1148 seconds, completing at 9:54:09, so starting at 9:35. Looks like the vacuum is an innocent bystander of the problem.

The first problem queries are below. Additionally, I've attached 5 minutes (bzipped) of logs starting at the first event below.

Jul 19 09:32:15 annette postgres[17029]: [2-1] LOG:  duration: 7146.168 ms  statement:
Jul 19 09:32:15 annette postgres[17029]: [2-2] ^I SELECT location_id, location_type.name AS type, library, rail
Jul 19 09:32:15 annette postgres[17029]: [2-3] ^I FROM location_lock JOIN location USING( location_id )
Jul 19 09:32:15 annette postgres[17029]: [2-4] ^I JOIN location_type USING( location_type_id )
Jul 19 09:32:15 annette postgres[17029]: [2-5] ^I WHERE test_session_id = '5264'
Jul 19 09:32:20 annette postgres[17092]: [2-1] LOG:  duration: 13389.730 ms  statement:
Jul 19 09:32:20 annette postgres[17092]: [2-2] ^I SELECT location_type.name AS location_type_name,
Jul 19 09:32:20 annette postgres[17092]: [2-3] ^I library, rail, col, side, row, location_id,
Jul 19 09:32:20 annette postgres[17092]: [2-4] ^I hli_lsm, hli_panel, hli_row, hli_col
Jul 19 09:32:20 annette postgres[17092]: [2-5] ^I FROM location JOIN location_type USING( location_type_id )
Jul 19 09:32:20 annette postgres[17092]: [2-6] ^I JOIN complex USING( library_id )
Jul 19 09:32:20 annette postgres[17092]: [2-7] ^I LEFT OUTER JOIN hli_location USING( location_id )
Jul 19 09:32:20 annette postgres[17092]: [2-8] ^I LEFT OUTER JOIN application USING( application_id )
Jul 19 09:32:20 annette postgres[17092]: [2-9] ^I WHERE complex.complex_id = '13'
Jul 19 09:32:20 annette postgres[17092]: [2-10] ^I AND location_id NOT IN
Jul 19 09:32:20 annette postgres[17092]: [2-11] ^I (SELECT location_id
Jul 19 09:32:20 annette postgres[17092]: [2-12] ^I  FROM location_lock)
Jul 19 09:32:20 annette postgres[17092]: [2-13] ^I AND location_id NOT IN
Jul 19 09:32:20 annette postgres[17092]: [2-14] ^I (SELECT location_id
Jul 19 09:32:20 annette postgres[17092]: [2-15] ^I  FROM cartridge)
Jul 19 09:32:20 annette postgres[17092]: [2-16] ^I AND (location_type.name ~ 'cell' AND application.name ~ 'hli' AND hli_lsm = 1 AND col BETWEEN -2 AND 2)
Jul 19 09:32:20 annette postgres[17092]: [2-17] ^I
Jul 19 09:32:20 annette postgres[17092]: [2-18] ^I ORDER BY location.usage_count, location.rand LIMIT 1
Jul 19 09:32:20 annette postgres[17092]: [2-19] ^I FOR UPDATE OF location

Cheers,
Rob

[Attachment: pg.log.bz2 - bzip2-compressed log data]
Re: [PERFORM] Looking for tips
On Jul 19, 2005, at 3:36 PM, Steinar H. Gunderson wrote:
> On Tue, Jul 19, 2005 at 03:16:31PM -0400, Tom Lane wrote:
>> Ah, but are they really prepared, or is DBD::Pg faking it by inserting
>> parameter values into the query text and then sending the assembled
>> string as a fresh query?
>
> They are really prepared.

That depends on what version you are using. Older versions did what Tom
mentioned rather than sending PREPARE & EXECUTE. Not sure what version
that changed in.

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/
Re: [PERFORM] Looking for tips
> We had low resource utilization and poor throughput on inserts of
> thousands of rows within a single database transaction. There were a
> lot of configuration parameters we changed, but the one which helped the
> most was wal_buffers -- we wound up setting it to 1000. This may be
> higher than it needs to be, but when we got to something which ran well,
> we stopped tinkering. The default value clearly caused a bottleneck.

I just tried wal_buffers = 1000, sort_mem at 10% and effective_cache_size
at 75%.
The performance refuses to budge.. I guess that's as good as it'll go?
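For concreteness, here is roughly how those three settings could be written in an 8.0-era postgresql.conf. The message does not say how much RAM the machine has, so the absolute numbers below assume 1 GB purely for illustration; only the units and the wal_buffers value come from the thread itself.

    # sketch only -- values assume ~1 GB of RAM, adjust to the real machine
    wal_buffers = 1000              # WAL buffers, in 8 kB pages (~8 MB); needs a server restart
    sort_mem = 102400               # per-sort memory, in kB (~100 MB, i.e. ~10% of 1 GB RAM)
    effective_cache_size = 98304    # planner hint, in 8 kB pages (~768 MB, i.e. ~75% of 1 GB RAM)

Note that sort_mem is allocated per sort operation, so a large value like this assumes few concurrent sorts.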
Re: [PERFORM] Looking for tips
On 7/19/05, Oliver Crosby <[EMAIL PROTECTED]> wrote:
> > We had low resource utilization and poor throughput on inserts of
> > thousands of rows within a single database transaction. There were a
> > lot of configuration parameters we changed, but the one which helped the
> > most was wal_buffers -- we wound up setting it to 1000. This may be
> > higher than it needs to be, but when we got to something which ran well,
> > we stopped tinkering. The default value clearly caused a bottleneck.
>
> I just tried wal_buffers = 1000, sort_mem at 10% and
> effective_cache_size at 75%.
> The performance refuses to budge.. I guess that's as good as it'll go?

If it is possible, try:

1) Wrapping many inserts into one transaction
(BEGIN;INSERT;INSERT;...INSERT;COMMIT;). As PostgreSQL will need to handle
fewer transactions per second (each of your inserts is otherwise its own
transaction), it may work faster.

2) If you can do 1, you could go further and use a COPY command, which is
the fastest way to bulk-load a database.

Sometimes I insert data into a temporary table and then do:
INSERT INTO sometable SELECT * FROM tmp_table;
(but I do that when I want to run some selects, updates, etc. on the data
before "committing" them to the main table; dropping the temporary table is
much cheaper than vacuuming a many-row table).

Regards,
Dawid

PS: Where can I find benchmarks comparing PHP vs Perl vs Python in terms of
speed of executing prepared statements?
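To make the staging-table variant above concrete, the whole load can be wrapped in a single transaction roughly like this. The column names and the file path are invented for illustration; the actual schema was never posted.

    BEGIN;

    CREATE TEMP TABLE tmp_table (a integer, b text, c text);

    -- COPY ... FROM 'file' reads a file on the server side; from a client,
    -- psql's \copy can feed the same data over the connection instead.
    COPY tmp_table (a, b, c) FROM '/tmp/upload.dat';

    -- check or massage the staged rows here if needed, then:
    INSERT INTO sometable (a, b, c)
    SELECT a, b, c FROM tmp_table;

    COMMIT;
    -- the temporary table disappears at the end of the session,
    -- so there is nothing large left over to vacuum

Because everything between BEGIN and COMMIT is one transaction, the WAL flush happens once for the whole batch rather than once per row.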
Re: [PERFORM] Looking for tips
> PS: Where can I find benchmarks comparing PHP vs Perl vs Python in terms
> of speed of executing prepared statements?

I'm afraid you'll have to do these yourself!

And I don't think the Python drivers support real prepared statements (the
speed of psycopg is really good, though). I don't think PHP does either;
PHP doesn't even provide a database interface to speak of (i.e. you have to
build the query string by hand, including quoting).
Re: [PERFORM] Looking for tips
> If it is possible, try:
> 1) Wrapping many inserts into one transaction
> (BEGIN;INSERT;INSERT;...INSERT;COMMIT;). As PostgreSQL will need to
> handle fewer transactions per second (each of your inserts is otherwise
> its own transaction), it may work faster.

Aye, that's what I have it doing right now. The transactions do save a HUGE
chunk of time. (Cuts it down by about 40%.)

> 2) If you can do 1, you could go further and use a COPY command, which is
> the fastest way to bulk-load a database.

I don't think I can use COPY in my case because I need to do processing on
a per-line basis, and I need to check if the item I want to insert is
already there, and if it is, I need to get its ID so I can use that for
further processing.
Re: [PERFORM] Looking for tips
On 07/19/2005-02:41PM, Oliver Crosby wrote:
>
> No queries in particular appear to be a problem.

That could mean they are ALL a problem. Let's see some EXPLAIN ANALYZE
results, just to rule it out.

> At the moment it's just one user,

With 1 user PostgreSQL will probably never beat MySQL, but with hundreds it
will.
Re: [PERFORM] Looking for tips
On Tue, 2005-07-19 at 16:28 -0400, Oliver Crosby wrote:
> > If it is possible, try:
> > 1) Wrapping many inserts into one transaction
> > (BEGIN;INSERT;INSERT;...INSERT;COMMIT;). As PostgreSQL will need to
> > handle fewer transactions per second (each of your inserts is otherwise
> > its own transaction), it may work faster.
>
> Aye, that's what I have it doing right now. The transactions do save a
> HUGE chunk of time. (Cuts it down by about 40%.)
>
> > 2) If you can do 1, you could go further and use a COPY command, which
> > is the fastest way to bulk-load a database.
>
> I don't think I can use COPY in my case because I need to do processing
> on a per-line basis, and I need to check if the item I want to insert is
> already there, and if it is, I need to get its ID so I can use that for
> further processing.

Since triggers work with COPY, you could probably write a trigger that
looks for this condition and does the ID processing you need; you could
thereby enjoy the enormous speed gain resulting from COPY and maintain
your data continuity.

Sven
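As a rough sketch of that idea, and nothing more (every table, column, and function name below is invented, since the real schema was never posted): COPY into a staging table whose BEFORE INSERT trigger decides per row whether the value already exists, so the per-line check happens inside the database.

    -- Hypothetical schema: "items" is the real table, "staging" is what COPY targets.
    CREATE TABLE items (a serial PRIMARY KEY, b text UNIQUE, c text);
    CREATE TABLE staging (b text, c text);

    CREATE OR REPLACE FUNCTION staging_row() RETURNS trigger AS '
    DECLARE
        v_a integer;
    BEGIN
        SELECT INTO v_a a FROM items WHERE b = NEW.b;
        IF NOT FOUND THEN
            INSERT INTO items (b, c) VALUES (NEW.b, NEW.c);
            v_a := currval(''items_a_seq'');
        END IF;
        -- v_a now holds the id for this b; per-row work that needs it goes here
        RETURN NULL;  -- discard the staged row itself; COPY still fires this per row
    END;
    ' LANGUAGE plpgsql;

    CREATE TRIGGER staging_row_trig BEFORE INSERT ON staging
        FOR EACH ROW EXECUTE PROCEDURE staging_row();

    -- COPY staging (b, c) FROM ... now runs the logic above for every input line.

Returning NULL from a BEFORE INSERT trigger suppresses the insert into the staging table, so the staging table stays empty while the real table is maintained.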
Re: [PERFORM] Looking for tips
> Since triggers work with COPY, you could probably write a trigger that
> looks for this condition and does the ID processing you need; you could
> thereby enjoy the enormous speed gain resulting from COPY and maintain
> your data continuity.

So... (bear with me here.. trying to make sense of this)..
With triggers there's a way I can do the parsing I need to on a log file
and react to completed events in non-sequential order (you can ignore that
part.. it's just how we piece together different related events) and then
have perl/DBD::Pg invoke a copy command (which, from what I can tell, has
to operate on a file...) and the copy command can feed the ID I need back
to perl so I can work with it...
If that doesn't hurt my brain, then I'm at least kinda confused...
Anyway. Heading home now. I'll think about this more tonight/tomorrow.
[PERFORM] performance decrease after reboot
I tuned a query last week to obtain acceptable performance. Here are my
recorded explain analyze results:

LOG:  duration: 826.505 ms  statement: explain analyze
SELECT c.id AS contact_id,
       sr.id AS sales_rep_id,
       LTRIM(RTRIM(sr.firstname || ' ' || sr.lastname)) AS sales_rep_name,
       p.id AS partner_id,
       p.company AS partner_company,
       coalesce(LTRIM(RTRIM(c.company)), LTRIM(RTRIM(c.firstname || ' ' || c.lastname))) AS contact_company,
       LTRIM(RTRIM(c.city || ' ' || c.state || ' ' || c.postalcode || ' ' || c.country)) AS contact_location,
       c.phone AS contact_phone,
       c.email AS contact_email,
       co.name AS contact_country,
       TO_CHAR(c.request_status_last_modified, 'mm/dd/yy hh12:mi pm') AS request_status_last_modified,
       TO_CHAR(c.request_status_last_modified, 'mmddhh24miss') AS rqst_stat_last_mdfd_sortable,
       c.token_id
FROM sales_reps sr
    JOIN partners p ON (sr.id = p.sales_rep_id)
    JOIN contacts c ON (p.id = c.partner_id)
    JOIN countries co ON (LOWER(c.country) = LOWER(co.code))
    JOIN partner_classification pc ON (p.classification_id = pc.id AND pc.classification != 'Sales Rep')
WHERE c.lead_deleted IS NULL
    AND EXISTS (
        SELECT lr.id
        FROM lead_requests lr, lead_request_status lrs
        WHERE c.id = lr.contact_id
            AND lr.status_id = lrs.id
            AND lrs.is_closed = 0
    )
ORDER BY contact_company, contact_id

QUERY PLAN
---
Sort  (cost=18266.77..18266.80 rows=11 width=219) (actual time=795.502..795.763 rows=246 loops=1)
  Sort Key: COALESCE(ltrim(rtrim((c.company)::text)), ltrim(rtrim((((c.firstname)::text || ' '::text) || (c.lastname)::text)))), c.id
  ->  Hash Join  (cost=18258.48..18266.58 rows=11 width=219) (actual time=747.551..788.095 rows=246 loops=1)
        Hash Cond: (lower(("outer".code)::text) = lower(("inner".country)::text))
        ->  Seq Scan on countries co  (cost=0.00..4.42 rows=242 width=19) (actual time=0.040..2.128 rows=242 loops=1)
        ->  Hash  (cost=18258.45..18258.45 rows=9 width=206) (actual time=746.653..746.653 rows=0 loops=1)
              ->  Merge Join  (cost=18258.12..18258.45 rows=9 width=206) (actual time=729.412..743.691 rows=246 loops=1)
                    Merge Cond: ("outer".sales_rep_id = "inner".id)
                    ->  Sort  (cost=18255.70..18255.73 rows=9 width=185) (actual time=727.948..728.274 rows=249 loops=1)
                          Sort Key: p.sales_rep_id
                          ->  Merge Join  (cost=18255.39..18255.56 rows=9 width=185) (actual time=712.747..723.095 rows=249 loops=1)
                                Merge Cond: ("outer".id = "inner".classification_id)
                                ->  Sort  (cost=1.05..1.05 rows=2 width=10) (actual time=0.192..0.195 rows=2 loops=1)
                                      Sort Key: pc.id
                                      ->  Seq Scan on partner_classification pc  (cost=0.00..1.04 rows=2 width=10) (actual time=0.100..0.142 rows=2 loops=1)
                                            Filter: ((classification)::text <> 'Sales Rep'::text)
                                ->  Sort  (cost=18254.35..18254.38 rows=13 width=195) (actual time=712.401..712.675 rows=250 loops=1)
                                      Sort Key: p.classification_id
                                      ->  Merge Join  (cost=0.00..18254.11 rows=13 width=195) (actual time=47.844..705.517 rows=448 loops=1)
                                            Merge Cond: ("outer".id = "inner".partner_id)
                                            ->  Index Scan using partners_pkey on partners p  (cost=0.00..30.80 rows=395 width=53) (actual time=0.066..5.746 rows=395 loops=1)
                                            ->  Index Scan using contacts_partner_id_idx on contacts c  (cost=0.00..130358.50 rows=93 width=152) (actual time=0.351..662.576 rows=452 loops=1)
                                                  Filter: ((lead_deleted IS NULL) AND (subplan))
                                                  SubPlan
                                                    ->  Nested Loop  (cost=0.00..6.76 rows=2 width=10) (actual time=0.094..0.094 rows=0 loops=5573)
                                                          Join Filter: ("outer".status_id = "inner".id)
                                                          ->  Index Scan using lead_requests_contact_id_idx on lead_requests lr  (cost=0.00..4.23 rows=2 width=20) (actual time=0.068..0.069 rows=0 loops=5573)
                                                                Index Cond: ($0 = contact_id)
                                                          ->  Seq Scan on lead_request_status lrs  (cost=0.00
Re: [PERFORM] Looking for tips
On Tue, 2005-07-19 at 17:04 -0400, Oliver Crosby wrote:
> > Since triggers work with COPY, you could probably write a trigger that
> > looks for this condition and does the ID processing you need; you could
> > thereby enjoy the enormous speed gain resulting from COPY and maintain
> > your data continuity.
>
> So... (bear with me here.. trying to make sense of this)..
> With triggers there's a way I can do the parsing I need to on a log
> file and react to completed events in non-sequential order (you can
> ignore that part.. it's just how we piece together different related
> events) and then have perl/DBD::Pg invoke a copy command (which, from
> what I can tell, has to operate on a file...) and the copy command can
> feed the ID I need back to perl so I can work with it...
> If that doesn't hurt my brain, then I'm at least kinda confused...
> Anyway. Heading home now. I'll think about this more tonight/tomorrow.

Well, without knowing the specifics of what you are actually trying to
accomplish, I cannot say yes or no to your question. I am not sure where
the data you are inserting into the db is coming from.

However, if the scenario is this: a) attempt to insert a row, b) if the
row already exists, grab the ID and do other db selects/inserts/deletes
based on that ID, then there is no need to feed this information back to
the Perl script.

Is your Perl script parsing a file and then using the parsed information
to insert rows? If so, how is the ID that is returned used? Can you have
the trigger use the ID that may be returned to perform whatever it is that
your Perl script is trying to accomplish with that ID?

It's all kind of vague, so my answers may or may not help, but based on
the [lack of] specifics you have provided, I fear that is the best
suggestion I can offer at this point.

Sven
Re: [PERFORM] join and query planner
I'll try that. Let you know as soon as I can take a look.

Thank you

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On behalf of Kevin Grittner
Sent: Monday, July 18, 2005 17:48
To: pgsql-performance@postgresql.org; [EMAIL PROTECTED]
Subject: Re: [PERFORM] join and query planner

You might want to set join_collapse_limit high, and use the JOIN operators
rather than the comma-separated lists. We generate the WHERE clause on the
fly, based on user input, and this has worked well for us.

-Kevin

>>> "Dario" <[EMAIL PROTECTED]> 07/18/05 2:24 PM >>>
Hi.

> Just out of curiosity, does it do any better with the following?
>
> SELECT ...

Yes, it does. But my query could also be

SELECT ...
FROM a
JOIN b ON (a.key = b.key)
LEFT JOIN c ON (c.key = a.key)
LEFT JOIN d ON (d.key = a.key)
/*new*/ , e
WHERE (b.column <= 100)
/*new*/ and (e.key = a.key) and (e.field = 'filter')

because it's constructed by an application. I needed to know if, somehow,
someway, I can "unforce" the join order. The only way to solve it so far
is changing the application. It must build something like

SELECT ...
FROM b
JOIN (a JOIN e ON (e.key = a.key)) ON (a.key = b.key)
LEFT JOIN c ON (c.key = a.key)
LEFT JOIN d ON (d.key = a.key)
WHERE (b.column <= 100) and (e.field = 'filter')

supposing that e.field has (or should have) better selectivity. But now
this problem belongs to the programmers' group :-)

The query, in fact, has more tables to join. I wonder if lowering the geqo
threshold could do the work...

Thank you. Greetings. Long life, little spam and prosperity!

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On behalf of Kevin Grittner
Sent: Monday, July 18, 2005 14:58
To: pgsql-performance@postgresql.org; [EMAIL PROTECTED]
Subject: Re: [PERFORM] join and query planner

Just out of curiosity, does it do any better with the following?

SELECT ...
FROM a
JOIN b ON (a.key = b.key)
LEFT JOIN c ON (c.key = a.key)
LEFT JOIN d ON (d.key = a.key)
WHERE (b.column <= 100)

>>> snip
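To make the join_collapse_limit suggestion concrete: once a query written with explicit JOIN syntax has more items than join_collapse_limit, the planner follows the written join order instead of searching for a better one, so raising the limit gives the generated query back its reordering freedom. The value 20 below is arbitrary and only meant as an example.

    -- session-level settings; both default to 8 in the 7.4/8.0 era
    SET join_collapse_limit = 20;   -- let the planner reorder explicit JOINs in bigger queries
    SET from_collapse_limit = 20;   -- same idea for comma-separated FROM items and subqueries

    -- After this, the application's generated query can keep its explicit
    -- JOIN ... ON form and the planner will still search for a better order,
    -- e.g. joining the selective (e.field = 'filter') table early on its own.

Lowering geqo_threshold, as mentioned above, is the other knob: it switches large queries from the genetic optimizer to the exhaustive one (at the cost of planning time).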
Re: [PERFORM] Looking for tips
You could have a program pre-parse your log and put it in a format
understandable by COPY, then load it into a temporary table and write part
of your application simply as a plpgsql function reading from this table
and doing queries (or a plperl function)...

> So... (bear with me here.. trying to make sense of this)..
> With triggers there's a way I can do the parsing I need to on a log
> file and react to completed events in non-sequential order (you can
> ignore that part.. it's just how we piece together different related
> events) and then have perl/DBD::Pg invoke a copy command (which, from
> what I can tell, has to operate on a file...) and the copy command can
> feed the ID I need back to perl so I can work with it...
> If that doesn't hurt my brain, then I'm at least kinda confused...
> Anyway. Heading home now. I'll think about this more tonight/tomorrow.
Re: [PERFORM] Looking for tips
Sorry for the lack of specifics...

We have a file generated as a list of events, one per line. Suppose lines
1,2,3,5,7,11,etc. were related; then the last one would specify that it's
the last event. Gradually this gets assembled by a perl script, and when
the last event is encountered, it gets inserted into the db.

For a given table, let's say it's of the form (a,b,c) where 'a' is a pkey,
'b' is indexed, and 'c' is other related information. The most common 'b'
values are cached locally with the perl script to save us having to query
the db. So what we end up having is:

if 'b' exists in cache, use cached 'a' value and continue
else if 'b' exists in the db, use the associated 'a' value and continue
else add a new row with 'b', return the new 'a' and continue

The local cache was a huge time saver with mysql. I've tried making a
plpgsql function that handles everything in one step on the db side, but
it didn't show any improvement. Time permitting, I'll try some new
approaches with changing the scripts and queries, though right now I was
just hoping to tune postgresql.conf to work better with the hardware
available.

Thanks to everyone for your help. Very much appreciated.
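For reference, the one-step plpgsql helper described above might look roughly like the sketch below. The table, column, and function names are invented to match the (a,b,c) description; this is only an illustration, and under concurrent loaders the unique constraint could still raise an error that the caller would have to retry.

    CREATE TABLE lookup_table (
        a serial PRIMARY KEY,
        b text NOT NULL UNIQUE,
        c text
    );

    CREATE OR REPLACE FUNCTION get_or_add(text, text) RETURNS integer AS '
    DECLARE
        p_b ALIAS FOR $1;
        p_c ALIAS FOR $2;
        v_a integer;
    BEGIN
        -- look for an existing row first
        SELECT INTO v_a a FROM lookup_table WHERE b = p_b;
        IF NOT FOUND THEN
            INSERT INTO lookup_table (b, c) VALUES (p_b, p_c);
            v_a := currval(''lookup_table_a_seq'');
        END IF;
        RETURN v_a;
    END;
    ' LANGUAGE plpgsql;

    -- From the client side this is one round trip per event:
    --   SELECT get_or_add(?, ?);

Combined with the local cache, each new 'b' costs a single function call instead of a SELECT followed by a separate INSERT.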
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS
When grilled further on (Tue, 19 Jul 2005 12:09:51 -0600), Robert Creager
<[EMAIL PROTECTED]> confessed:

> On Tue, 19 Jul 2005 12:54:22 -0400
> Tom Lane <[EMAIL PROTECTED]> wrote:
>
> > Hmm, I hadn't thought about the possible impact of multiple concurrent
> > vacuums.  Is the problem caused by that, or has performance already gone
> > into the tank by the time the cron-driven vacuums are taking long enough
> > to overlap?
>
> I'll re-start the database, vacuum full analyze and restart the runs
> without the cron vacuum running.

It took a few hours, but the problem did finally occur on 8.0.3 with no
vacuum running. Context switches are averaging 72k. I cannot say
quantitatively that it took longer to reproduce than with the vacuums
running, but it seemed like it did.

Can any information be gotten out of this? Should I try CVS HEAD?
Thoughts?

Thanks,
Rob

--
22:41:36 up 6 days, 2:16, 6 users, load average: 0.15, 0.21, 0.30
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004

[Attachment: pgpmXr0G8ggMk.pgp -- PGP signature]