Re: [PERFORM] Insert performance

2007-03-06 Thread Richard Huxton

joël Winteregg wrote:

Hi Richard,


Here is my problem. With some heavy inserting into a simple DB (one
table, no indexes) I can't get better performance than 8000 inserts/sec. I'm
testing it using a simple C program which uses libpq and which uses:
- an INSERT prepared statement (to avoid too much query parsing on the
server)
- transactions of 10 inserts
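
For reference, that batching pattern corresponds roughly to the SQL below.
This is only a minimal sketch: the test_data table and its columns are
invented for illustration, and the real test drives this through libpq
rather than psql.

  -- hypothetical table standing in for the real one (not shown in the thread)
  CREATE TABLE test_data (id integer, val text);

  -- parse the INSERT once, then reuse the plan for every row
  PREPARE ins (integer, text) AS
      INSERT INTO test_data (id, val) VALUES ($1, $2);

  -- one transaction per batch of 10 inserts
  BEGIN;
  EXECUTE ins(1, 'value 1');
  EXECUTE ins(2, 'value 2');
  -- ... eight more EXECUTE calls ...
  COMMIT;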

Are each of the INSERTs in their own transaction?



No, as said above transactions are made of 10 inserts...


Hmm - I read that as just meaning "inserted 10 rows". You might find 
that smaller batches provide peak performance.



If so, you'll be limited by the speed of the disk the WAL is running on.

That means you have two main options:
1. Have multiple connections inserting simultaneously.


Yes, you're right. That's what I have been testing, and it's what provides the
best performance! I saw that the postgresql frontend was using a lot of CPU,
and not both cores (I'm using a Pentium D, dual core). In contrast, the
postmaster process doesn't use much in the way of resources. Using several
clients, both CPUs are used and I saw an increase in performance (about 18000
inserts/sec).

So I think my bottleneck is more the CPU speed than the disk speed,
what do you think?


Well, I think it's fair to say it's not disk. Let's see - the original 
figure was 8000 inserts/sec, which is 0.125ms per insert. That sounds 
plausible to me for a round-trip to process a simple command - are you 
running the client app on the same machine, or is it over the network?


Two other things to bear in mind:
1. If you're running 8.2 you can have multiple sets of values in an INSERT
http://www.postgresql.org/docs/8.2/static/sql-insert.html

2. You can do a COPY from libpq - is it really not possible?

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Insert performance

2007-03-06 Thread Richard Huxton

joël Winteregg wrote:



No, as said above transactions are made of 10 inserts...
Hmm - I read that as just meaning "inserted 10 rows". You might find 
that smaller batches provide peak performance.


Ahh ok ;-) sorry for my bad English... (yeah, I have been testing
several transaction sizes: 1, 2 and 10)


Not your bad English, my poor reading :-)


If so, you'll be limited by the speed of the disk the WAL is running on.

That means you have two main options:
1. Have multiple connections inserting simultaneously.

Yes, you're right. That's what I have been testing, and it's what provides the
best performance! I saw that the postgresql frontend was using a lot of CPU,
and not both cores (I'm using a Pentium D, dual core). In contrast, the
postmaster process doesn't use much in the way of resources. Using several
clients, both CPUs are used and I saw an increase in performance (about 18000
inserts/sec).

So I think my bottleneck is more the CPU speed than the disk speed,
what do you think?
Well, I think it's fair to say it's not disk. Let's see - the original 
figure was 8000 inserts/sec, which is 0.125ms per insert. That sounds 
plausible to me for a round-trip to process a simple command - are you 
running the client app on the same machine, or is it over the network?


I did both tests. On the local machine (using UNIX sockets) I can reach
18000 inserts/sec with 10 clients and prepared statements. The same test
using clients on a remote machine gives me 13000 inserts/sec.


OK, so we know what the overhead for network connections is.


Now, with multiple clients (multi-threaded inserts) both my CPUs are quite
well used (both around 90%), so I think disk speed may now be
my bottleneck. What do you think? Or maybe I will need a better CPU?


Two other things to bear in mind:
1. If you're running 8.2 you can have multiple sets of values in an INSERT
http://www.postgresql.org/docs/8.2/static/sql-insert.html


Yeah, I'm running version 8.2.3! I didn't know about multiple
insert sets! Thanks for the tip ;-)


Ah-ha! Give it a go, it's designed for this sort of situation. Not sure 
it'll manage thousands of value clauses, but try working up from 10 perhaps. 
I've not tested it for performance, so I'd be interested in knowing how 
it compares to your other results.
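
As a minimal illustration of the multi-row form (reusing the hypothetical
test_data table from the sketch earlier in the thread), a single 8.2 INSERT
can carry several VALUES lists:

  INSERT INTO test_data (id, val) VALUES
      (1, 'value 1'),
      (2, 'value 2'),
      (3, 'value 3');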



2. You can do a COPY from libpq - is it really not possible?



Not really, but I have been testing it and inserts are flying (about
10 inserts/sec)!!


What's the problem with the COPY? Could you COPY into one table then 
insert from that to your target table?


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Insert performance

2007-03-06 Thread Andreas Kostyrka
* Richard Huxton  [070306 12:22]:
> >>2. You can do a COPY from libpq - is it really not possible?
> >>
> >Not really but i have been testing it and inserts are flying (about
> >10 inserts/sec) !!
> 
> What's the problem with the COPY? Could you COPY into one table then insert 
> from that to your target table?
Well, there are some issues. First, your client needs to support it.
E.g. psycopg2 supports only some specific CSV formatting in its
methods. (Plus I sometimes had random psycopg2 crashes, but guarding against
these is cheap compared to the speedup from COPY versus INSERT.)
Plus you need to be sure that your data will apply cleanly (which in
my app was not the case), or you need to code a fallback that
localizes the row that doesn't work.

And the worst thing is that it ignores RULES on the tables, which
sucks if you use them ;) (e.g. table partitioning).

Andreas

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Insert performance

2007-03-06 Thread Richard Huxton

Andreas Kostyrka wrote:

* Richard Huxton  [070306 12:22]:

2. You can do a COPY from libpq - is it really not possible?


Not really but i have been testing it and inserts are flying (about
10 inserts/sec) !!

What's the problem with the COPY? Could you COPY into one table then insert 
from that to your target table?

Well, there are some issues. First, your client needs to support it.
E.g. psycopg2 supports only some specific CSV formatting in its
methods. (Plus I sometimes had random psycopg2 crashes, but guarding against
these is cheap compared to the speedup from COPY versus INSERT.)
Plus you need to be sure that your data will apply cleanly (which in
my app was not the case), or you need to code a fallback that
localizes the row that doesn't work.

And the worst thing is that it ignores RULES on the tables, which
sucks if you use them ;) (e.g. table partitioning).


Ah, but two things deal with these issues:
1. Joel is using libpq
2. COPY into a holding table, tidy data and INSERT ... SELECT
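
A minimal sketch of point 2, with hypothetical names: the raw data is
COPYed into a throwaway table first, cleaned up, and only then moved into
the real table, so one bad row can't abort the load of the target.

  -- holding table with the same structure as the target
  CREATE TABLE test_data_load (LIKE test_data);

  -- bulk load; from libpq this is COPY ... FROM STDIN fed via PQputCopyData
  COPY test_data_load FROM STDIN WITH CSV;

  -- tidy the data, then move it across in one statement
  INSERT INTO test_data
  SELECT * FROM test_data_load
  WHERE id IS NOT NULL;   -- whatever clean-up rules apply

  DROP TABLE test_data_load;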

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Insert performance

2007-03-06 Thread Andreas Kostyrka
* Richard Huxton  [070306 13:47]:
> Andreas Kostyrka wrote:
> >* Richard Huxton  [070306 12:22]:
> 2. You can do a COPY from libpq - is it really not possible?
> 
> >>>Not really but i have been testing it and inserts are flying (about
> >>>10 inserts/sec) !!
> >>What's the problem with the COPY? Could you COPY into one table then insert 
> >>from that to your target table?
> >Well, there are some issues. First your client needs to support it.
> >E.g. psycopg2 supports only some specific CSV formatting in it's
> >methods. (plus I had sometimes random psycopg2 crashes, but guarding against
> >these is cheap compared to the speedup from COPY versus INSERT)
> >Plus you need to be sure that your data will apply cleanly (which in
> >my app was not the case), or you need to code a fallback that
> >localizes the row that doesn't work.
> >And the worst thing is, that it ignores RULES on the tables, which
> >sucks if you use them ;) (e.g. table partitioning).
> 
> Ah, but two things deal with these issues:
> 1. Joel is using libpq
> 2. COPY into a holding table, tidy data and INSERT ... SELECT

Clearly COPY is the way to bulk-load data, BUT you asked, so I
wanted to point out some problems and brittle points with COPY.

(And the COPY into the holding table doesn't completely solve the
problem of dirty, inconsistent data.)

Andreas

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Estimate the size of the SQL file generated by pg_dump utility

2007-03-06 Thread Bill Moran
In response to Bruce Momjian <[EMAIL PROTECTED]>:

> Bricklen Anderson wrote:
> > Bruce Momjian wrote:
> > > Ravindran G-TLS,Chennai. wrote:
> > >> Note: Please bear with us for the disclaimer because it is automated in
> > >> the exchange server.
> > >> Regards, 
> > >> Ravi
> > > 
> > > FYI, we are getting closer to rejecting any email with such a
> > > disclaimer, or emailing you back every time saying we are ignoring the
> > > disclaimer.
> > 
> > I think this issue cropped up a year or two ago, and one of the 
> > suggestions was for the offender to simply put a link back to their 
> > disclaimer at the foot of their email, rather than that uber-verbose 
> > message.
> 
> Right.  The problem is that most of the posters have no control over
> their footers --- it is added by their email software.

I'm curious, what problem does the disclaimer cause?

I wrote the following TOS for my personal system:
https://www.potentialtech.com/cms/node/9
Excerpt of the relevant part:
"If you send me email, you are granting me the unrestricted right to use
the contents of that email however I see fit, unless otherwise agreed in
writing beforehand. You have no rights to the privacy of any email that you
send me. If I feel the need, I will forward emails to authorities or make
their contents publicly available. By sending me email you consent to this
policy and agree that it overrides any disclaimers or policies that may
exist elsewhere."

I have no idea if that's legally binding or not, but I've talked to a few
associates who have some experience in law, and they all argue that email
disclaimers probably aren't legally binding anyway -- so the result is
undefined.

Don't know if this addresses the issue or confuses it ... ?

-- 
Bill Moran
Collaborative Fusion Inc.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Estimate the size of the SQL file generated by pg_dump utility

2007-03-06 Thread Craig A. James

Bill Moran wrote:

I'm curious, what problem does the disclaimer cause?

I wrote the following TOS for my personal system:
https://www.potentialtech.com/cms/node/9
Excerpt of the relevant part:
"If you send me email, you are granting me the unrestricted right to use
the contents of that email however I see fit, unless otherwise agreed in
writing beforehand. You have no rights to the privacy of any email that you
send me. If I feel the need, I will forward emails to authorities or make
their contents publicly available. By sending me email you consent to this
policy and agree that it overrides any disclaimers or policies that may
exist elsewhere."

I have no idea if that's legally binding or not, but I've talked to a few
associates who have some experience in law, and they all argue that email
disclaimers probably aren't legally binding anyway -- so the result is
undefined.


No, it's not legally binding.  Agreements are only binding if both parties 
agree, and someone sending you email has not consented to your statement.  If I 
send you something with a copyright mark, you'd better respect it unless you 
have a signed agreement granting you rights.  Federal law always wins.

Disclaimers are bad for two reasons.  First, they're powerless.  Just because Acme Corp. 
attaches a disclaimer doesn't mean they've absolved themselves of responsibility for the 
actions of their employees.  Second, they're insulting to the employees.  It's a big red 
flag saying, "We, Acme Corp., hire clowns we don't trust, and THIS person may be one 
of them!"

Craig

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Estimate the size of the SQL file generated by pg_dump utility

2007-03-06 Thread Claus Guttesen

> I'm curious, what problem does the disclaimer cause?
>
> I wrote the following TOS for my personal system:
> https://www.potentialtech.com/cms/node/9
> Excerpt of the relevant part:
> I have no idea if that's legally binding or not, but I've talked to a few
> associates who have some experience in law, and they all argue that email
> disclaimers probably aren't legally binding anyway -- so the result is
> undefined.

No, it's not legally binding.  Agreements are only binding if both parties 
agree, and someone sending you email has not consented to your statement.  If I 
send you something with a copyright mark, you'd better respect it unless you 
have a signed agreement granting you rights.  Federal law always wins.

Disclaimers are bad for two reasons.  First, they're powerless.  Just because Acme Corp. 
attaches a disclaimer doesn't mean they've absolved themselves of responsibility for the 
actions of their employees.  Second, they're insulting to the employees.  It's a big red 
flag saying, "We, Acme Corp., hire clowns we don't trust, and THIS person may be one 
of them!"


Dear sirs, this is off-topic at best. Pls. discontinue this thread.

regards
Claus

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-06 Thread Alex Deucher

On 3/5/07, Guido Neitzer <[EMAIL PROTECTED]> wrote:

On 05.03.2007, at 19:56, Alex Deucher wrote:

> Yes, I started setting that up this afternoon.  I'm going to test that
> tomorrow and post the results.

Good - that may or may not give some insight in the actual
bottleneck. You never know but it seems to be one of the easiest to
find out ...



Well, the SAN appears to be the limiting factor.  I set up the DB on
the local scsi discs (software RAID 1) and performance is excellent
(better than the old server).  Thanks for everyone's help.

Alex

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Estimate the size of the SQL file generated by pg_dump utility

2007-03-06 Thread Tom Lane
"Craig A. James" <[EMAIL PROTECTED]> writes:
> Bill Moran wrote:
>> I have no idea if that's legally binding or not, but I've talked to a few
>> associates who have some experience in law, and they all argue that email
>> disclaimers probably aren't legally binding anyway -- so the result is
>> undefined.

> No, it's not legally binding.  Agreements are only binding if both
> parties agree, and someone sending you email has not consented to your
> statement.

To take this back to the PG problem: it's probably true that we can
ignore disclaimers as far as receiving, redistributing, and archiving
mail list submissions goes.  On the other hand, accepting a patch is
another matter.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM]

2007-03-06 Thread Jeff Cole


On Mar 5, 2007, at 8:54 PM, Tom Lane wrote:


Hm, the cost for the upper nestloop is way less than you would expect
given that the HASH IN join is going to have to be repeated 100+  
times.

I think this must be due to a very low "join_in_selectivity" estimate
but I'm not sure why you are getting that, especially seeing that the
rowcount estimates aren't far off.  Can you show us the pg_stats
rows for symptoms.id and symptom_reports.symptom_id?



Hi Tom, thanks for the response.  Here are the pg_stats.  I think I  
understand what the stats say, but I don't know what to conclude from  
them.



plm_stage=# select * from pg_stats where tablename = 'symptoms' and attname = 'id';

 schemaname        | public
 tablename         | symptoms
 attname           | id
 null_frac         | 0
 avg_width         | 4
 n_distinct        | -1
 most_common_vals  |
 most_common_freqs |
 histogram_bounds  | {1,11,24,34,46,57,71,85,95,106,117}
 correlation       | 0.451606

plm_stage=# select * from pg_stats where tablename = 'symptom_reports' and attname = 'symptom_id';

 schemaname        | public
 tablename         | symptom_reports
 attname           | symptom_id
 null_frac         | 0
 avg_width         | 4
 n_distinct        | 80
 most_common_vals  | {3,2,4,1,5,8,9,7,10,6}
 most_common_freqs | {0.094,0.093,0.093,0.092,0.091,0.090,0.087,0.084,0.084,0.08}
 histogram_bounds  | {12,18,24,30,38,44,51,57,91,91,114}
 correlation       | 0.0955925




And Ismo, I followed your suggestion to re-write the SQL more  
cleanly, and you are right it was faster, so that is certainly a  
solution.  Although I am still curious why my original query slowed  
down after the vacuum analyze.  In any case, here is the explain  
analyze from the new query.   Compare that to the  3441.452 ms of the  
old query after the analyze (and 134.045 ms before the analyze):


plm_stage=# explain analyze SELECT count(distinct s.id) AS count_all
FROM symptoms s, symptom_reports sr, users u
WHERE s.id=sr.symptom_id and sr.user_id=u.id and u.disease_id in (1);

                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=770.05..770.06 rows=1 width=4) (actual time=176.749..176.751 rows=1 loops=1)
   ->  Hash Join  (cost=89.43..737.50 rows=13020 width=4) (actual time=7.762..142.063 rows=13038 loops=1)
         Hash Cond: ("outer".symptom_id = "inner".id)
         ->  Hash Join  (cost=86.09..538.86 rows=13020 width=4) (actual time=7.277..89.293 rows=13038 loops=1)
               Hash Cond: ("outer".user_id = "inner".id)
               ->  Seq Scan on symptom_reports sr  (cost=0.00..257.38 rows=13038 width=8) (actual time=0.003..30.499 rows=13038 loops=1)
               ->  Hash  (cost=82.41..82.41 rows=1471 width=4) (actual time=7.261..7.261 rows=1471 loops=1)
                     ->  Seq Scan on users u  (cost=0.00..82.41 rows=1471 width=4) (actual time=0.006..4.133 rows=1471 loops=1)
                           Filter: (disease_id = 1)
         ->  Hash  (cost=3.07..3.07 rows=107 width=4) (actual time=0.469..0.469 rows=107 loops=1)
               ->  Seq Scan on symptoms s  (cost=0.00..3.07 rows=107 width=4) (actual time=0.007..0.247 rows=107 loops=1)
 Total runtime: 176.842 ms
(12 rows)



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Insert performance

2007-03-06 Thread Carlos Moreno


1. If you're running 8.2 you can have multiple sets of values in an 
INSERT

http://www.postgresql.org/docs/8.2/static/sql-insert.html



Yeah, i'm running the 8.2.3 version ! i didn't know about multiple
inserts sets ! Thanks for the tip ;-)




No kidding --- thanks for the tip from me as well !!!

I didn't know this was possible (though I read in the docs that it is ANSI
SQL standard), and I'm also having a similar situation.

Two related questions:

1) What about atomicity?  Is it strictly equivalent to having multiple insert
statements inside a transaction?  (I assume it should be)

2) What about the issue with excessive locking for foreign keys when
inside a transaction?  Has that issue disappeared in 8.2?  And if not,
would it affect similarly in the case of multiple-row inserts?

In case you have no clue what I'm referring to:

Say that we have a table A, with one foreign key constraint to table
B --- last time I checked, there was an issue that whenever inserting
or updating table A (inside a transaction), postgres sets an exclusive
access lock on the referenced row in table B --- this is overkill, and
the correct thing to do would be to set a read-only lock (so that
no one else can *modify or remove* the referenced row until the
transaction has finished).

This caused unnecessary deadlock situations --- even though no one
is modifying table B (which is enough to guarantee that concurrent
transactions would be OK), a second transaction would fail to set the
exclusive access lock, since someone already locked it.

My solution was to sort the insert statements by the referenced value
on table B.

(I hope the above explanation clarifies what I'm trying to say)
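
A tiny sketch of that work-around, with made-up tables a and b: if every
transaction inserts its rows in ascending order of the referenced value,
the row-level locks taken on B for the FK checks are always acquired in
the same order, so two such transactions cannot deadlock on them.

  CREATE TABLE b (id integer PRIMARY KEY);
  CREATE TABLE a (b_id integer REFERENCES b, payload text);

  -- every concurrent transaction orders its inserts by b_id
  BEGIN;
  INSERT INTO a (b_id, payload) VALUES (1, 'first');
  INSERT INTO a (b_id, payload) VALUES (2, 'second');
  COMMIT;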

I wonder if I should still do the same if I go with a multiple-row
insert instead of multiple insert statements inside a transaction.

Thanks,

Carlos
--


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Insert performance

2007-03-06 Thread Csaba Nagy
I only know the answer to your no. 2:
> 2) What about the issue with excessive locking for foreign keys when
> inside a transaction?  Has that issue disappeared in 8.2?  And if not,
> would it affect similarly in the case of multiple-row inserts?

The exclusive lock is already gone starting with 8.0 IIRC; a
non-exclusive lock on the parent row is used instead. The thing is that this
is still too strong ;-)

The proper lock would be one which only prevents modification of the
parent key, other updates would be safe on the same row.

In any case, the current behavior is much better than what was before.
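
Loosely speaking, and only as an illustration of the difference being
described: the old behaviour was as if the FK check ran a SELECT ... FOR
UPDATE on the parent row, while the newer behaviour is more like
SELECT ... FOR SHARE, so several inserting transactions can hold the
parent-row lock at once, but an UPDATE of that parent row still has to wait.

  -- what the RI check on a parent table b roughly behaves like
  -- older releases:  SELECT 1 FROM b WHERE id = $1 FOR UPDATE;  -- exclusive row lock
  -- newer releases:  SELECT 1 FROM b WHERE id = $1 FOR SHARE;   -- shared row lock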

Cheers,
Csaba.



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM]

2007-03-06 Thread Tom Lane
Jeff Cole <[EMAIL PROTECTED]> writes:
> Hi Tom, thanks for the response.  Here are the pg_stats.  I think I  
> understand what the stats say, but I don't know what to conclude from  
> them.

OK, the symptom_id row claims there are only 80 distinct values of
symptom_id in symptom_reports.  This is a bit low (looks like the true
state of affairs is that all but 2 of the 108 entries of symptoms are
represented in symptom_reports), but it's not horridly off considering
that you're using the rather low default statistics_target.  What
happens is that the planner expects that on average only 80 rows of the
inner join will need to be scanned to find a match for a given symptoms.id,
and this makes the nestloop look cheap.  However, per your previous
EXPLAIN ANALYZE:

> ->  Nested Loop IN Join  (cost=149.05..586.26 rows=85 width=0) (actual time=54.517..3441.115 rows=106 loops=1)
>       Join Filter: ("outer".id = "inner".symptom_id)
>       ->  Seq Scan on symptoms  (cost=0.00..3.08 rows=108 width=4) (actual time=0.007..0.273 rows=108 loops=1)
>       ->  Hash IN Join  (cost=149.05..603.90 rows=13074 width=4) (actual time=0.078..24.503 rows=3773 loops=108)


the *actual* average number of rows scanned is 3773.  I'm not sure why
this should be --- is it possible that the distribution of keys in
symptom_reports is wildly uneven?  This could happen if all of the
physically earlier rows in symptom_reports contain the same small set
of symptom_ids, but the stats don't seem to indicate such a skew.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Estimate the size of the SQL file generated by pg_dump utility

2007-03-06 Thread Richard Troy

On Tue, 6 Mar 2007, Tom Lane wrote:
> "Craig A. James" <[EMAIL PROTECTED]> writes:
> > Bill Moran wrote:
> >> I have no idea if that's legally binding or not, but I've talked to a few
> >> associates who have some experience in law, and they all argue that email
> >> disclaimers probably aren't legally binding anyway -- so the result is
> >> undefined.
>
> > No, it's not legally binding.  Agreements are only binding if both
> > parties agree, and someone sending you email has not consented to your
> > statement.
>
> To take this back to the PG problem: it's probably true that we can
> ignore disclaimers as far as receiving, redistributing, and archiving
> mail list submissions goes.  On the other hand, accepting a patch is
> another matter.

A published policy on patch submission, making them fit whatever legal
model is desired, would avoid any and all legal issues related to legalese
included with a submission. The would-be patcher's act of submission
can also count as acknowledgement of the actual agreement - your
agreement - if you've got the policy unambiguously and prominently
displayed...

HTH,
RT


-- 
Richard Troy, Chief Scientist
Science Tools Corporation
510-924-1363 or 202-747-1263
[EMAIL PROTECTED], http://ScienceTools.com/


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-06 Thread Ron

At 10:25 AM 3/6/2007, Alex Deucher wrote:

On 3/5/07, Guido Neitzer <[EMAIL PROTECTED]> wrote:

On 05.03.2007, at 19:56, Alex Deucher wrote:

> Yes, I started setting that up this afternoon.  I'm going to test that
> tomorrow and post the results.

Good - that may or may not give some insight in the actual
bottleneck. You never know but it seems to be one of the easiest to
find out ...


Well, the SAN appears to be the limiting factor.  I set up the DB on
the local scsi discs (software RAID 1) and performance is excellent
(better than the old server).  Thanks for everyone's help.

Alex


What kind of SAN is it and how many + what kind of HDs are in it?
Assuming the answers are reasonable...

Profile the table IO pattern your workload generates and start 
allocating RAID sets to tables or groups of tables based on IO pattern.


For any table or group of tables that has a significant level of 
write IO, say >= ~25% of the IO mix, try RAID 5 or 6 first, but be 
prepared to go RAID 10 if performance is not acceptable.


Don't believe any of the standard "lore" regarding what tables to put 
where or what tables to give dedicated spindles to.

Profile, benchmark, and only then start allocating dedicated resources.
For instance, I've seen situations where putting pg_xlog on its own 
spindles was !not! the right thing to do.


Best Wishes,
Ron Peacetree


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


[PERFORM] Automated test-suite for Postgres

2007-03-06 Thread Neelam Goyal

Hello,

Is anyone aware of some test-suite for Postgresql?

Thanks,
Neelam


Re: [PERFORM] Insert performance

2007-03-06 Thread Carlos Moreno

Csaba Nagy wrote:


I only know to answer your no. 2:
 


2) What about the issue with excessive locking for foreign keys when
inside a transaction?  Has that issue disappeared in 8.2?  And if not,
would it affect similarly in the case of multiple-row inserts?
   



The exclusive lock is gone already starting with 8.0 IIRC, a
non-exclusive lock on the parent row is used instead. Thing is that this
is still too strong ;-)

The proper lock would be one which only prevents modification of the
parent key, other updates would be safe on the same row.

In any case, the current behavior is much better than what was before.
 



*Much* better, I would say --- though you're still correct in that it is
still not the right thing to do.

In particular, with the previous approach there was a serious performance
hit when concurrent transactions referenced the same keys --- that is, after
having taken measures to avoid deadlocks, some transactions would have
to *wait* (for no good reason) until the other transaction completed and
the exclusive-access lock was released.  For high-traffic databases this
can be a quite severe performance hit.  I'm glad it has been fixed, even
if only partially.

Thanks,

Carlos
--


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-06 Thread Alex Deucher

On 3/6/07, Ron <[EMAIL PROTECTED]> wrote:

At 10:25 AM 3/6/2007, Alex Deucher wrote:
>On 3/5/07, Guido Neitzer <[EMAIL PROTECTED]> wrote:
>>On 05.03.2007, at 19:56, Alex Deucher wrote:
>>
>> > Yes, I started setting that up this afternoon.  I'm going to test that
>> > tomorrow and post the results.
>>
>>Good - that may or may not give some insight in the actual
>>bottleneck. You never know but it seems to be one of the easiest to
>>find out ...
>
>Well, the SAN appears to be the limiting factor.  I set up the DB on
>the local scsi discs (software RAID 1) and performance is excellent
>(better than the old server).  Thanks for everyone's help.
>
>Alex

What kind of SAN is it and how many + what kind of HDs are in it?
Assuming the answers are reasonable...



It's a Hitachi WMS/Tagmastore.  105 hitachi SATA drives; 4 Gbps FC.


Profile the table IO pattern your workload generates and start
allocating RAID sets to tables or groups of tables based on IO pattern.

For any table or group of tables that has a significant level of
write IO, say >= ~25% of the IO mix, try RAID 5 or 6 first, but be
prepared to go RAID 10 if performance is not acceptable.



Right now it's designed for max capacity: big RAID 5 groups.  I expect
I'll probably need RAID 10 for decent performance.


Don't believe any of the standard "lore" regarding what tables to put
where or what tables to give dedicated spindles to.
Profile, benchmark, and only then start allocating dedicated resources.
For instance, I've seen situations where putting pg_xlog on its own
spindles was !not! the right thing to do.



Right.  Thanks for the advice.  I'll post my results when I get around
to testing some new SAN configurations.

Alex

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Automated test-suite for Postgres

2007-03-06 Thread Heikki Linnakangas

Neelam Goyal wrote:

Is anyone aware of some test-suite for Postgresql?


What do you want to test? PostgreSQL itself or some application using 
it? Do you want to do performance testing or functional regression 
testing, perhaps?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM]

2007-03-06 Thread Jeff Cole


On Mar 6, 2007, at 11:40 AM, Tom Lane wrote:


the *actual* average number of rows scanned is 3773.  I'm not sure why
this should be --- is it possible that the distribution of keys in
symptom_reports is wildly uneven?  This could happen if all of the
physically earlier rows in symptom_reports contain the same small set
of symptom_ids, but the stats don't seem to indicate such a skew.


Hi Tom, you are correct, the distribution is uneven... In the 13k  
symptom_reports rows, there are 105 distinct symptom_ids.   But the  
first 8k symptom_reports rows only have 10 distinct symptom_ids.
Could this cause the problem and would there be anything I could do  
to address it?


Thanks for all your help, I appreciate it.

-Jeff

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


[PERFORM] Any advantage to integer vs stored date w. timestamp

2007-03-06 Thread Zoolin Lin
Hi,

I have a database with a huge amount of data, so I'm trying to make it as fast
as possible and minimize space.

One thing I've done is join on a prepopulated date lookup table to prevent a
bunch of rows with duplicate date columns. Without this I'd have about 2500
rows per hour with the exact same date w. timestamp in them.

My question is: with postgres do I really gain anything by this, or should I
just use the date w. timestamp column on the primary table and ditch the join
on the date_id table?

The primary table is all integers, like:

date_id | num1 | num2 | num3 | num4 | num5 | num6 | num7 | num8
----------------------------------------------------------------
The primary key is on the date_id through num6 columns.

date_id lookup table:

This table is prepopulated with the date values that will be used.

date_id | date w timestamp

1 | 2007-2-15 Midnight
2 | 2007-2-15 1 am
3 | 2007-2-15 2 am  etc for 24 hours each day


Each day 60k records are added to a monthly table structured as above, about 
2500 per hour.
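
For concreteness, a rough sketch of the two layouts being weighed; all
names are invented. One practical data point: a timestamp column is 8
bytes and an integer is 4, so the per-row space saving from the lookup
table is fairly small.

  -- layout 1: integer key into a prepopulated hourly lookup table
  CREATE TABLE date_id_lookup (
      date_id integer PRIMARY KEY,
      ts      timestamp NOT NULL          -- one row per hour
  );
  CREATE TABLE readings (
      date_id integer NOT NULL REFERENCES date_id_lookup,
      num1 integer, num2 integer          -- ... through num8
  );
  -- queries need a join to get the real time back:
  SELECT d.ts, r.num1 FROM readings r JOIN date_id_lookup d USING (date_id);

  -- layout 2: store the timestamp directly and skip the join
  CREATE TABLE readings2 (
      ts   timestamp NOT NULL,
      num1 integer, num2 integer          -- ...
  );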

thank you for your advice

 

[PERFORM] problem with wrong query planning and ineffective statistics

2007-03-06 Thread Paolo Negri

I think I have an issue with the planning of this query that sometimes
runs really slow.

this is the output of the EXPLAIN ANALYZE in the SLOW case

Sort  (cost=4105.54..4105.54 rows=2 width=28) (actual time=11404.225..11404.401 rows=265 loops=1)
  Sort Key: table1.fdeventfromdate, table2.fdsurname, table2.fdtitle
  ->  Nested Loop Left Join  (cost=192.34..4105.53 rows=2 width=28) (actual time=0.770..11402.185 rows=265 loops=1)
        Join Filter: ("inner".table2_id = "outer".id)
        ->  Nested Loop Left Join  (cost=192.34..878.40 rows=1 width=28) (actual time=0.750..6.878 rows=96 loops=1)
              Join Filter: ("inner".id = "outer".table1_id)
              ->  Nested Loop Left Join  (cost=192.34..872.82 rows=1 width=24) (actual time=0.551..5.453 rows=96 loops=1)
                    ->  Nested Loop Left Join  (cost=192.34..866.86 rows=1 width=28) (actual time=0.534..4.370 rows=96 loops=1)
                          ->  Nested Loop  (cost=192.34..862.46 rows=1 width=28) (actual time=0.515..3.100 rows=96 loops=1)
                                ->  Bitmap Heap Scan on table2  (cost=192.34..509.00 rows=96 width=24) (actual time=0.488..1.140 rows=96 loops=1)
                                      Recheck Cond: ((id = ...
[CUT]

this query takes 11000 milliseconds

this is the output of the EXPLAIN ANALYZE in the FAST case

Sort  (cost=8946.80..8946.82 rows=10 width=28) (actual time=286.969..287.208 rows=363 loops=1)
  Sort Key: table1.fdeventfromdate, table2.fdsurname, table2.fdtitle
  ->  Merge Left Join  (cost=8617.46..8946.63 rows=10 width=28) (actual time=232.330..284.750 rows=363 loops=1)
        Merge Cond: ("outer".id = "inner".table2_id)
        ->  Sort  (cost=946.68..946.69 rows=4 width=28) (actual time=4.505..4.568 rows=101 loops=1)
              Sort Key: table2.id
              ->  Hash Left Join  (cost=208.33..946.64 rows=4 width=28) (actual time=0.786..4.279 rows=101 loops=1)
                    Hash Cond: ("outer".table1_id = "inner".id)
                    ->  Nested Loop Left Join  (cost=202.35..940.64 rows=4 width=24) (actual time=0.719..4.011 rows=101 loops=1)
                          ->  Nested Loop Left Join  (cost=202.35..916.76 rows=4 width=28) (actual time=0.701..3.165 rows=101 loops=1)
                                ->  Nested Loop  (cost=202.35..899.50 rows=4 width=28) (actual time=0.676..2.284 rows=101 loops=1)
                                      ->  Bitmap Heap Scan on table2  (cost=202.35..534.18 rows=101 width=24) (actual time=0.644..1.028 rows=101 loops=1)
                                            Recheck Cond: ((id = ...
[CUT]

this time the query takes 290 milliseconds

As you can see, the estimates of the returned rows are completely off
in both cases, but the estimate of 10 rows in the second case is enough
for the query to be planned in a more clever way.
I tried to increase default_statistics_target from 10 to 100 and
then re-ran ANALYZE on the DB on the test machine, but this
hasn't improved the situation in any way.
The problem is that the distribution of the data across the tables joined
in this query is quite uneven, and I can see that the avg_width of the
relations' keys is really not a good representative value.
Is there something I can do to improve this situation?
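
For reference, the statistics change described above can also be applied
per column rather than globally; this is only a hedged sketch, and the
column names are examples since the real join columns are cut off in the
plans above.

  -- raise the sample size just for the columns the planner misestimates,
  -- then re-gather statistics
  ALTER TABLE table2 ALTER COLUMN id SET STATISTICS 200;
  ALTER TABLE table1 ALTER COLUMN id SET STATISTICS 200;
  ANALYZE table2;
  ANALYZE table1;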

Thanks

Paolo

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] compact flash disks?

2007-03-06 Thread James Mansion
I see that one can now get compact flash to SATA connectors.

If I were to use a filesystem with noatime etc and little non-sql traffic,
does the physical update pattern tend to have hot sectors that will tend to
wear out CF?

I'm wondering about a RAID5 with data on CF drives and RAID1 for the WAL on
a fast SATA or SAS drive pair.  I'm thinking that this would tend to have
good performance because the seek time for the data is very low, even if the
actual write speed can be slower than state of the art.  2GB CF isn't so
pricey any more.

Just wondering.

James



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] compact flash disks?

2007-03-06 Thread Merlin Moncure

On 3/7/07, James Mansion <[EMAIL PROTECTED]> wrote:

I see that one can now get compact flash to SATA connectors.

If I were to use a filesystem with noatime etc and little non-sql traffic,
does the physical update pattern tend to have hot sectors that will tend to
wear out CF?

I'm wondering about a RAID5 with data on CF drives and RAID1 for the WAL on
a fast SATA or SAS drive pair.  I'm thinking that this would tend to have
good performance because the seek time for the data is very low, even if the
actual write speed can be slower than state of the art.  2GB CF isn't so
pricey any more.

Just wondering.


Me too.  I think if you were going to do this I would configure it as
RAID 0.  Sequential performance might be a problem, and traditional
hard drive failure is not.  I think some of the better flash drives
spread out the writes so that life is maximized.

It's still probably cheaper buying a better motherboard and stuffing
more memory in it, and a good raid controller.

merlin

---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] When the Record Got Updated.

2007-03-06 Thread Gauri Kanekar

Hi List,

Can I find out the timestamp when a record in a table was last updated?
Do any of the pg system tables store this info?

--
Regards
Gauri


Re: [PERFORM] When the Record Got Updated.

2007-03-06 Thread A. Kretschmer
On Wed, 07.03.2007, at 12:13:55 +0530, Gauri Kanekar wrote the following:
> Hi List,
>  
> Can i find out the timestamp when last a record from a table got updated.
> Do any of the pg system tables store this info.

No, impossible. But you can write a TRIGGER for such tasks.
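
A minimal sketch of such a trigger; the table and column names are made
up, and the tracking column has to be added by hand:

  ALTER TABLE mytable ADD COLUMN last_updated timestamp;

  CREATE OR REPLACE FUNCTION set_last_updated() RETURNS trigger AS $$
  BEGIN
      NEW.last_updated := now();
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER mytable_last_updated
      BEFORE INSERT OR UPDATE ON mytable
      FOR EACH ROW EXECUTE PROCEDURE set_last_updated();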


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly