Re: [PERFORM] Insert performance
joël Winteregg wrote:
> Hi Richard,
>
> Here is my problem. With some heavy inserts into a simple DB (one table,
> no indexes) I can't get better performance than 8000 inserts/sec. I'm
> testing it using a simple C program which uses libpq and which uses:
> - an INSERT prepared statement (to avoid too much request parsing on the server)
> - transactions of 10 inserts
>
>> Are each of the INSERTs in their own transaction?
>
> No, as said above, transactions are made of 10 inserts...

Hmm - I read that as just meaning "inserted 10 rows". You might find that smaller batches provide peak performance.

>> If so, you'll be limited by the speed of the disk the WAL is running on.
>> That means you have two main options:
>> 1. Have multiple connections inserting simultaneously.
>
> Yes, you're right. That's what I have been testing and what provides the
> best performance! I saw that the postgresql frontend was using a lot of
> CPU, but not both cores (I'm using a Pentium D, dual core). In contrast,
> the postmaster process doesn't use much. Using several clients, both CPUs
> are used and I saw an increase in performance (about 18000 inserts/sec).
> So I think my bottleneck is CPU speed rather than disk speed - what do
> you think?

Well, I think it's fair to say it's not disk. Let's see - the original figure was 8000 inserts/sec, which is 0.125ms per insert. That sounds plausible to me for a round-trip to process a simple command - are you running the client app on the same machine, or is it over the network?

Two other things to bear in mind:
1. If you're running 8.2 you can have multiple sets of values in an INSERT
   http://www.postgresql.org/docs/8.2/static/sql-insert.html
2. You can do a COPY from libpq - is it really not possible?

--
  Richard Huxton
  Archonet Ltd
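As a concrete sketch of the setup joël describes - a prepared INSERT executed repeatedly inside a short transaction - here is a minimal SQL version; the table and column names are made up for illustration, not from the thread:

    -- Hypothetical single-column table standing in for the real target table.
    CREATE TABLE measurements (val integer);

    -- Prepare the INSERT once so the server parses and plans it a single time.
    PREPARE ins (integer) AS INSERT INTO measurements (val) VALUES ($1);

    -- Batch several executions into one transaction so they share one WAL flush.
    BEGIN;
    EXECUTE ins(1);
    EXECUTE ins(2);
    EXECUTE ins(3);
    -- ... up to the chosen batch size (10 in joël's test) ...
    COMMIT;

The same pattern applies whether the statement is prepared through SQL PREPARE or through libpq's prepared-statement calls; only the batch size and the number of concurrent connections change.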
Re: [PERFORM] Insert performance
joël Winteregg wrote:
>> No, as said above, transactions are made of 10 inserts...
>
>> Hmm - I read that as just meaning "inserted 10 rows". You might find that
>> smaller batches provide peak performance.
>
> Ahh, ok ;-) sorry for my bad English... (yes, I have been testing several
> transaction sizes: 1, 2 and 10)

Not your bad English, my poor reading :-)

>> If so, you'll be limited by the speed of the disk the WAL is running on.
>> That means you have two main options:
>> 1. Have multiple connections inserting simultaneously.
>
> Yes, you're right. That's what I have been testing and what provides the
> best performance! I saw that the postgresql frontend was using a lot of
> CPU, but not both cores (I'm using a Pentium D, dual core). In contrast,
> the postmaster process doesn't use much. Using several clients, both CPUs
> are used and I saw an increase in performance (about 18000 inserts/sec).
> So I think my bottleneck is CPU speed rather than disk speed - what do
> you think?
>
>> Well, I think it's fair to say it's not disk. Let's see - the original
>> figure was 8000 inserts/sec, which is 0.125ms per insert. That sounds
>> plausible to me for a round-trip to process a simple command - are you
>> running the client app on the same machine, or is it over the network?
>
> I did both tests. On the local machine (using UNIX sockets) I can reach
> 18000 inserts/sec with 10 clients and prepared statements. The same test
> using clients on a remote machine gives me 13000 inserts/sec.

OK, so we know what the overhead for network connections is.

> Now, with multiple clients (multi-threaded inserts) both my CPUs are quite
> well used (both around 90%), so maybe I think that disk speed is now my
> bottleneck. What do you think? Or maybe I need a better CPU?
>
>> Two other things to bear in mind:
>> 1. If you're running 8.2 you can have multiple sets of values in an INSERT
>>    http://www.postgresql.org/docs/8.2/static/sql-insert.html
>
> Yeah, I'm running the 8.2.3 version! I didn't know about multiple insert
> sets! Thanks for the tip ;-)

Ah-ha! Give it a go, it's designed for this sort of situation. Not sure it'll manage thousands of value clauses, but working up from 10 perhaps. I've not tested it for performance, so I'd be interested in knowing how it compares to your other results.

>> 2. You can do a COPY from libpq - is it really not possible?
>
> Not really, but I have been testing it and inserts are flying (about 10
> inserts/sec)!!

What's the problem with the COPY? Could you COPY into one table then insert from that to your target table?

--
  Richard Huxton
  Archonet Ltd
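For illustration, the multi-row VALUES form mentioned above (available from 8.2 on) looks like this, reusing the same hypothetical table as before - one parse and one round-trip carry many rows:

    INSERT INTO measurements (val)
    VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);

How far the VALUES list can grow before it stops helping is worth measuring rather than assuming, as Richard notes.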
Re: [PERFORM] Insert performance
* Richard Huxton [070306 12:22]:
>>> 2. You can do a COPY from libpq - is it really not possible?
>>
>> Not really, but I have been testing it and inserts are flying (about 10
>> inserts/sec)!!
>
> What's the problem with the COPY? Could you COPY into one table then insert
> from that to your target table?

Well, there are some issues. First, your client needs to support it. E.g. psycopg2 supports only some specific CSV formatting in its methods. (Plus I sometimes had random psycopg2 crashes, but guarding against these is cheap compared to the speedup of COPY versus INSERT.)

Plus you need to be sure that your data will apply cleanly (which in my app was not the case), or you need to code a fallback that localizes the row that doesn't work.

And the worst thing is that it ignores RULES on the tables, which sucks if you use them ;) (e.g. table partitioning).

Andreas
Re: [PERFORM] Insert performance
Andreas Kostyrka wrote:
> * Richard Huxton [070306 12:22]:
>>>> 2. You can do a COPY from libpq - is it really not possible?
>>>
>>> Not really, but I have been testing it and inserts are flying (about 10
>>> inserts/sec)!!
>>
>> What's the problem with the COPY? Could you COPY into one table then insert
>> from that to your target table?
>
> Well, there are some issues. First, your client needs to support it. E.g.
> psycopg2 supports only some specific CSV formatting in its methods. (Plus I
> sometimes had random psycopg2 crashes, but guarding against these is cheap
> compared to the speedup of COPY versus INSERT.)
>
> Plus you need to be sure that your data will apply cleanly (which in my app
> was not the case), or you need to code a fallback that localizes the row that
> doesn't work.
>
> And the worst thing is that it ignores RULES on the tables, which sucks if
> you use them ;) (e.g. table partitioning).

Ah, but two things deal with these issues:
1. Joel is using libpq
2. COPY into a holding table, tidy the data and INSERT ... SELECT

--
  Richard Huxton
  Archonet Ltd
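A sketch of the holding-table pattern Richard suggests, in plain SQL; the table names, the clean-up condition and the file path are hypothetical, and a server-side COPY FROM a file needs superuser rights (COPY FROM STDIN via the client is the alternative):

    -- Staging table with the same shape as the real target table.
    CREATE TABLE measurements_staging (LIKE measurements);

    -- Bulk-load the raw data into the staging table.
    COPY measurements_staging FROM '/tmp/batch.csv' WITH CSV;

    -- Tidy the data, then move it into the real table in one statement.
    -- Because this is a normal INSERT, rules and triggers on the target
    -- table (e.g. for partitioning) fire as usual.
    INSERT INTO measurements (val)
    SELECT val
    FROM measurements_staging
    WHERE val IS NOT NULL;        -- whatever clean-up rules apply

    TRUNCATE measurements_staging;  -- ready for the next batch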
Re: [PERFORM] Insert performance
* Richard Huxton [070306 13:47]:
> Andreas Kostyrka wrote:
>> * Richard Huxton [070306 12:22]:
>>>>> 2. You can do a COPY from libpq - is it really not possible?
>>>>
>>>> Not really, but I have been testing it and inserts are flying (about 10
>>>> inserts/sec)!!
>>>
>>> What's the problem with the COPY? Could you COPY into one table then insert
>>> from that to your target table?
>>
>> Well, there are some issues. First, your client needs to support it. E.g.
>> psycopg2 supports only some specific CSV formatting in its methods. (Plus I
>> sometimes had random psycopg2 crashes, but guarding against these is cheap
>> compared to the speedup of COPY versus INSERT.)
>>
>> Plus you need to be sure that your data will apply cleanly (which in my app
>> was not the case), or you need to code a fallback that localizes the row
>> that doesn't work.
>>
>> And the worst thing is that it ignores RULES on the tables, which sucks if
>> you use them ;) (e.g. table partitioning).
>
> Ah, but two things deal with these issues:
> 1. Joel is using libpq
> 2. COPY into a holding table, tidy the data and INSERT ... SELECT

Clearly COPY is the way to bulk-load data, BUT you asked, so I wanted to point out some problems and brittle points with COPY. (And the COPY into the holding table doesn't completely solve the problem of dirty, inconsistent data.)

Andreas
Re: [PERFORM] Estimate the size of the SQL file generated by pg_dump utility
In response to Bruce Momjian <[EMAIL PROTECTED]>:
> Bricklen Anderson wrote:
>> Bruce Momjian wrote:
>>> Ravindran G-TLS, Chennai. wrote:
>>>> Note: Please bear with us for the disclaimer because it is automated in
>>>> the exchange server.
>>>> Regards,
>>>> Ravi
>>>
>>> FYI, we are getting closer to rejecting any email with such a disclaimer,
>>> or emailing you back every time saying we are ignoring the disclaimer.
>>
>> I think this issue cropped up a year or two ago, and one of the suggestions
>> was for the offender to simply put a link back to their disclaimer at the
>> foot of their email, rather than that uber-verbose message.
>
> Right.  The problem is that most of the posters have no control over their
> footers --- it is added by their email software.

I'm curious, what problem does the disclaimer cause?

I wrote the following TOS for my personal system:
https://www.potentialtech.com/cms/node/9

Excerpt of the relevant part:
"If you send me email, you are granting me the unrestricted right to use the contents of that email however I see fit, unless otherwise agreed in writing beforehand. You have no rights to the privacy of any email that you send me. If I feel the need, I will forward emails to authorities or make their contents publicly available. By sending me email you consent to this policy and agree that it overrides any disclaimers or policies that may exist elsewhere."

I have no idea if that's legally binding or not, but I've talked to a few associates who have some experience in law, and they all argue that email disclaimers probably aren't legally binding anyway -- so the result is undefined.

Don't know if this addresses the issue or confuses it ... ?

--
Bill Moran
Collaborative Fusion Inc.
Re: [PERFORM] Estimate the size of the SQL file generated by pg_dump utility
Bill Moran wrote:
> I'm curious, what problem does the disclaimer cause?
>
> I wrote the following TOS for my personal system:
> https://www.potentialtech.com/cms/node/9
>
> Excerpt of the relevant part:
> "If you send me email, you are granting me the unrestricted right to use the
> contents of that email however I see fit, unless otherwise agreed in writing
> beforehand. You have no rights to the privacy of any email that you send me.
> If I feel the need, I will forward emails to authorities or make their
> contents publicly available. By sending me email you consent to this policy
> and agree that it overrides any disclaimers or policies that may exist
> elsewhere."
>
> I have no idea if that's legally binding or not, but I've talked to a few
> associates who have some experience in law, and they all argue that email
> disclaimers probably aren't legally binding anyway -- so the result is
> undefined.

No, it's not legally binding. Agreements are only binding if both parties agree, and someone sending you email has not consented to your statement. If I send you something with a copyright mark, you'd better respect it unless you have a signed agreement granting you rights. Federal law always wins.

Disclaimers are bad for two reasons. First, they're powerless. Just because Acme Corp. attaches a disclaimer doesn't mean they've absolved themselves of responsibility for the actions of their employees. Second, they're insulting to the employees. It's a big red flag saying, "We, Acme Corp., hire clowns we don't trust, and THIS person may be one of them!"

Craig
Re: [PERFORM] Estimate the size of the SQL file generated by pg_dump utility
>> I'm curious, what problem does the disclaimer cause?
>>
>> I wrote the following TOS for my personal system:
>> https://www.potentialtech.com/cms/node/9
>> Excerpt of the relevant part:
>>
>> I have no idea if that's legally binding or not, but I've talked to a few
>> associates who have some experience in law, and they all argue that email
>> disclaimers probably aren't legally binding anyway -- so the result is
>> undefined.
>
> No, it's not legally binding. Agreements are only binding if both parties
> agree, and someone sending you email has not consented to your statement. If
> I send you something with a copyright mark, you'd better respect it unless
> you have a signed agreement granting you rights. Federal law always wins.
>
> Disclaimers are bad for two reasons. First, they're powerless. Just because
> Acme Corp. attaches a disclaimer doesn't mean they've absolved themselves of
> responsibility for the actions of their employees. Second, they're insulting
> to the employees. It's a big red flag saying, "We, Acme Corp., hire clowns we
> don't trust, and THIS person may be one of them!"

Dear sirs, this is off-topic at best. Pls. discontinue this thread.

regards
Claus
Re: [PERFORM] strange performance regression between 7.4 and 8.1
On 3/5/07, Guido Neitzer <[EMAIL PROTECTED]> wrote:
> On 05.03.2007, at 19:56, Alex Deucher wrote:
>
>> Yes, I started setting that up this afternoon. I'm going to test that
>> tomorrow and post the results.
>
> Good - that may or may not give some insight in the actual bottleneck. You
> never know, but it seems to be one of the easiest to find out ...

Well, the SAN appears to be the limiting factor. I set up the DB on the local scsi discs (software RAID 1) and performance is excellent (better than the old server). Thanks for everyone's help.

Alex
Re: [PERFORM] Estimate the size of the SQL file generated by pg_dump utility
"Craig A. James" <[EMAIL PROTECTED]> writes: > Bill Moran wrote: >> I have no idea if that's legally binding or not, but I've talked to a few >> associates who have some experience in law, and they all argue that email >> disclaimers probably aren't legally binding anyway -- so the result is >> undefined. > No, it's not legally binding. Agreements are only binding if both > parties agree, and someone sending you email has not consented to your > statement. To take this back to the PG problem: it's probably true that we can ignore disclaimers as far as receiving, redistributing, and archiving mail list submissions goes. On the other hand, accepting a patch is another matter. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM]
On Mar 5, 2007, at 8:54 PM, Tom Lane wrote:
> Hm, the cost for the upper nestloop is way less than you would expect given
> that the HASH IN join is going to have to be repeated 100+ times. I think
> this must be due to a very low "join_in_selectivity" estimate but I'm not
> sure why you are getting that, especially seeing that the rowcount estimates
> aren't far off. Can you show us the pg_stats rows for symptoms.id and
> symptom_reports.symptom_id?

Hi Tom, thanks for the response. Here are the pg_stats. I think I understand what the stats say, but I don't know what to conclude from them.

plm_stage=# select * from pg_stats where tablename = 'symptoms' and attname = 'id';

  schemaname        | public
  tablename         | symptoms
  attname           | id
  null_frac         | 0
  avg_width         | 4
  n_distinct        | -1
  most_common_vals  |
  most_common_freqs |
  histogram_bounds  | {1,11,24,34,46,57,71,85,95,106,117}
  correlation       | 0.451606

plm_stage=# select * from pg_stats where tablename = 'symptom_reports' and attname = 'symptom_id';

  schemaname        | public
  tablename         | symptom_reports
  attname           | symptom_id
  null_frac         | 0
  avg_width         | 4
  n_distinct        | 80
  most_common_vals  | {3,2,4,1,5,8,9,7,10,6}
  most_common_freqs | {0.094,0.093,0.093,0.092,0.091,0.090,0.087,0.084,0.084,0.08}
  histogram_bounds  | {12,18,24,30,38,44,51,57,91,91,114}
  correlation       | 0.0955925

And Ismo, I followed your suggestion to re-write the SQL more cleanly, and you are right it was faster, so that is certainly a solution. Although I am still curious why my original query slowed down after the vacuum analyze. In any case, here is the explain analyze from the new query. Compare that to the 3441.452 ms of the old query after the analyze (and 134.045 ms before the analyze):

plm_stage=# explain analyze SELECT count(distinct s.id) AS count_all FROM symptoms s, symptom_reports sr, users u WHERE s.id=sr.symptom_id and sr.user_id=u.id and u.disease_id in (1);
                                                        QUERY PLAN
 Aggregate  (cost=770.05..770.06 rows=1 width=4) (actual time=176.749..176.751 rows=1 loops=1)
   ->  Hash Join  (cost=89.43..737.50 rows=13020 width=4) (actual time=7.762..142.063 rows=13038 loops=1)
         Hash Cond: ("outer".symptom_id = "inner".id)
         ->  Hash Join  (cost=86.09..538.86 rows=13020 width=4) (actual time=7.277..89.293 rows=13038 loops=1)
               Hash Cond: ("outer".user_id = "inner".id)
               ->  Seq Scan on symptom_reports sr  (cost=0.00..257.38 rows=13038 width=8) (actual time=0.003..30.499 rows=13038 loops=1)
               ->  Hash  (cost=82.41..82.41 rows=1471 width=4) (actual time=7.261..7.261 rows=1471 loops=1)
                     ->  Seq Scan on users u  (cost=0.00..82.41 rows=1471 width=4) (actual time=0.006..4.133 rows=1471 loops=1)
                           Filter: (disease_id = 1)
         ->  Hash  (cost=3.07..3.07 rows=107 width=4) (actual time=0.469..0.469 rows=107 loops=1)
               ->  Seq Scan on symptoms s  (cost=0.00..3.07 rows=107 width=4) (actual time=0.007..0.247 rows=107 loops=1)
 Total runtime: 176.842 ms
(12 rows)
Re: [PERFORM] Insert performance
>> 1. If you're running 8.2 you can have multiple sets of values in an INSERT
>>    http://www.postgresql.org/docs/8.2/static/sql-insert.html
>
> Yeah, I'm running the 8.2.3 version! I didn't know about multiple insert
> sets! Thanks for the tip ;-)

No kidding --- thanks for the tip from me as well!!! I didn't know this was possible (though I read in the docs that it is ANSI SQL standard), and I'm also in a similar situation.

Two related questions:

1) What about atomicity? Is it strictly equivalent to having multiple insert statements inside a transaction? (I assume it should be.)

2) What about the issue with excessive locking for foreign keys when inside a transaction? Has that issue disappeared in 8.2? And if not, would it affect multiple-row inserts in the same way?

In case you have no clue what I'm referring to: say we have a table A with one foreign key constraint to table B --- last time I checked, there was an issue that whenever inserting into or updating table A (inside a transaction), postgres sets an exclusive access lock on the referenced row in table B --- this is overkill, and the correct thing to do would be to set a read-only lock (so that no one else can *modify or remove* the referenced row while the transaction has not finished). This caused unnecessary deadlock situations --- even though no one is modifying table B (which is enough to guarantee that concurrent transactions would be ok), a second transaction would fail to set the exclusive access lock, since someone already locked it. My solution was to sort the insert statements by the referenced value in table B. (I hope the above explanation clarifies what I'm trying to say.)

I wonder if I should still do the same if I go with a multiple-row insert instead of multiple insert statements inside a transaction.

Thanks,

Carlos
--
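As a sketch of the workaround Carlos describes, combined with the multi-row form (the tables and values here are hypothetical, not from the thread): keeping the VALUES list sorted by the referenced key means concurrent transactions acquire the referenced-row locks on the parent table in a consistent order, which avoids the deadlock pattern he mentions.

    -- Hypothetical parent/child pair with a foreign key.
    CREATE TABLE b (id integer PRIMARY KEY);
    CREATE TABLE a (id serial PRIMARY KEY, b_id integer REFERENCES b(id));

    INSERT INTO b (id) VALUES (1), (2), (5), (7);

    -- Multi-row insert with the VALUES list ordered by the referenced b_id,
    -- so two concurrent batches lock the rows of b in the same order.
    BEGIN;
    INSERT INTO a (b_id) VALUES (1), (1), (2), (5), (7);
    COMMIT;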
Re: [PERFORM] Insert performance
I only know to answer your no. 2:

> 2) What about the issue with excessive locking for foreign keys when inside
> a transaction? Has that issue disappeared in 8.2? And if not, would it
> affect multiple-row inserts in the same way?

The exclusive lock is gone already starting with 8.0 IIRC, a non-exclusive lock on the parent row is used instead. Thing is that this is still too strong ;-) The proper lock would be one which only prevents modification of the parent key, other updates would be safe on the same row. In any case, the current behavior is much better than what was before.

Cheers,
Csaba.
Re: [PERFORM]
Jeff Cole <[EMAIL PROTECTED]> writes:
> Hi Tom, thanks for the response. Here are the pg_stats. I think I understand
> what the stats say, but I don't know what to conclude from them.

OK, the symptom_id row claims there are only 80 distinct values of symptom_id in symptom_reports. This is a bit low (looks like the true state of affairs is that all but 2 of the 108 entries of symptoms are represented in symptom_reports), but it's not horridly off considering that you're using the rather low default statistics_target. What happens is that the planner expects that on average only 80 rows of the inner join will need to be scanned to find a match for a given symptoms.id, and this makes the nestloop look cheap. However, per your previous EXPLAIN ANALYZE:

>   ->  Nested Loop IN Join  (cost=149.05..586.26 rows=85 width=0) (actual time=54.517..3441.115 rows=106 loops=1)
>         Join Filter: ("outer".id = "inner".symptom_id)
>         ->  Seq Scan on symptoms  (cost=0.00..3.08 rows=108 width=4) (actual time=0.007..0.273 rows=108 loops=1)
>         ->  Hash IN Join  (cost=149.05..603.90 rows=13074 width=4) (actual time=0.078..24.503 rows=3773 loops=108)

the *actual* average number of rows scanned is 3773. I'm not sure why this should be --- is it possible that the distribution of keys in symptom_reports is wildly uneven? This could happen if all of the physically earlier rows in symptom_reports contain the same small set of symptom_ids, but the stats don't seem to indicate such a skew.

			regards, tom lane
Re: [PERFORM] Estimate the size of the SQL file generated by pg_dump utility
On Tue, 6 Mar 2007, Tom Lane wrote:
> "Craig A. James" <[EMAIL PROTECTED]> writes:
>> Bill Moran wrote:
>>> I have no idea if that's legally binding or not, but I've talked to a few
>>> associates who have some experience in law, and they all argue that email
>>> disclaimers probably aren't legally binding anyway -- so the result is
>>> undefined.
>>
>> No, it's not legally binding. Agreements are only binding if both parties
>> agree, and someone sending you email has not consented to your statement.
>
> To take this back to the PG problem: it's probably true that we can ignore
> disclaimers as far as receiving, redistributing, and archiving mail list
> submissions goes. On the other hand, accepting a patch is another matter.

A published policy on patch submission, making submissions fit whatever legal model is desired, would avoid any and all legal issues related to legalese included with a submission. The would-be patcher's act of submission can also count as acknowledgement of the actual agreement - your agreement - if you've got the policy unambiguously and prominently displayed...

HTH,
RT

--
Richard Troy, Chief Scientist
Science Tools Corporation
510-924-1363 or 202-747-1263
[EMAIL PROTECTED], http://ScienceTools.com/
Re: [PERFORM] strange performance regression between 7.4 and 8.1
At 10:25 AM 3/6/2007, Alex Deucher wrote:
> On 3/5/07, Guido Neitzer <[EMAIL PROTECTED]> wrote:
>> On 05.03.2007, at 19:56, Alex Deucher wrote:
>>> Yes, I started setting that up this afternoon. I'm going to test that
>>> tomorrow and post the results.
>>
>> Good - that may or may not give some insight in the actual bottleneck. You
>> never know, but it seems to be one of the easiest to find out ...
>
> Well, the SAN appears to be the limiting factor. I set up the DB on the
> local scsi discs (software RAID 1) and performance is excellent (better than
> the old server). Thanks for everyone's help.
>
> Alex

What kind of SAN is it and how many + what kind of HDs are in it?

Assuming the answers are reasonable...

Profile the table IO pattern your workload generates and start allocating RAID sets to tables or groups of tables based on IO pattern.

For any table or group of tables that has a significant level of write IO, say >= ~25% of the IO mix, try RAID 5 or 6 first, but be prepared to go RAID 10 if performance is not acceptable.

Don't believe any of the standard "lore" regarding what tables to put where or what tables to give dedicated spindles to. Profile, benchmark, and only then start allocating dedicated resources. For instance, I've seen situations where putting pg_xlog on its own spindles was !not! the right thing to do.

Best Wishes,
Ron Peacetree
[PERFORM] Automated test-suite for Postgres
Hello,

Is anyone aware of some test-suite for Postgresql?

Thanks,
Neelam
Re: [PERFORM] Insert performance
Csaba Nagy wrote:
> I only know to answer your no. 2:
>
>> 2) What about the issue with excessive locking for foreign keys when inside
>> a transaction? Has that issue disappeared in 8.2? And if not, would it
>> affect multiple-row inserts in the same way?
>
> The exclusive lock is gone already starting with 8.0 IIRC, a non-exclusive
> lock on the parent row is used instead. Thing is that this is still too
> strong ;-) The proper lock would be one which only prevents modification of
> the parent key, other updates would be safe on the same row. In any case,
> the current behavior is much better than what was before.

*Much* better, I would say --- though you're still correct in that it is still not the right thing to do. In particular, with the previous approach there was a serious performance hit when concurrent transactions referenced the same keys --- that is, after having taken measures to avoid deadlocks, some transactions would have to *wait* (for no good reason) until the other transaction completed and the exclusive-access lock was released. For high-traffic databases this can be quite a severe performance hit. I'm glad it has been fixed, even if only partially.

Thanks,

Carlos
--
Re: [PERFORM] strange performance regression between 7.4 and 8.1
On 3/6/07, Ron <[EMAIL PROTECTED]> wrote:
> At 10:25 AM 3/6/2007, Alex Deucher wrote:
>> On 3/5/07, Guido Neitzer <[EMAIL PROTECTED]> wrote:
>>> On 05.03.2007, at 19:56, Alex Deucher wrote:
>>>> Yes, I started setting that up this afternoon. I'm going to test that
>>>> tomorrow and post the results.
>>>
>>> Good - that may or may not give some insight in the actual bottleneck. You
>>> never know, but it seems to be one of the easiest to find out ...
>>
>> Well, the SAN appears to be the limiting factor. I set up the DB on the
>> local scsi discs (software RAID 1) and performance is excellent (better
>> than the old server). Thanks for everyone's help.
>
> What kind of SAN is it and how many + what kind of HDs are in it?
>
> Assuming the answers are reasonable...

It's a Hitachi WMS/Tagmastore. 105 Hitachi SATA drives; 4 Gbps FC.

> Profile the table IO pattern your workload generates and start allocating
> RAID sets to tables or groups of tables based on IO pattern.
>
> For any table or group of tables that has a significant level of write IO,
> say >= ~25% of the IO mix, try RAID 5 or 6 first, but be prepared to go
> RAID 10 if performance is not acceptable.

Right now it's designed for max capacity: big RAID 5 groups. I expect I'll probably need RAID 10 for decent performance.

> Don't believe any of the standard "lore" regarding what tables to put where
> or what tables to give dedicated spindles to. Profile, benchmark, and only
> then start allocating dedicated resources. For instance, I've seen
> situations where putting pg_xlog on its own spindles was !not! the right
> thing to do.

Right. Thanks for the advice. I'll post my results when I get around to testing some new SAN configurations.

Alex
Re: [PERFORM] Automated test-suite for Postgres
Neelam Goyal wrote:
> Is anyone aware of some test-suite for Postgresql?

What do you want to test? PostgreSQL itself or some application using it? Do you want to do performance testing or functional regression testing, perhaps?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Re: [PERFORM]
On Mar 6, 2007, at 11:40 AM, Tom Lane wrote:
> the *actual* average number of rows scanned is 3773. I'm not sure why this
> should be --- is it possible that the distribution of keys in
> symptom_reports is wildly uneven? This could happen if all of the physically
> earlier rows in symptom_reports contain the same small set of symptom_ids,
> but the stats don't seem to indicate such a skew.

Hi Tom, you are correct, the distribution is uneven... In the 13k symptom_reports rows, there are 105 distinct symptom_ids. But the first 8k symptom_reports rows only have 10 distinct symptom_ids. Could this cause the problem, and would there be anything I could do to address it?

Thanks for all your help, I appreciate it.

-Jeff
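One thing worth trying here - a sketch, not something spelled out verbatim in the thread: raise the per-column statistics target on the skewed column so that ANALYZE samples more of the distribution, then re-analyze and re-check the plan. The default target in this era is 10.

    -- Collect a larger sample for the skewed join column.
    ALTER TABLE symptom_reports ALTER COLUMN symptom_id SET STATISTICS 100;
    ANALYZE symptom_reports;

    -- Then re-run EXPLAIN ANALYZE on the query to see whether the estimates
    -- (and the chosen join strategy) improve.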
[PERFORM] Any advantage to integer vs stored date w. timestamp
Hi,

I have a database with a huge amount of data, so I'm trying to make it as fast as possible and minimize space.

One thing I've done is join on a prepopulated date lookup table to prevent a bunch of rows with duplicate date columns. Without this I'd have about 2500 rows per hour with the exact same date w. timestamp in them.

My question is: with postgres do I really gain anything by this, or should I just use the date w. timestamp column on the primary table and ditch the join on the date_id table?

Primary table is all integers, like:

  date_id | num1 | num2 | num3 | num4 | num5 | num6 | num7 | num8
  ---------------------------------------------------------------
  (primary key is on the date_id to num6 columns)

date_id lookup table (prepopulated with the date values that will be used):

  date_id | date w timestamp
  --------------------------
        1 | 2007-2-15 Midnight
        2 | 2007-2-15 1 am
        3 | 2007-2-15 2 am
  ... and so on for 24 hours each day

Each day 60k records are added to a monthly table structured as above, about 2500 per hour.

Thank you for your advice.
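For comparison, a sketch of the two layouts being weighed (table and column names are placeholders, not the poster's real schema). A plain timestamp column is 8 bytes versus 4 for the integer key, so the per-row saving of the lookup-table design is 4 bytes before alignment and row overhead are considered, at the cost of a join on every query.

    -- Option A (what the poster has): integer key into a prepopulated date table.
    CREATE TABLE date_id_lookup (
        date_id  integer PRIMARY KEY,
        ts       timestamp NOT NULL
    );
    CREATE TABLE samples_a (
        date_id  integer NOT NULL REFERENCES date_id_lookup(date_id),
        num1 integer, num2 integer, num3 integer, num4 integer,
        num5 integer, num6 integer, num7 integer, num8 integer
    );

    -- Option B: store the timestamp directly and skip the join.
    CREATE TABLE samples_b (
        ts   timestamp NOT NULL,
        num1 integer, num2 integer, num3 integer, num4 integer,
        num5 integer, num6 integer, num7 integer, num8 integer
    );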
[PERFORM] problem with wrong query planning and ineffective statistics
I think I have an issue with the planning of this query, which sometimes runs really slow.

This is the output of EXPLAIN ANALYZE in the SLOW case:

 Sort  (cost=4105.54..4105.54 rows=2 width=28) (actual time=11404.225..11404.401 rows=265 loops=1)
   Sort Key: table1.fdeventfromdate, table2.fdsurname, table2.fdtitle
   ->  Nested Loop Left Join  (cost=192.34..4105.53 rows=2 width=28) (actual time=0.770..11402.185 rows=265 loops=1)
         Join Filter: ("inner".table2_id = "outer".id)
         ->  Nested Loop Left Join  (cost=192.34..878.40 rows=1 width=28) (actual time=0.750..6.878 rows=96 loops=1)
               Join Filter: ("inner".id = "outer".table1_id)
               ->  Nested Loop Left Join  (cost=192.34..872.82 rows=1 width=24) (actual time=0.551..5.453 rows=96 loops=1)
                     ->  Nested Loop Left Join  (cost=192.34..866.86 rows=1 width=28) (actual time=0.534..4.370 rows=96 loops=1)
                           ->  Nested Loop  (cost=192.34..862.46 rows=1 width=28) (actual time=0.515..3.100 rows=96 loops=1)
                                 ->  Bitmap Heap Scan on table2  (cost=192.34..509.00 rows=96 width=24) (actual time=0.488..1.140 rows=96 loops=1)
                                       Recheck Cond: ((id = ...
[CUT]

This query takes 11000 milliseconds.

This is the output of EXPLAIN ANALYZE in the FAST case:

 Sort  (cost=8946.80..8946.82 rows=10 width=28) (actual time=286.969..287.208 rows=363 loops=1)
   Sort Key: table1.fdeventfromdate, table2.fdsurname, table2.fdtitle
   ->  Merge Left Join  (cost=8617.46..8946.63 rows=10 width=28) (actual time=232.330..284.750 rows=363 loops=1)
         Merge Cond: ("outer".id = "inner".table2_id)
         ->  Sort  (cost=946.68..946.69 rows=4 width=28) (actual time=4.505..4.568 rows=101 loops=1)
               Sort Key: table2.id
               ->  Hash Left Join  (cost=208.33..946.64 rows=4 width=28) (actual time=0.786..4.279 rows=101 loops=1)
                     Hash Cond: ("outer".table1_id = "inner".id)
                     ->  Nested Loop Left Join  (cost=202.35..940.64 rows=4 width=24) (actual time=0.719..4.011 rows=101 loops=1)
                           ->  Nested Loop Left Join  (cost=202.35..916.76 rows=4 width=28) (actual time=0.701..3.165 rows=101 loops=1)
                                 ->  Nested Loop  (cost=202.35..899.50 rows=4 width=28) (actual time=0.676..2.284 rows=101 loops=1)
                                       ->  Bitmap Heap Scan on table2  (cost=202.35..534.18 rows=101 width=24) (actual time=0.644..1.028 rows=101 loops=1)
                                             Recheck Cond: ((id = ...
[CUT]

This time the query takes 290 milliseconds.

As you can see, the estimates of the returned rows are completely off in both cases, but the estimate of 10 rows in the second case is enough to plan the query in a more clever way.

I tried to increase default_statistics_target from 10 to 100, and afterwards I re-ran ANALYZE on the DB on the test machine, but this hasn't improved the situation in any way. The problem is that the distribution of the data across the joined tables is quite uneven, and I can see that the avg_width of the relations' keys is really not a good representative value.

Is there something I can do to improve this situation?

Thanks

Paolo
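A follow-up sketch, not from the thread: after raising the target and re-analyzing, it is worth checking whether the sampled statistics for the join keys actually changed, and if not, raising the target on individual columns. The column names below are guesses taken from the plan output above.

    -- Inspect what ANALYZE recorded for the join keys.
    SELECT tablename, attname, n_distinct, most_common_vals, most_common_freqs
    FROM pg_stats
    WHERE tablename IN ('table1', 'table2')
      AND attname IN ('id', 'table1_id', 'table2_id');

    -- Per-column targets override default_statistics_target, so only the
    -- skewed join columns need the larger (and slower-to-analyze) sample.
    ALTER TABLE table2 ALTER COLUMN table1_id SET STATISTICS 200;
    ANALYZE table2;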
[PERFORM] compact flash disks?
I see that one can now get compact flash to SATA connectors.

If I were to use a filesystem with noatime etc. and little non-SQL traffic, does the physical update pattern tend to have hot sectors that will tend to wear out CF?

I'm wondering about a RAID 5 with data on CF drives and RAID 1 for the WAL on a fast SATA or SAS drive pair. I'm thinking that this would tend to have good performance because the seek time for the data is very low, even if the actual write speed can be slower than state of the art. 2GB CF isn't so pricey any more.

Just wondering.

James
Re: [PERFORM] compact flash disks?
On 3/7/07, James Mansion <[EMAIL PROTECTED]> wrote:
> I see that one can now get compact flash to SATA connectors.
>
> If I were to use a filesystem with noatime etc. and little non-SQL traffic,
> does the physical update pattern tend to have hot sectors that will tend to
> wear out CF?
>
> I'm wondering about a RAID 5 with data on CF drives and RAID 1 for the WAL
> on a fast SATA or SAS drive pair. I'm thinking that this would tend to have
> good performance because the seek time for the data is very low, even if the
> actual write speed can be slower than state of the art. 2GB CF isn't so
> pricey any more.
>
> Just wondering.

Me too. I think if you were going to do this I would configure it as RAID 0. Sequential performance might be a problem, and traditional hard drive failure is not. I think some of the better flash drives spread out the writes so that life is maximized.

It's still probably cheaper buying a better motherboard and stuffing more memory in it, and a good raid controller.

merlin
[PERFORM] When the Record Got Updated.
Hi List,

Can I find out the timestamp when a record in a table last got updated? Do any of the pg system tables store this info?

--
Regards
Gauri
Re: [PERFORM] When the Record Got Updated.
On Wed, 07.03.2007, at 12:13:55 +0530, Gauri Kanekar wrote the following:
> Hi List,
>
> Can I find out the timestamp when a record in a table last got updated?
> Do any of the pg system tables store this info?

No, impossible. But you can write a TRIGGER for such tasks.

Andreas
--
Andreas Kretschmer
Contact: Heynitz: 035242/47150, D1: 0160/7141639 (more: -> header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net
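A minimal sketch of such a trigger (the table, column and function names are illustrative only): add a timestamp column and let a trigger maintain it on every insert and update.

    -- Hypothetical table gains a column recording the last modification time.
    ALTER TABLE mytable ADD COLUMN last_updated timestamp;

    CREATE OR REPLACE FUNCTION set_last_updated() RETURNS trigger AS $$
    BEGIN
        -- Stamp the row being written with the current transaction time.
        NEW.last_updated := now();
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER mytable_last_updated
        BEFORE INSERT OR UPDATE ON mytable
        FOR EACH ROW EXECUTE PROCEDURE set_last_updated();

Note this only tracks changes made from the moment the trigger is created; it cannot recover when rows were updated in the past.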