Re: [PERFORM] COPY vs INSERT
On Wed, 4 May 2005, Mischa Sandberg wrote:

> Quoting Kris Jurka <[EMAIL PROTECTED]>:
>
> > Not true. A client may send any number of Bind/Execute messages on
> > a prepared statement before a Sync message.
>
> Hunh. Interesting optimization in the JDBC driver. I gather it is
> sending a string of (;)-separated inserts.

No, it uses the V3 protocol and a prepared statement and uses
Bind/Execute, as I mentioned.

> Sounds like efficient-but-risky stuff we did for ODBC drivers at Simba
> ... gets interesting when one of the insert statements in the middle
> fails.

When running inside a transaction (as you really want to do anyway when
bulk loading) the behavior is well defined; it is a little odd for
autocommit mode, though. In autocommit mode the transaction boundary is
at the Sync message, not the individual Execute messages, so you will get
some rollback on error. The JDBC spec is poorly defined in this area, so
we can get away with this.

> Good to know. Hope that the batch size is parametric, given that
> you can have inserts with rather large strings bound to 'text' columns
> in PG --- harder to identify BLOBs when talking to PG, than when talking
> to MSSQL/Oracle/Sybase.

The batch size is not a parameter and I don't think it needs to be. The
issue of filling both sides of the network buffers and deadlocking only
needs to be avoided on one side. The response to an insert request is
small and not dependent on the size of the data sent, so we can send as
much as we want as long as the server doesn't send much back to us.

Kris Jurka
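For readers following along, the client-side pattern Kris describes corresponds to the standard JDBC batch API: each addBatch() queues another parameter set for the prepared statement, and executeBatch() lets the driver stream them to the server as the Bind/Execute messages discussed above. A minimal sketch, with a hypothetical items table and hypothetical connection details:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchInsert {
    public static void main(String[] args) throws SQLException {
        // Hypothetical connection string, credentials, and table,
        // shown only to illustrate the batching pattern.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/testdb", "user", "secret")) {
            conn.setAutoCommit(false); // bulk load inside one transaction
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO items (id, payload) VALUES (?, ?)")) {
                for (int i = 0; i < 10000; i++) {
                    ps.setInt(1, i);
                    ps.setString(2, "row " + i);
                    ps.addBatch();     // queue another parameter set
                }
                ps.executeBatch();     // driver sends the queued inserts
            }
            conn.commit();             // commit the whole load at once
        }
    }
}

Running the batch inside an explicit transaction also gives the well-defined failure behavior mentioned above: if any row fails, the whole load rolls back.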
[PERFORM] Bad choice of query plan from PG 7.3.6 to PG 7.3.9 part 1
Hi

I'm currently experiencing problems with long query execution times. What I
believe makes these problems particularly interesting is the difference in
execution plans between our test server running PostGreSQL 7.3.6 and our
production server running PostGreSQL 7.3.9.

The test server is an upgraded "home machine", a Pentium 4 with 1GB of memory
and an IDE disk. The production server is a dual-CPU Xeon Pentium 4 with 2GB
of memory and SCSI disks. One would expect the production server to be
faster, but apparently it is not, as the query plans outlined below show.

My questions can be summed up as:

1) How come the query plans between the 2 servers are different?
2) How come the production server in general estimates the cost of the query
   plans so horribly wrong? (i.e. it chooses a bad query plan whereas the
   test server chooses a good plan)
3) In Query 2, how come the production server refuses to use its indexes
   (subcat_uq and aff_price_uq, both unique indexes) whereas the test server
   determines that the indexes are the way to go?
4) In Query 3, how come the test server refuses to use its index
   (sct2lang_uq) while the production server uses it? And why is the test
   server still faster even though it makes a sequential scan of a table with
   8.5k records in it?

Please note, a VACUUM ANALYSE is run on the production server once a day (it
used to be once an hour but that seemed to make no difference); however,
there are generally no writes to the tables used in the queries.

If anyone could shed some light on these issues I would truly appreciate it.

Cheers
Jona

PS. Please refer to part 2 for the other queries and query plans.

Query 1:
EXPLAIN ANALYZE
SELECT DISTINCT StatConTrans_Tbl.id, Code_Tbl.sysnm AS code,
       PriceCat_Tbl.amount AS price, Country_Tbl.currency,
       CreditsCat_Tbl.amount AS credits, Info_Tbl.title, Info_Tbl.description
FROM (SCT2SubCatType_Tbl
 INNER JOIN SCT2Lang_Tbl ON SCT2SubCatType_Tbl.sctid = SCT2Lang_Tbl.sctid
 INNER JOIN Language_Tbl ON SCT2Lang_Tbl.langid = Language_Tbl.id
   AND Language_Tbl.sysnm = UPPER('us') AND Language_Tbl.enabled = true
 INNER JOIN Info_Tbl ON SCT2SubCatType_Tbl.sctid = Info_Tbl.sctid
   AND Language_Tbl.id = Info_Tbl.langid
 INNER JOIN SubCatType_Tbl ON SCT2SubCatType_Tbl.subcattpid = SubCatType_Tbl.id
   AND SubCatType_Tbl.enabled = true
 INNER JOIN CatType_Tbl ON SubCatType_Tbl.cattpid = CatType_Tbl.id
   AND CatType_Tbl.enabled = true
 INNER JOIN SuperCatType_Tbl ON CatType_Tbl.spcattpid = SuperCatType_Tbl.id
   AND SuperCatType_Tbl.enabled = true
 INNER JOIN StatConTrans_Tbl ON SCT2SubCatType_Tbl.sctid = StatConTrans_Tbl.id
   AND StatConTrans_Tbl.enabled = true
 INNER JOIN Price_Tbl ON StatConTrans_Tbl.id = Price_Tbl.sctid
   AND Price_Tbl.affid = 8
 INNER JOIN PriceCat_Tbl ON Price_Tbl.prccatid = PriceCat_Tbl.id
   AND PriceCat_Tbl.enabled = true
 INNER JOIN Country_Tbl ON PriceCat_Tbl.cntid = Country_Tbl.id
   AND Country_Tbl.enabled = true
 INNER JOIN CreditsCat_Tbl ON Price_Tbl.crdcatid = CreditsCat_Tbl.id
   AND CreditsCat_Tbl.enabled = true
 INNER JOIN StatCon_Tbl ON StatConTrans_Tbl.id = StatCon_Tbl.sctid
   AND StatCon_Tbl.ctpid = 1
 INNER JOIN Code_Tbl ON SuperCatType_Tbl.id = Code_Tbl.spcattpid
   AND Code_Tbl.affid = 8 AND Code_Tbl.cdtpid = 1)
WHERE SCT2SubCatType_Tbl.subcattpid = 79
ORDER BY StatConTrans_Tbl.id DESC
LIMIT 8 OFFSET 0

Plan on PostGre 7.3.6 on Red Hat Linux 3.2.3-39:

Limit  (cost=178.59..178.61 rows=1 width=330) (actual time=22.77..28.51 rows=4 loops=1)
  ->  Unique  (cost=178.59..178.61 rows=1 width=330) (actual time=22.77..28.50 rows=4 loops=1)
        ->  Sort  (cost=178.59..178.60 rows=1 width=330) (actual time=22.76..22.85 rows=156 loops=1)
              Sort Key: statcontrans_tbl.id, code_tbl.sysnm, pricecat_tbl.amount, country_tbl.currency, creditscat_tbl.amount, info_tbl.title, info_tbl.description
              ->  Hash Join  (cost=171.19..178.58 rows=1 width=330) (actual time=3.39..6.55 rows=156 loops=1)
                    Hash Cond: ("outer".cntid = "inner".id)
                    ->  Nested Loop  (cost=170.13..177.51 rows=1 width=312) (actual time=3.27..5.75 rows=156 loops=1)
                          Join Filter: ("inner".sctid = "outer".sctid)
                          ->  Hash Join  (cost=170.13..171.48 rows=1 width=308) (actual time=3.12..3.26 rows=4 loops=1)
                                Hash Cond: ("outer".crdcatid = "inner".id)
                                ->  Hash Join  (cost=169.03..170.38 rows=1 width=300) (actual time=3.00..3.11 rows=4 loops=1)
                                      Hash Cond: ("outer".spcattpid = "inner".spcattpid)
                                      ->  Hash Join  (cost=167.22..168.56 rows=1 width=253) (actual time=2.88..2.97 rows=4 loops=1)
                                            Hash Cond: ("outer".id = "inner".prccatid)
                                            ->  Seq Scan
Re: [PERFORM] Bad choice of query plan from PG 7.3.6 to PG 7.3.9 part 1
Jona <[EMAIL PROTECTED]> writes:
> I'm currently experiencing problems with long query execution times. What
> I believe makes these problems particularly interesting is the difference
> in execution plans between our test server running PostGreSQL 7.3.6 and
> our production server running PostGreSQL 7.3.9.
> The test server is an upgraded "home machine", a Pentium 4 with 1GB of
> memory and an IDE disk.
> The production server is a dual-CPU Xeon Pentium 4 with 2GB of memory and
> SCSI disks.
> One would expect the production server to be faster, but apparently it is
> not, as the query plans outlined below show.

I think the plans are fine; it looks to me like the production server has
serious table-bloat or index-bloat problems, probably because of inadequate
vacuuming. For instance, compare these entries:

-> Index Scan using ctp_statcon on statcon_tbl (cost=0.00..6.01 rows=1 width=4) (actual time=0.05..0.31 rows=39 loops=4)
     Index Cond: ((statcon_tbl.sctid = "outer".sctid) AND (statcon_tbl.ctpid = 1))

-> Index Scan using ctp_statcon on statcon_tbl (cost=0.00..20.40 rows=5 width=4) (actual time=27.97..171.84 rows=39 loops=4)
     Index Cond: ((statcon_tbl.sctid = "outer".sctid) AND (statcon_tbl.ctpid = 1))

Appears to be exactly the same task ... but the test server spent 1.24 msec
total while the production server spent 687.36 msec total. That's more than
half of your problem right there. Some of the other scans seem a lot slower
on the production machine too.

> 1) How come the query plans between the 2 servers are different?

The production server's rowcount estimates are pretty good, the test
server's are not. How long since you vacuumed/analyzed the test server?

It'd be interesting to see the output of "vacuum verbose statcon_tbl" on
both servers ...

regards, tom lane

PS: if you post any more query plans, please try to use software that
doesn't mangle the formatting so horribly ...
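If it is easier to gather that output from application code than from psql, the per-table report from VACUUM VERBOSE is sent as server notices, which the JDBC driver surfaces as SQLWarnings on the Statement. A minimal sketch, with hypothetical connection details:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLWarning;
import java.sql.Statement;

public class VacuumReport {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details, for illustration only.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/testdb", "user", "secret");
             Statement st = conn.createStatement()) {
            conn.setAutoCommit(true); // VACUUM cannot run inside a transaction block
            st.execute("VACUUM VERBOSE statcon_tbl");
            // The verbose report arrives as notices attached to the statement.
            for (SQLWarning w = st.getWarnings(); w != null; w = w.getNextWarning()) {
                System.out.println(w.getMessage());
            }
        }
    }
}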
Re: [PERFORM] Table stats
> Should there not be at least one Index Scan showing in the stats?

Not if there was a table scan.
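For anyone wanting to verify which kind of scan actually ran, the statistics views keep separate counters: pg_stat_user_tables records seq_scan and idx_scan per table. A minimal sketch of reading them over JDBC, with hypothetical connection details:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ScanStats {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details, for illustration only.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/testdb", "user", "secret");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT relname, seq_scan, idx_scan "
                     + "FROM pg_stat_user_tables ORDER BY relname")) {
            while (rs.next()) {
                System.out.printf("%s: seq_scan=%d idx_scan=%d%n",
                        rs.getString(1), rs.getLong(2), rs.getLong(3));
            }
        }
    }
}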
Re: [PERFORM] COPY vs INSERT
Christopher Petrilli wrote:
> On 5/4/05, Mischa Sandberg <[EMAIL PROTECTED]> wrote:
>> Quoting David Roussel <[EMAIL PROTECTED]>:
>>>> COPY invokes all the same logic as INSERT on the server side
>>>> (row-exclusive locking, transaction log, updating indexes, rules).
>>>> The difference is that all the rows are inserted as a single
>>>> transaction. This reduces the number of fsync's on the xlog, which
>>>> may be a limiting factor for you. You'll want to crank WAL_BUFFERS
>>>> and CHECKPOINT_SEGMENTS to match, though. One of my streams has 6K
>>>> records; I run with WB=1000, CS=128.
>>>
>>> So what's the difference between a COPY and a batch of INSERT
>>> statements? Also, surely, fsyncs only occur at the end of a
>>> transaction; no need to fsync before a commit has been issued, right?
>>
>> Sorry, I was comparing granularities the other way around. As far as
>> xlog is concerned, a COPY is ALWAYS one big txn, no matter how many
>> putline commands you use to feed the copy. With inserts, you can
>> choose whether to commit every row, every nth row, etc.
>>
>> Copy makes better use of the TCP connection for transmission. COPY
>> uses the TCP connection like a one-way pipe. INSERT is like an RPC:
>> the sender has to wait until the insert's return status roundtrips.
>
> I have found even greater performance increases by using COPY FROM
> <file>, not COPY FROM STDIN. This allows the backend process to
> directly read the file, rather than shoving it over a pipe (thereby
> potentially hitting the CPU multiple times). My experience is that this
> is anywhere from 5-10x faster than INSERT statements on the whole, and
> sometimes 200x.
>
> Chris

Unfortunately, COPY FROM '<file>' can only be done by a superuser. If you
have that option then that is great. If not...

--
Kind Regards,
Keith
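One way to get COPY's streaming behavior without superuser rights is the client-side COPY support that the pgjdbc driver later gained: CopyManager issues COPY ... FROM STDIN and pushes the data over the existing connection, so the backend never has to open a server-local file. A minimal sketch, assuming a hypothetical items table and CSV file:

import java.io.FileReader;
import java.io.Reader;
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyLoad {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details, table, and input file.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/testdb", "user", "secret")) {
            CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();
            try (Reader in = new FileReader("/tmp/items.csv")) {
                long rows = copy.copyIn(
                        "COPY items (id, payload) FROM STDIN WITH CSV", in);
                System.out.println("Loaded " + rows + " rows");
            }
        }
    }
}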
Re: [PERFORM] COPY vs INSERT
On Wed, May 04, 2005 at 10:22:56PM -0400, Tom Lane wrote:
> Also, there is a whole lot of one-time-per-statement overhead that can
> be amortized across many rows instead of only one. Stuff like opening
> the target table, looking up the per-column I/O conversion functions,
> identifying trigger functions if any, yadda yadda. It's not *that*
> expensive, but compared to an operation as small as inserting a single
> row, it's significant.

Has thought been given to supporting the insertion of multiple rows in a
single INSERT? DB2 supported:

INSERT INTO table VALUES(
    (1,2,3),
    (4,5,6),
    (7,8,9)
);

I'm not sure how standard that is or whether other databases support it.

--
Jim C. Nasby, Database Consultant               [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
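For what it's worth, PostgreSQL did later gain exactly this capability: multi-row VALUES lists were added in 8.2, with each row in its own parentheses after VALUES. A minimal sketch of issuing one through JDBC, with hypothetical connection details and table:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class MultiRowInsert {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details and table; multi-row VALUES
        // needs a server version that supports it (8.2 or later).
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/testdb", "user", "secret");
             Statement st = conn.createStatement()) {
            int n = st.executeUpdate(
                    "INSERT INTO items (a, b, c) VALUES (1,2,3), (4,5,6), (7,8,9)");
            System.out.println("Inserted " + n + " rows");
        }
    }
}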