[PERFORM] Materialize Subplan and push into inner index conditions
Is it possible to have the planner consider the second plan instead of the first?

admpostgres4=> explain analyze select * from users where id in
               (select user_id from user2user_group where user_group_id = 769694);

                                  QUERY PLAN
------------------------------------------------------------------------------
 Hash IN Join  (cost=4.04..2302.05 rows=4 width=78) (actual time=50.381..200.985 rows=2 loops=1)
   Hash Cond: ("outer".id = "inner".user_id)
   ->  Append  (cost=0.00..1931.68 rows=77568 width=78) (actual time=0.004..154.629 rows=76413 loops=1)
         ->  Seq Scan on users  (cost=0.00..1024.88 rows=44588 width=78) (actual time=0.004..36.220 rows=43433 loops=1)
         ->  Seq Scan on person_user users  (cost=0.00..906.80 rows=32980 width=78) (actual time=0.005..38.120 rows=32980 loops=1)
   ->  Hash  (cost=4.04..4.04 rows=2 width=4) (actual time=0.020..0.020 rows=2 loops=1)
         ->  Index Scan using user2user_group_user_group_id_idx on user2user_group  (cost=0.00..4.04 rows=2 width=4) (actual time=0.011..0.014 rows=2 loops=1)
               Index Cond: (user_group_id = 769694)
 Total runtime: 201.070 ms
(9 rows)

admpostgres4=> select user_id from user2user_group where user_group_id = 769694;
 user_id
---------
  766541
  766552
(2 rows)

admpostgres4=> explain analyze select * from users where id in (766541, 766552);

                                  QUERY PLAN
------------------------------------------------------------------------------
 Result  (cost=4.02..33.48 rows=9 width=78) (actual time=0.055..0.087 rows=2 loops=1)
   ->  Append  (cost=4.02..33.48 rows=9 width=78) (actual time=0.051..0.082 rows=2 loops=1)
         ->  Bitmap Heap Scan on users  (cost=4.02..18.10 rows=5 width=78) (actual time=0.051..0.053 rows=2 loops=1)
               Recheck Cond: ((id = 766541) OR (id = 766552))
               ->  BitmapOr  (cost=4.02..4.02 rows=5 width=0) (actual time=0.045..0.045 rows=0 loops=1)
                     ->  Bitmap Index Scan on users_id_idx  (cost=0.00..2.01 rows=2 width=0) (actual time=0.034..0.034 rows=1 loops=1)
                           Index Cond: (id = 766541)
                     ->  Bitmap Index Scan on users_id_idx  (cost=0.00..2.01 rows=2 width=0) (actual time=0.008..0.008 rows=1 loops=1)
                           Index Cond: (id = 766552)
         ->  Bitmap Heap Scan on person_user users  (cost=4.02..15.37 rows=4 width=78) (actual time=0.025..0.025 rows=0 loops=1)
               Recheck Cond: ((id = 766541) OR (id = 766552))
               ->  BitmapOr  (cost=4.02..4.02 rows=4 width=0) (actual time=0.023..0.023 rows=0 loops=1)
                     ->  Bitmap Index Scan on person_user_id_idx  (cost=0.00..2.01 rows=2 width=0) (actual time=0.017..0.017 rows=0 loops=1)
                           Index Cond: (id = 766541)
                     ->  Bitmap Index Scan on person_user_id_idx  (cost=0.00..2.01 rows=2 width=0) (actual time=0.004..0.004 rows=0 loops=1)
                           Index Cond: (id = 766552)
 Total runtime: 0.177 ms
(17 rows)

admpostgres4=> \d users
                          Table "adm.users"
      Column      |            Type             |      Modifiers
------------------+-----------------------------+----------------------
 id               | integer                     | not null
 classid          | integer                     | not null
 revision         | integer                     | not null
 rev_start        | timestamp without time zone |
 rev_end          | timestamp without time zone |
 rev_timestamp    | timestamp without time zone | not null
 rev_state        | integer                     | not null default 10
 name             | character varying           |
 password         | character varying           |
 password_expires | timestamp without time zone |
 password_period  | integer                     |
Indexes:
    "users_pkey" primary key, btree (revision)
    "users_uidx" unique, btree (revision)
    "users_id_idx" btree (id)
    "users_name_idx" btree (rev_state, rev_end, name)
    "users_rev_end_idx" btree (rev_end)
    "users_rev_idx" btree (rev_state, rev_end)
    "users_rev_start_idx" btree (rev_start)
    "users_rev_state_idx" btree (rev_state)
Inherits: revision

admpostgres4=> \d person_user
                       Table "adm.person_user"
      Column      |            Type             |      Modifiers
------------------+-----------------------------+----------------------
 id               | integer                     | not null
 classid          | integer                     | not null
 revision         | integer                     | not null
 rev_start        | timestamp without time zone |
 rev_end          | timestamp without time zone |
 rev_timestamp    | timestamp without time zone | not null
 rev_state        | integer                     | not null default 10
Re: [PERFORM] Materialize Subplan and push into inner index conditions
Jens-Wolfhard Schicke <[EMAIL PROTECTED]> writes:
> Is it possible to have the planner consider the second plan instead of
> the first?

At the moment, only if you get rid of the inheritance. The planner's not
very smart at all when faced with joining inheritance trees.

			regards, tom lane
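[Editor's sketch] One manual workaround in the same spirit as Tom's suggestion, shown only as an illustration against the schema above (the column list is illustrative, and it assumes person_user is the only child of users): address each table in the inheritance tree separately, so the planner joins the IN subplan against one plain table at a time and can use users_id_idx / person_user_id_idx on each branch.

    -- Sketch only: bypass the inheritance Append by querying parent and
    -- child explicitly, so each IN join sees a single plain table.
    select u.id, u.name
      from only users u
     where u.id in (select user_id from user2user_group
                     where user_group_id = 769694)
    union all
    select p.id, p.name
      from only person_user p
     where p.id in (select user_id from user2user_group
                     where user_group_id = 769694);

Whether the planner then picks an index-driven join still depends on the statistics; this is not guaranteed to match the hand-written IN list, only to remove the inheritance join that Tom points to.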
[PERFORM] improving write performance for logging application
I have questions about how to improve the write performance of PostgreSQL for logging data from a real-time simulation. We found that MySQL 4.1.3 could log about 1480 objects/second using MyISAM tables or about 1225 objects/second using InnoDB tables, but PostgreSQL 8.0.3 could log only about 540 objects/second. (Test system: quad-Itanium2, 8GB memory, SCSI RAID, GigE connection from simulation server, nothing running except system processes and the database system under test.)

We also found that we could improve MySQL performance significantly using MySQL's "INSERT" command extension allowing multiple value-list tuples in a single command; the rate for MyISAM tables improved to about 2600 objects/second. PostgreSQL doesn't support that language extension. Using the COPY command instead of INSERT might help, but since rows are being generated on the fly, I don't see how to use COPY without running a separate process that reads rows from the application and uses COPY to write to the database. The application currently has two processes: the simulation and a data collector that reads events from the sim (queued in shared memory) and writes them as rows to the database, buffering as needed to avoid lost data during periods of high activity. To use COPY I think we would have to split our data collector into two processes communicating via a pipe.

Query performance is not an issue: we found that when suitable indexes are added PostgreSQL is fast enough on the kinds of queries our users make. The crux is writing rows to the database fast enough to keep up with the simulation.

Are there general guidelines for tuning the PostgreSQL server for this kind of application? The suggestions I've found include disabling fsync (done), increasing the value of wal_buffers, and moving the WAL to a different disk, but these aren't likely to produce the 3x improvement that we need. On the client side I've found only two suggestions: disable autocommit and use COPY instead of INSERT. I think I've effectively disabled autocommit by batching up to several hundred INSERT commands in each PQexec() call, and it isn't clear that COPY is worth the effort in our application.

Thanks.
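[Editor's sketch] For reference, the COPY path being weighed here would look roughly like the following; the table and columns (sim_events, event_time, object_id, event_type, value) are hypothetical, not taken from the original post. The data stream is tab-separated and is terminated by a line containing only "\.".

    -- Hypothetical log table; one COPY per flush of the collector's buffer.
    COPY sim_events (event_time, object_id, event_type, value) FROM STDIN;
    2006-01-03 16:44:28.123	1042	position	7.25
    2006-01-03 16:44:28.131	1043	velocity	-0.80
    \.

Each COPY loads an entire buffer flush in one command, which is roughly what the multi-row INSERT extension buys MySQL; from libpq this can be driven with PQputCopyData/PQputCopyEnd without a separate process.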
Re: [PERFORM] improving write performance for logging application
Steve Eckmann <[EMAIL PROTECTED]> writes:
> We also found that we could improve MySQL performance significantly
> using MySQL's "INSERT" command extension allowing multiple value-list
> tuples in a single command; the rate for MyISAM tables improved to
> about 2600 objects/second. PostgreSQL doesn't support that language
> extension. Using the COPY command instead of INSERT might help, but
> since rows are being generated on the fly, I don't see how to use COPY
> without running a separate process that reads rows from the
> application and uses COPY to write to the database.

Can you conveniently alter your application to batch INSERT commands
into transactions? Ie

	BEGIN;
	INSERT ...;
	... maybe 100 or so inserts ...
	COMMIT;
	BEGIN;
	... lather, rinse, repeat ...

This cuts down the transactional overhead quite a bit. A downside is
that you lose multiple rows if any INSERT fails, but then the same would
be true of multiple VALUES lists per INSERT.

			regards, tom lane
Re: [PERFORM] improving write performance for logging application
On Tue, Jan 03, 2006 at 04:44:28PM -0700, Steve Eckmann wrote:
> Are there general guidelines for tuning the PostgreSQL server for this
> kind of application? The suggestions I've found include disabling fsync
> (done),

Are you sure you really want this? The results could be catastrophic in
case of a crash.

> On the client side I've found only two suggestions: disable autocommit
> and use COPY instead of INSERT. I think I've effectively disabled
> autocommit by batching up to several hundred INSERT commands in each
> PQexec() call, and it isn't clear that COPY is worth the effort in our
> application.

I'm a bit confused here: how can you batch multiple INSERTs into large
statements for MySQL, but not batch multiple INSERTs into COPY statements
for PostgreSQL?

Anyhow, putting it all inside one transaction (or a few) is likely to
help quite a lot, but of course less when you have fsync=false. Bundling
multiple statements in each PQexec() call won't really give you that;
you'll have to tell the database so explicitly.

/* Steinar */
--
Homepage: http://www.sesse.net/
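[Editor's sketch] A concrete form of the explicitly delimited batch being discussed; the sim_events table and its columns are hypothetical.

    -- Hypothetical table; one explicit transaction per buffer flush.
    BEGIN;
    INSERT INTO sim_events (event_time, object_id, value) VALUES ('2006-01-03 16:44:28', 1042, 7.25);
    INSERT INTO sim_events (event_time, object_id, value) VALUES ('2006-01-03 16:44:28', 1043, -0.80);
    -- ... a few hundred more inserts from the same buffer ...
    COMMIT;

The whole string can still be sent through a single PQexec() call; the explicit BEGIN/COMMIT simply makes the intended transaction boundaries unambiguous to the server.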
Re: [PERFORM] improving write performance for logging application
On Tue, 3 Jan 2006, Tom Lane wrote:
> Steve Eckmann <[EMAIL PROTECTED]> writes:
> > We also found that we could improve MySQL performance significantly
> > using MySQL's "INSERT" command extension allowing multiple value-list
> > tuples in a single command; the rate for MyISAM tables improved to
> > about 2600 objects/second. PostgreSQL doesn't support that language
> > extension. Using the COPY command instead of INSERT might help, but
> > since rows are being generated on the fly, I don't see how to use COPY
> > without running a separate process that reads rows from the
> > application and uses COPY to write to the database.
>
> Can you conveniently alter your application to batch INSERT commands
> into transactions? Ie
>
> 	BEGIN;
> 	INSERT ...;
> 	... maybe 100 or so inserts ...
> 	COMMIT;
> 	BEGIN;
> 	... lather, rinse, repeat ...
>
> This cuts down the transactional overhead quite a bit. A downside is
> that you lose multiple rows if any INSERT fails, but then the same would
> be true of multiple VALUES lists per INSERT.

Steve, you mentioned that your data collector buffers the data before
sending it to the database. Modify it so that each time it goes to send
things to the database, it sends all the data that's in the buffer as a
single transaction.

I am working on using postgres to deal with log data and wrote a simple
perl script that read in the log files a line at a time and then wrote
them 1000 at a time to the database. On a dual Opteron 240 box with 2G of
RAM and one 15krpm SCSI drive (and an untuned postgres install with the
compile-time defaults) I was getting 5000-8000 lines/sec (I think this was
with fsync disabled, but I don't remember for sure), and postgres was
complaining that it was overrunning its log sizes (which limits the speed,
as it then has to pause to flush the logs).

The key thing is to send multiple lines with one transaction, as Tom shows
above.

David Lang
[PERFORM] Avoiding cartesian product
I have a table which stores cumulative values. I would like to display/chart the deltas between successive data collections.

If my primary key only incremented by 1, I could write a simple query:

    select b.gc_minor - a.gc_minor, b.gc_major - a.gc_major
      from jam_trace_sys a, jam_trace_sys b
     where a.trace_id = 22
       and b.trace_id = a.trace_id
       and b.seq_no = a.seq_no + 1
     order by a.seq_no;

However, the difference in sequence number is variable. So (in Oracle) I used to extract the next seq_no using a correlated sub-query:

    select b.gc_minor - a.gc_minor, b.gc_major - a.gc_major
      from jam_trace_sys a, jam_trace_sys b
     where a.trace_id = 22
       and (b.trace_id, b.seq_no) =
           (select a.trace_id, min(c.seq_no)
              from jam_trace_sys c
             where c.trace_id = a.trace_id
               and c.seq_no > a.seq_no)
     order by a.seq_no;

For every row in A, the correlated sub-query on C executes. With an appropriate index it just descends the index B-tree, goes one row to the right, returns that row (min > :value), and joins to table B:

    SELECT STATEMENT
    SORT ORDER BY
    TABLE ACCESS BY INDEX ROWID JAM_TRACE_SYS B
    NESTED LOOPS
    TABLE ACCESS BY INDEX ROWID JAM_TRACE_SYS A
    INDEX RANGE SCAN JAM_TRACE_SYS_N1 A
    INDEX RANGE SCAN JAM_TRACE_SYS_N1 B
    SORT AGGREGATE
    INDEX RANGE SCAN JAM_TRACE_SYS_N1 C

In PostgreSQL, A and B are doing a cartesian product, then C gets executed for every row in this cartesian product, and most of the extra rows get thrown out. Is there any way to force an execution plan like the one above, where the correlated subquery runs before going to B? The table is small right now, but it will grow to have millions of rows.

                                  QUERY PLAN
------------------------------------------------------------------------------
 Sort  (cost=124911.81..124944.84 rows=13213 width=20) (actual time=13096.754..13097.053 rows=149 loops=1)
   Sort Key: a.seq_no
   ->  Nested Loop  (cost=4.34..124007.40 rows=13213 width=20) (actual time=1948.300..13096.329 rows=149 loops=1)
         Join Filter: (subplan)
         ->  Seq Scan on jam_trace_sys b  (cost=0.00..3.75 rows=175 width=16) (actual time=0.005..0.534 rows=175 loops=1)
         ->  Materialize  (cost=4.34..5.85 rows=151 width=16) (actual time=0.002..0.324 rows=150 loops=175)
               ->  Seq Scan on jam_trace_sys a  (cost=0.00..4.19 rows=151 width=16) (actual time=0.022..0.687 rows=150 loops=1)
                     Filter: (trace_id = 22)
         SubPlan
           ->  Aggregate  (cost=4.67..4.67 rows=1 width=4) (actual time=0.486..0.488 rows=1 loops=26250)
                 ->  Seq Scan on jam_trace_sys c  (cost=0.00..4.62 rows=15 width=4) (actual time=0.058..0.311 rows=74 loops=26250)
                       Filter: ((trace_id = $0) AND (seq_no > $1))
 Total runtime: 13097.557 ms
(13 rows)

pglnx01=> \d jam_trace_sys
       Table "public.jam_trace_sys"
     Column      |  Type   | Modifiers
-----------------+---------+-----------
 trace_id        | integer |
 seq_no          | integer |
 cpu_utilization | integer |
 gc_minor        | integer |
 gc_major        | integer |
 heap_used       | integer |
Indexes:
    "jam_trace_sys_n1" btree (trace_id, seq_no)

pglnx01=> select count(*) from jam_trace_sys;
 count
-------
   175
(1 row)

pglnx01=> select trace_id, count(*) from jam_trace_sys group by trace_id;
 trace_id | count
----------+-------
       15 |     2
       18 |    21
       22 |   150
       16 |     2
(4 rows)
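[Editor's sketch] One rewrite worth trying against this schema (untested against the poster's data): fetch the next row's values with scalar subqueries that use ORDER BY ... LIMIT 1 instead of min(), so each lookup can descend the (trace_id, seq_no) index directly and table B drops out of the join entirely.

    -- Sketch: correlated scalar subqueries that pick the "next" row per a,
    -- which the jam_trace_sys_n1 btree (trace_id, seq_no) can satisfy with
    -- a single index descent once the table is large enough to matter.
    select (select c.gc_minor
              from jam_trace_sys c
             where c.trace_id = a.trace_id and c.seq_no > a.seq_no
             order by c.trace_id, c.seq_no limit 1) - a.gc_minor as gc_minor_delta,
           (select c.gc_major
              from jam_trace_sys c
             where c.trace_id = a.trace_id and c.seq_no > a.seq_no
             order by c.trace_id, c.seq_no limit 1) - a.gc_major as gc_major_delta
      from jam_trace_sys a
     where a.trace_id = 22
     order by a.seq_no;

Because the subqueries correlate only on A, they run once per retained row of A rather than once per row of an A x B cartesian product; rows with no successor simply yield NULL deltas.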