[PERFORM] DELETE with filter on ctid

2007-04-09 Thread Spiegelberg, Greg
We have a query which generates a small set of rows (~1,000) which are
to be used in a DELETE on the same table.  The problem we have is that
we need to join on 5 different columns and it takes far too long.  I
have a solution but I'm not sure it's the right one.  Instead of joining
on 5 columns in the DELETE the join uses the ctid column.
 
BEGIN;
CREATE INDEX gregs_table_ctid_idx ON gregs_table(ctid);
DELETE FROM gregs_table gt
   USING (SELECT ctid FROM gregs_table WHERE ...) as s
   WHERE gt.ctid=s.ctid;
DROP INDEX gregs_table_ctid_idx;
COMMIT;
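
For reference, the multi-column join we were trying to avoid looked roughly
like this (the column names here are made up):

DELETE FROM gregs_table gt
   USING (SELECT c1, c2, c3, c4, c5 FROM gregs_table WHERE ...) AS s
   WHERE gt.c1=s.c1 AND gt.c2=s.c2 AND gt.c3=s.c3
     AND gt.c4=s.c4 AND gt.c5=s.c5;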
 
The difference to me is a 20+ minute transaction versus a ~5 second one.
The table is loaded using COPY, never INSERT, never UPDATE'd; COPY, SELECT
and DELETE is its life.  PostgreSQL 8.2.1 on RedHat ES 4.0 is the target
platform.
 
Any possible issues with using ctid in the DELETE and transaction?  I
understand ctid is "useless" in the long run as the documentation points
out but for the short term and within a transaction it seems to work
well.
 
Thoughts?
 
Greg
 
 
--
 Greg Spiegelberg
 [EMAIL PROTECTED]  
 614.318.4314, office
 614.431.8388, fax
 ISOdx Product Development Manager
 Cranel, Inc.
 
 


Re: [PERFORM] DELETE with filter on ctid

2007-04-10 Thread Spiegelberg, Greg
loops=562)
   Index Cond: (dd.ctid = d.ctid)
   Filter: ((NOT persist) AND (dtype_id = 3))
   ->  Index Scan using points_pkey on points ss  (cost=0.00..0.28
rows=1 width=8) (actual time=0.005..0.007 rows=1 loops=562)
 Index Cond: ((ss.id < 2) AND (ss.id = dd.point_id))
 Filter: ((systems_id = 2) AND (NOT permpoint))
 Total runtime: 2641.820 ms
(29 rows)
Time: 2652.940 ms

cranel=# DROP INDEX data_id_table_ctid_idx;
DROP INDEX
Time: 33.653 ms

cranel=# DELETE FROM sid2.data_id_table AS dd WHERE dd.point_id=2 AND
dd.dtype_id=3 AND dd.deleted AND NOT dd.persist;
DELETE 0
Time: 0.960 ms

cranel=# COMMIT;
Time: 20.500 ms




 

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Monday, April 09, 2007 4:55 PM
To: Spiegelberg, Greg
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] DELETE with filter on ctid 

"Spiegelberg, Greg" <[EMAIL PROTECTED]> writes:
> We have a query which generates a small set of rows (~1,000) which are
> to be used in a DELETE on the same table.  The problem we have is that
> we need to join on 5 different columns and it takes far too long.  I
> have a solution but I'm not sure it's the right one.  Instead of joining
> on 5 columns in the DELETE the join uses the ctid column.

> BEGIN;
> CREATE INDEX gregs_table_ctid_idx ON gregs_table(ctid);
> DELETE FROM gregs_table gt
>USING (SELECT ctid FROM gregs_table WHERE ...) as s
>WHERE gt.ctid=s.ctid;
> DROP INDEX gregs_table_ctid_idx;
> COMMIT;

Forget the index, it's useless here (hint: ctid is a physical address).
I'm wondering though why you don't just transpose the subquery's WHERE
condition into the DELETE's WHERE?  Or is this example oversimplified?

regards, tom lane



Re: [PERFORM] DELETE with filter on ctid

2007-04-10 Thread Spiegelberg, Greg
Craig,

I'm not using a TEMP TABLE in this DELETE; however, I have tried an
ANALYZE prior to the DELETE and it hardly makes a dent in the time.

Please look at the other follow-up email I just sent for full details.

Greg
 

-Original Message-
From: Craig A. James [mailto:[EMAIL PROTECTED] 
Sent: Monday, April 09, 2007 5:58 PM
To: Spiegelberg, Greg
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] DELETE with filter on ctid

Spiegelberg, Greg wrote:
> We have a query which generates a small set of rows (~1,000) which are

> to be used in a DELETE on the same table.  The problem we have is that

> we need to join on 5 different columns and it takes far too long.

You may have encountered the same problem I did:  You *must* run ANALYZE
on a temporary table before you use it in another query.  It's surprising
that this is true even for very small tables (a few hundred to a few
thousand rows), but it is.  I had a case where I created a "scratch"
table like yours, and the before/after ANALYZE performance was the
difference between 30 seconds and a few milliseconds for the same query.
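
For illustration, the pattern is roughly this (names are made up; the point
is the ANALYZE between building the scratch table and using it):

BEGIN;
CREATE TEMP TABLE scratch AS
    SELECT ctid AS tid FROM gregs_table WHERE ...;  -- the small row set
ANALYZE scratch;      -- without this the planner only guesses at the row count
DELETE FROM gregs_table gt USING scratch s WHERE gt.ctid = s.tid;
COMMIT;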

Craig



[PERFORM] Intel C/C++ Compiler Tests

2006-03-22 Thread Spiegelberg, Greg
All,

Has anyone tested PostgreSQL 8.1.x compiled with Intel's Linux C/C++
compiler?

Greg

--
 Greg Spiegelberg
 [EMAIL PROTECTED]
 ISOdx Product Development Manager
 Cranel, Inc.
 



Re: [PERFORM] setting up foreign keys

2006-08-11 Thread Spiegelberg, Greg
Sort of on topic: how many foreign keys in a single table is good vs.
bad?  I realize it's relative to the tables the FKs reference, so here's
an example:

Table A: 300 rows
Table B: 15,000,000 rows
Table C: 100,000 rows
Table E: 38 rows
Table F: 9 rows
Table G: partitioned on the FK from Table A, with an FK column for
each of the above tables

I'm in the process of normalizing the database and have a schema like
this in mind.  It works wonderfully for SELECTs, but I haven't gotten the
data import process down just yet, so I haven't had a chance to put it
through its paces.  Depending on the performance of INSERT, UPDATE, and
COPY I may drop the FK constraints since my app could enforce the FK
checks.
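
A rough sketch of what I have in mind (names are made up, and the parent
tables are pared down to just their keys):

CREATE TABLE table_a (id integer PRIMARY KEY);
CREATE TABLE table_b (id bigint  PRIMARY KEY);
CREATE TABLE table_c (id integer PRIMARY KEY);
CREATE TABLE table_e (id integer PRIMARY KEY);
CREATE TABLE table_f (id integer PRIMARY KEY);

CREATE TABLE table_g (
    a_id    integer NOT NULL REFERENCES table_a(id),
    b_id    bigint           REFERENCES table_b(id),
    c_id    integer          REFERENCES table_c(id),
    e_id    integer          REFERENCES table_e(id),
    f_id    integer          REFERENCES table_f(id),
    payload text
);

-- one child per Table A key, 8.1-style partitioning (inheritance + CHECK);
-- note that FK constraints are not inherited, so they would have to be
-- added to each child if we keep them
CREATE TABLE table_g_a1 (CHECK (a_id = 1)) INHERITS (table_g);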

TIA.

Greg
 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Chris
> Sent: Thursday, August 10, 2006 6:36 PM
> To: Merlin Moncure
> Cc: Sue Fitt; pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] setting up foreign keys
> 
> Merlin Moncure wrote:
> > On 8/10/06, Chris <[EMAIL PROTECTED]> wrote:
> >> Sue Fitt wrote:
> >> > Thanks Chris and Chris, you've solved it.
> >> >
> >> > I had a gui open that connects to the database. It was doing 
> >> > nothing (and not preventing me adding to or altering headwords_core 
> >> > via psql), but having closed it the table is instantly created. Weird.
> >> >
> >> > BTW, referencing the same column twice is deliberate, it's a 
> >> > cross-reference.
> >>
> >> The same column and the same table?
> >>
> >> Same column different table I could understand but not the same 
> >> column & table ;)
> > 
> > create table color(color text);
> > 
> > create table person(eye_color text references color(color),
> >   hair_color text references color(color));
> 
> lol. Good point :)
> 
> *back to the hidey hole!*
> 
> --
> Postgresql & php tutorials
> http://www.designmagick.com/
> 
> 



Re: [PERFORM] RAID 0 not as fast as expected

2006-09-15 Thread Spiegelberg, Greg
That's an all-PCI-X box, which makes sense.  There are 6 SATA controllers
in that little beastie also.  You can always count on Sun to provide
over-engineered boxes.

 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> Joshua D. Drake
> Sent: Friday, September 15, 2006 12:01 AM
> To: Luke Lonergan
> Cc: Craig A. James; pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] RAID 0 not as fast as expected
> 
> Luke Lonergan wrote:
> > Josh,
> > 
> > On 9/14/06 8:47 PM, "Joshua D. Drake" <[EMAIL PROTECTED]> wrote:
> > 
> >>> I've obtained 1,950 MB/s using Linux software RAID on SATA drives.
> >> With what? :)
> > 
> > Sun X4500 (aka Thumper) running stock RedHat 4.3 (actually CentOS 4.3)
> > with XFS and the linux md driver without lvm.  Here is a summary of
> > the results:
> > 
> 
> 
> Good god!
> 
> >  
> >  Read Test
> >  RAID   Max Readahead  RAID       Max Readahead   Max       Read Bandwidth
> >  Level  (KB)           Chunksize  on Disks (KB)   Time (s)  (MB/s)
> >  0      65536          64         256             16.689    1,917.43
> >  0      4096           64         256             21.269    1,504.54
> >  0      65536          256        256             17.967    1,781.04
> >  0      2816           256        256             18.835    1,698.96
> >  0      65536          1024       256             18.538    1,726.18
> >  0      65536          64         512             18.295    1,749.11
> >  0      65536          64         256             18.931    1,690.35
> >  0      65536          64         256             18.873    1,695.54
> >  0      64768          64         256             18.545    1,725.53
> >  0      131172         64         256             18.548    1,725.25
> >  0      131172         64         65536           19.046    1,680.14
> >  0      131172         64         524288          18.125    1,765.52
> >  0      131172         64         1048576         18.701    1,711.14
> >  5      2560           64         256             39.933    801.34
> >  5      16777216       64         256             37.76     847.46
> >  5      524288         64         256             53.497    598.16
> >  5      65536          32         256             38.472    831.77
> >  5      65536          32         256             38.004    842.02
> >  5      65536          32         256             37.884    844.68
> >  5      2560           16         256             41.39     773.13
> >  5      65536          16         256             48.902    654.37
> >  10     65536          64         256             83.256    384.36
> >  1+0    65536          64         256             19.394    1,649.99
> >  1+0    65536          64         256             19.047    1,680.05
> >  1+0    65536          64         256             19.195    1,667.10
> >  1+0    65536          64         256             18.806    1,701.58
> >  1+0    65536          64         256             18.848    1,697.79
> >  1+0    65536          64         256             18.371    1,741.88
> >  1+0    65536          64         256             21.446    1,492.12
> >  1+0    65536          64         256             20.254    1,579.93
> > 
> > 
> 
> 
> -- 
> 
> === The PostgreSQL Company: Command Prompt, Inc. ===
> Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
> Providing the most comprehensive  PostgreSQL solutions since 1997
>   http://www.commandprompt.com/
> 
> 
> 
> 



Re: [PERFORM] Best COPY Performance

2006-10-25 Thread Spiegelberg, Greg
> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> Craig A. James
> Sent: Wednesday, October 25, 2006 12:52 PM
> To: Jim C. Nasby
> Cc: Worky Workerson; Merlin Moncure; pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Best COPY Performance
> 
> Jim C. Nasby wrote:
> > Wait... so you're using perl to copy data between two tables? And 
> > using a cursor to boot? I can't think of any way that could be more 
> > inefficient...
> > 
> > What's wrong with a plain old INSERT INTO ... SELECT? Or if you really
> > need to break it into multiple transaction blocks, at least don't
> > shuffle the data from the database into perl and then back into the
> > database; do an INSERT INTO ... SELECT with that same where clause.
> 
> The data are on two different computers, and I do processing 
> of the data as it passes through the application.  Otherwise, 
> the INSERT INTO ... SELECT is my first choice.

Would dblink() help in any way?
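
Something along these lines, I mean (made-up names, and assuming
contrib/dblink is installed on the receiving side):

INSERT INTO local_table (id, payload)
SELECT id, payload
  FROM dblink('host=otherhost dbname=sourcedb',
              'SELECT id, payload FROM remote_table WHERE loaded = false')
       AS t(id integer, payload text);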

Greg





Re: [PERFORM] Best COPY Performance

2006-10-30 Thread Spiegelberg, Greg
> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> Luke Lonergan
> Sent: Saturday, October 28, 2006 12:07 AM
> To: Worky Workerson; Merlin Moncure
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Best COPY Performance
> 
> Worky,
> 
> On 10/27/06 8:47 PM, "Worky Workerson" 
> <[EMAIL PROTECTED]> wrote:
> 
> > Are you saying that I should be able to issue multiple COPY commands
> > because my I/O wait is low?  I was under the impression that I am I/O
> > bound, so multiple simultaneous loads would have a detrimental
> > effect ...
> 
> ... 
> I agree with Merlin that you can speed things up by breaking 
> the file up.
> Alternately you can use the OSS Bizgres java loader, which 
> lets you specify the number of I/O threads with the "-n" 
> option on a single file.

As a result of this thread, and b/c I've tried this in the past but
never had much success at speeding the process up, I attempted just that
here, except via 2 psql CLIs with access to the local file.  1.1M rows
of data, varying in width from 40 to 200 characters, COPY'd to a table
with only one text column (no keys, indexes, &c) took about 15 seconds to
load: ~73K rows/second.

I broke that file into 2 files of 550K rows each and performed 2
simultaneous COPYs after dropping the table, recreating it, issuing a
sync on the system to be sure, &c, and nearly every time both COPYs
finished in 12 seconds.  About a 20% gain, to ~91K rows/second.
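
Roughly, the test amounted to this (file and table names are made up):

#!/bin/sh
# split the load file in half, then run the two COPYs side by side
split -l 550000 /tmp/data.txt /tmp/part_
psql -d mydb -c "COPY load_test FROM '/tmp/part_aa'" & PID1=$!
psql -d mydb -c "COPY load_test FROM '/tmp/part_ab'" & PID2=$!
wait $PID1
wait $PID2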

Admittedly, this was a pretty rough test but a 20% savings, if it can be
put into production, is worth exploring for us.

B/c I'll be asked, I did this on an idle, dual 3.06GHz Xeon with 6GB of
memory, U320 SCSI internal drives and PostgreSQL 8.1.4.

Greg



Re: [PERFORM] Trying to create multi db query in one large querie

2004-12-13 Thread Spiegelberg, Greg
Hello,

My experience with dblink() is that each dblink() is executed serially, in
part, I would guess, due to the plan for the query.  To have each query run
in parallel you would need to execute both dblink() calls simultaneously,
saving each result into a table.  I'm not sure if the same table could be
specified; it would depend on the constraints, I suppose.

#!/bin/sh
# Query 1
psql -d mydb -c "select * into mytable from dblink('db1','select * from
customer_data where timestamp between timestamp \'01-01-2004\' and timestamp
\'06-30-2004\'') as t1(c1 int, c2 text, ...);" & PID1=$!
# Query 2
psql -d mydb -c "select * into mytable from dblink('db2','select * from
customer_data where timestamp between timestamp \'01-07-2004\' and timestamp
\'12-31-2004\'') as t2(c1 int, c2 text, ...);" & PID2=$!
# wait
wait $PID1
wait $PID2
# Do more on mydb.mytable
...

Something like that, so no guarantees.  I do remember testing with this a
while back and it is useful for JOINs.

Greg


-Original Message-
From: Hasnul Fadhly bin Hasan
To: [EMAIL PROTECTED]
Sent: 12/13/04 8:44 PM
Subject: [PERFORM] Trying to create multi db query in one large queries

Hi,

I am not sure if this is the place to ask this question, but since the
question is about improving performance, I guess I am not that far off.

My question is if there is a query design that would query multiple 
server simultaneously.. would that improve the performance?

To make it clear, let's say we have 3 db servers.  1 server is just
designed to take the queries while the other 2 servers are the ones that
actually hold the data.  Let's say we have a query of 'select * from
customer_data' and we change it to
select * from
(
dblink('db1','select * from customer_data where timestamp between 
timestamp \'01-01-2004\' and timestamp \'06-30-2004\'')
union
dblink('db2','select * from customer_data where timestamp between 
timestamp \'01-07-2004\' and timestamp \'12-31-2004\'')
)

Would the subquery above be done simultaneously by postgres before doing
the end query?  Or would it just execute one at a time?

If it does execute simultaneously, it would be possible to create code to
convert normal queries to distributed queries, requesting data from
multiple databases to improve performance.  This would be advantageous for
large amounts of data.

Thanks,

Hasnul





Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Spiegelberg, Greg
Isn't this a prime example of when to use a servlet or something similar
in function?  It will create the cursor, maintain it, and fetch against
it for a particular page.
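
In SQL terms it would hold something like this open between page requests
(a rough sketch; names are made up):

DECLARE page_cur CURSOR WITH HOLD FOR SELECT * FROM big_table ORDER BY id;
MOVE FORWARD 100 IN page_cur;     -- skip ahead to the requested page
FETCH FORWARD 10 FROM page_cur;   -- the rows for that page
-- ... and when the session is done with it:
CLOSE page_cur;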

Greg


-Original Message-
From: Richard Huxton [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 20, 2005 10:21 AM
To: Andrei Bintintan
Cc: pgsql-sql@postgresql.org; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] [SQL] OFFSET impact on Performance???


Andrei Bintintan wrote:
>> If you're using this to provide "pages" of results, could you use a 
>> cursor?
> 
> What do you mean by that? Cursor?
> 
> Yes I'm using this to provide "pages", but If I jump to the last pages 
> it goes very slow.

DECLARE mycursor CURSOR FOR SELECT * FROM ...
FETCH FORWARD 10 IN mycursor;
CLOSE mycursor;

Repeated FETCHes would let you step through your results. That won't 
work if you have a web-app making repeated connections.

If you've got a web-application then you'll probably want to insert the 
results into a cache table for later use.

--
   Richard Huxton
   Archonet Ltd



Re: [PERFORM] failures on machines using jfs

2004-01-11 Thread Spiegelberg, Greg
It would seem we're experiencing something similar with our scratch
volume (JFS mounted with noatime).  It is still much faster than our
experiments with ext2, ext3, and reiserfs, but occasionally during
large loads it will hiccup for a couple of seconds; no crashes yet.

I'm reluctant to switch back to any other file system because the
data import took a little over 1.5 hours but now takes just under
20 minutes, and we haven't crashed yet.

For future reference:

 RedHat 7.3 w/2.4.18-18.7smp
 PostgreSQL 7.3.3 from source
 jfsutils 1.0.17-1
 Dual PIII Intel 1.4GHz & 2GB ECC
 Internal disk: 2xU160 SCSI, mirrored, location of our JFS file system
 External disk: Qlogic 2310 attached to FC-SW @2Gbps with ext3 on those LUNs

Greg


-Original Message-
From: Christopher Browne
To: [EMAIL PROTECTED]
Sent: 1/10/04 9:08 PM
Subject: Re: [PERFORM] failures on machines using jfs

[EMAIL PROTECTED] (Robert Creager) writes:
> When grilled further on (Wed, 7 Jan 2004 18:06:08 -0500),
> Andrew Sullivan <[EMAIL PROTECTED]> confessed:
>
>> We have lately had a couple of cases where machines either locked
>> up, slowed down to the point of complete unusability, or died
>> completely while using jfs.  We are _not_ sure that jfs is in fact
>> the culprit.  In one case, a kernel panic appeared to be referring
>> to the jfs kernel module, but I can't be sure as I lost the output
>> immediately thereafter.  Yesterday, we had a problem of data
>> corruption on a failed jfs volume.
>> 
>> None of this is to say that jfs is in fact to blame, nor even that,
>> if it is, it does not have something to do with the age of our
>> installations, &c. (these are all RH 8).  In fact, I suspect
>> hardware in both cases.  But I thought I'd mention it just in case
>> other people are seeing strange behaviour, on the principle of
>> "better safe than sorry."
>
> Interestingly enough, I'm using JFS on a new scsi disk with Mandrake
> 9.1 and was having similar problems.  I was generating heavy disk
> usage through database and astronomical data reductions.  My machine
> (dual AMD) would suddenly hang.  No new jobs would run, just
> increase the load, until I reboot the machine.
>
> I solved my problems by creating a 128Mb ram disk (using EXT2) for
> the temp data produced by my reduction runs.
>
> I believe JFS was to blame, not hardware, but you never know...

Interesting.

The set of concurrent factors that came together when this happened
"consistently" were these:

 1.  Heavy DB updates taking place on JFS filesystems;

 2.  SMP (we suspected Xeon hyperthreading as a possible factor, but
 shut it off and still saw the same problem...)

 3.  The third factor that appeared to be a catalyst was copying, via scp,
 a file > 2GB in size onto the system.

The third piece was a particularly interesting aspect; the file would
get copied over successfully, and the scp process would hang (to the
point of "kill -9" being unable to touch it) immediately thereafter.

At that point, processes on the system that were accessing files on
the hung-up filesystem were locked, also unkillable by "kill 9."
That's certainly consistent with JFS being at the root of the problem,
whether it was the cause or not...
-- 
let name="cbbrowne" and tld="libertyrms.info" in String.concat "@"
[name;tld];;

Christopher Browne
(416) 646 3304 x124 (land)



**
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the system manager.

This footnote also confirms that this email message has been swept by
MIMEsweeper for the presence of computer viruses.

www.mimesweeper.com
**

