Re: [PERFORM] How to "unique-ify" HUGE table?

2008-12-23 Thread Stefan Kaltenbrunner
Scott Marlowe wrote: On Tue, Dec 23, 2008 at 11:14 AM, George Pavlov wrote: You don't say what PG version you are on, but just for kicks you may try using GROUP BY instead of DISTINCT. Yes, the two should perform the same, but with 8.1 (or maybe 8.0) I had seen situations where GROUP BY was fas

Re: [PERFORM] How to "unique-ify" HUGE table?

2008-12-23 Thread Scott Marlowe
On Tue, Dec 23, 2008 at 11:14 AM, George Pavlov wrote: > You don't say what PG version you are on, but just for kicks you may try > using GROUP BY instead of DISTINCT. Yes, the two should perform the > same, but with 8.1 (or maybe 8.0) I had seen situations where GROUP BY > was faster (admittedly

Re: [PERFORM] How to "unique-ify" HUGE table?

2008-12-23 Thread George Pavlov
You don't say what PG version you are on, but just for kicks you may try using GROUP BY instead of DISTINCT. Yes, the two should perform the same, but with 8.1 (or maybe 8.0) I had seen situations where GROUP BY was faster (admittedly this happened with more complex queries). So, try this: CREAT
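For readers skimming the archive, the GROUP BY versus DISTINCT suggestion above can be illustrated with a minimal sketch. It is not the query from the truncated message: the table name `pairs`, the column names `a` and `b`, and the sample rows are hypothetical, and it runs against an in-memory SQLite database rather than PostgreSQL, purely to show that the two forms return the same de-duplicated rows. It says nothing about their relative speed on PostgreSQL 8.x.

```python
import sqlite3

# Hypothetical stand-in for the 2-column table of short TEXT values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (a TEXT, b TEXT)")
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?)",
    [("x", "1"), ("x", "1"), ("y", "2"), ("x", "2"), ("y", "2")],
)

# Variant 1: de-duplicate with DISTINCT.
conn.execute("CREATE TABLE uniq_distinct AS SELECT DISTINCT a, b FROM pairs")

# Variant 2: de-duplicate by grouping on every column.
conn.execute("CREATE TABLE uniq_group AS SELECT a, b FROM pairs GROUP BY a, b")

# Sort both results so the comparison does not depend on row order.
distinct_rows = sorted(conn.execute("SELECT a, b FROM uniq_distinct").fetchall())
group_rows = sorted(conn.execute("SELECT a, b FROM uniq_group").fetchall())

print(distinct_rows == group_rows)
```

On PostgreSQL the same two statements can be compared with EXPLAIN ANALYZE to see which plan the installed version picks for each form.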

Re: [PERFORM] How to "unique-ify" HUGE table?

2008-12-23 Thread D'Arcy J.M. Cain
On Tue, 23 Dec 2008 12:25:48 -0500 "Kynn Jones" wrote: > Hi everyone! > I have a very large 2-column table (about 500M records) from which I want to > remove duplicate records. > > I have tried many approaches, but they all take forever. > > The table's definition consists of two short TEXT colu

Re: [PERFORM] How to "unique-ify" HUGE table?

2008-12-23 Thread Scott Marlowe
On Tue, Dec 23, 2008 at 10:25 AM, Kynn Jones wrote: > Hi everyone! > I have a very large 2-column table (about 500M records) from which I want to > remove duplicate records. > I have tried many approaches, but they all take forever. > The table's definition consists of two short TEXT columns. It

[PERFORM] How to "unique-ify" HUGE table?

2008-12-23 Thread Kynn Jones
Hi everyone! I have a very large 2-column table (about 500M records) from which I want to remove duplicate records. I have tried many approaches, but they all take forever. The table's definition consists of two short TEXT columns. It is a temporary table generated from a query: CREATE TEMP TAB

Re: [PERFORM] dbt-2 tuning results with postgresql-8.3.5

2008-12-23 Thread Alvaro Herrera
Mark Wong wrote: > Hrm, tracking just the launcher process certainly doesn't help. Are > the spawned processes short lived? I take a snapshot of > /proc//io data every 60 seconds. The worker processes can be short-lived, but if they are, obviously they are not vacuuming the large tables. If