Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Luke Lonergan
Tom, On 1/30/07 9:55 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote: > Alvaro Herrera <[EMAIL PROTECTED]> writes: >> Gregory Stark wrote: >>> (Incidentally I'm not sure where 2-5x comes from. It's entirely dependant on >>> your data distribution. It's not hard to come up with distributions where >>> it

Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Gregory Stark wrote: >> (Incidentally I'm not sure where 2-5x comes from. It's entirely dependant on >> your data distribution. It's not hard to come up with distributions where >> it's >> 1000x as fast and others where there's no speed difference.) >

Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Luke Lonergan
Argh! ### ### # ## ### ### # ## ## ##

Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Alvaro Herrera
Luke Lonergan wrote: > Alvaro, > > On 1/30/07 9:04 AM, "Alvaro Herrera" <[EMAIL PROTECTED]> wrote: > > >> (Incidentally I'm not sure where 2-5x comes from. It's entirely dependant > >> on > >> your data distribution. It's not hard to come up with distributions where > >> it's > >> 1000x as fast

Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Josh Berkus
Luke, > You got me - I'll bite - what's PHB? Pointy Haired Boss. It's a Dilbert reference. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate

Re: [PERFORM] Very slow queries

2007-01-30 Thread Chad Wagner
On 1/30/07, Sidar López Cruz <[EMAIL PROTECTED]> wrote: query: Delete From ceroriesgo.salarios Where numero_patrono Not In (Select numero_patrono From ceroriesgo.patronos) Seq Scan on salarios (cost=51021.78..298803854359.95 rows=14240077 width=6) Filter: (NOT (subplan)) SubPlan -> Ma

Re: [PERFORM] Very slow queries

2007-01-30 Thread Ted Allen
What indexes do those tables have? Any? Sidar López Cruz wrote: Check this: query: Delete From ceroriesgo.salarios Where numero_patrono Not In (Select numero_patrono From ceroriesgo.patronos) Seq Scan on salarios (cost=51021.78..298803854359.95 rows=14240077 width=6) Filter: (NOT (subpl

Re: [PERFORM] Very slow queries

2007-01-30 Thread Sidar López Cruz
From: Ted Allen <[EMAIL PROTECTED]> To: Sidar López Cruz <[EMAIL PROTECTED]> CC: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Very slow queries Date: Tue, 30 Jan 2007 16:14:38 -0500 What indexes do those tables have? Any? Yes: TABLE ceroriesgo.patronos ADD CONSTRAINT patrono

[PERFORM] Very slow queries

2007-01-30 Thread Sidar López Cruz
Check this: query: Delete From ceroriesgo.salarios Where numero_patrono Not In (Select numero_patrono From ceroriesgo.patronos) Seq Scan on salarios (cost=51021.78..298803854359.95 rows=14240077 width=6) Filter: (NOT (subplan)) SubPlan -> Materialize (cost=51021.78..69422.58 rows=10329

Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Steinar H. Gunderson
On Tue, Jan 30, 2007 at 10:20:28AM -0800, Luke Lonergan wrote: > You got me - I'll bite - what's PHB? Usually the Pointy-Haired Boss, a figure from Dilbert. http://en.wikipedia.org/wiki/Pointy_Haired_Boss /* Steinar */ -- Homepage: http://www.sesse.net/ ---(end of bro

Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Luke Lonergan
Alvaro, On 1/30/07 9:04 AM, "Alvaro Herrera" <[EMAIL PROTECTED]> wrote: >> (Incidentally I'm not sure where 2-5x comes from. It's entirely dependant on >> your data distribution. It's not hard to come up with distributions where >> it's >> 1000x as fast and others where there's no speed differenc

Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Alvaro Herrera
Gregory Stark wrote: > (Incidentally I'm not sure where 2-5x comes from. It's entirely dependant on > your data distribution. It's not hard to come up with distributions where it's > 1000x as fast and others where there's no speed difference.) So the figure is really "1-1000x"? I bet this one is

Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Bruno Wolff III
On Tue, Jan 30, 2007 at 14:33:34 +0600, Igor Lobanov <[EMAIL PROTECTED]> wrote: > Greetings! > > I have rather large table with about 5 millions of rows and a dozen of > columns. Let's suppose that columns are named 'a', 'b', 'c' etc. I need > to query distinct pairs of ('a';'b') from this tab

Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Gregory Stark
"Luke Lonergan" <[EMAIL PROTECTED]> writes: > Chad, > > On 1/30/07 6:13 AM, "Chad Wagner" <[EMAIL PROTECTED]> wrote: > > > Sounds like an opportunity to implement a "Sort Unique" (sort of like a > > hash, > > I guess), there is no need to push 3M rows through a sort algorithm to only > > shave

Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Luke Lonergan
Chad, On 1/30/07 7:03 AM, "Chad Wagner" <[EMAIL PROTECTED]> wrote: > On 1/30/07, Luke Lonergan <[EMAIL PROTECTED]> wrote: >> Not that it helps Igor, but we've implemented single pass sort/unique, >> grouping and limit optimizations and it speeds things up to a single seqscan >> over the data, fro

Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Chad Wagner
On 1/30/07, Luke Lonergan <[EMAIL PROTECTED]> wrote: Not that it helps Igor, but we've implemented single pass sort/unique, grouping and limit optimizations and it speeds things up to a single seqscan over the data, from 2-5 times faster than a typical external sort. Was that integrated back

Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Luke Lonergan
Chad, On 1/30/07 6:13 AM, "Chad Wagner" <[EMAIL PROTECTED]> wrote: > Sounds like an opportunity to implement a "Sort Unique" (sort of like a hash, > I guess), there is no need to push 3M rows through a sort algorithm to only > shave it down to 1848 unique records. > > I am assuming this optimiza

Re: [PERFORM] Bad Row Count Estimate on View with 8.2

2007-01-30 Thread Dave Dutcher
> From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane > > I took another look and think I found the problem: 8.2's new code for > flattening UNION ALL subqueries into "append relations" is failing to > initialize all the fields of the appendrel, which confuses > estimate_num

Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Brian Herlihy
As I understand, there's no hashing for DISTINCT, but there is for GROUP BY. GROUP BY will choose between a hash and a sort (or maybe other options?) depending on the circumstances. So you can write SELECT a, b FROM tbl GROUP BY a,b and the sort/unique part of the query may run faster. Brian

Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Chad Wagner
On 1/30/07, Simon Riggs <[EMAIL PROTECTED]> wrote: > explain analyze select distinct a, b from tbl > > EXPLAIN ANALYZE output is: > > Unique (cost=500327.32..525646.88 rows=1848 width=6) (actual > time=52719.868..56126.356 rows=5390 loops=1) > -> Sort (cost=500327.32..508767.17 rows=337

Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Richard Huxton
Simon Riggs wrote: On Tue, 2007-01-30 at 15:33 +0600, Igor Lobanov wrote: explain analyze select distinct a, b from tbl EXPLAIN ANALYZE output is: Unique (cost=500327.32..525646.88 rows=1848 width=6) (actual time=52719.868..56126.356 rows=5390 loops=1) -> Sort (cost=500327.32..50876

Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Simon Riggs
On Tue, 2007-01-30 at 15:33 +0600, Igor Lobanov wrote: > explain analyze select distinct a, b from tbl > > EXPLAIN ANALYZE output is: > > Unique (cost=500327.32..525646.88 rows=1848 width=6) (actual > time=52719.868..56126.356 rows=5390 loops=1) > -> Sort (cost=500327.32..508767.17 row

Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread imad
To be sure about the performance of index scan, try forcing the planner to use it instead of seq scan. A way might be to force the planner to use index scan on your table by using a dummy where clause. Try using a condition in your where clause which holds true for all rows. --Imad www.Enterprise

Re: [PERFORM] int4 vs varchar to store ip addr

2007-01-30 Thread Pomarede Nicolas
On Mon, 29 Jan 2007, Florian Weimer wrote: * Pomarede Nicolas: I could use PG internal inet/cidr type to store the ip addrs, which would take 12 bytes per IP, thus gaining a few bytes per row. I thought it's down to 8 bytes in PostgreSQL 8.2, but I could be mistaken. Apart from gaining som

Re: [PERFORM] Partitioning

2007-01-30 Thread Rigmor Ukuhe
Abu Mushayeed wrote: I have partitioned a table based on period (e.g., cdate >= '2007-01-01'::date and cdate<=.2007-03-31':;date). Now, I am issuing query like cdate >= CURRENT_DATE - 1 and cdate <= CURRENT_DATE, it scans all the partitions. But if I do cdate >= '2007-01-01'::date and cdate<=

Re: [PERFORM] Tuning

2007-01-30 Thread John Parnefjord
> What are Oracle and EnterpriseDB recommending for shmmax these days? According to Oracle "set to a value half the size of physical memory". [http://www.oracle.com/technology/tech/linux/validated-configurations/ht ml/vc_dell6850-rhel4-cx500-1_1.html] I've been talking to an Oracle DBA and he sa

Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Richard Huxton
Igor Lobanov wrote: Richard Huxton wrote: I have rather large table with about 5 millions of rows and a dozen of columns. Let's suppose that columns are named 'a', 'b', 'c' etc. I need to query distinct pairs of ('a';'b') from this table. > What version of PostgreSQL is it? 8.1.4 Curre

Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Igor Lobanov
Forgot to mention that our work_mem setting is 20480 Kb. You might want to try increasing work_mem for this one query to speed any sorting. -- Igor Lobanov Internal Development Engineer SWsoft, Inc. ---(end of broadcast)--- TIP 5: don't forget

Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Igor Lobanov
Richard Huxton wrote: I have rather large table with about 5 millions of rows and a dozen of columns. Let's suppose that columns are named 'a', 'b', 'c' etc. I need to query distinct pairs of ('a';'b') from this table. > What version of PostgreSQL is it? 8.1.4 How many distinct values are

Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Richard Huxton
Igor Lobanov wrote: Greetings! I have rather large table with about 5 millions of rows and a dozen of columns. Let's suppose that columns are named 'a', 'b', 'c' etc. I need to query distinct pairs of ('a';'b') from this table. Creating compound index on this table using following statement

[PERFORM] Querying distinct values from a large table

2007-01-30 Thread Igor Lobanov
Greetings! I have rather large table with about 5 millions of rows and a dozen of columns. Let's suppose that columns are named 'a', 'b', 'c' etc. I need to query distinct pairs of ('a';'b') from this table. I use following query: SELECT DISTINCT a, b FROM tbl; but unfortunately, it takes f