Re: [PERFORM] Slow query with a lot of data

2008-08-22 Thread Merlin Moncure
On Fri, Aug 22, 2008 at 2:31 AM, Moritz Onken <[EMAIL PROTECTED]> wrote: > On 21.08.2008 at 19:08 Merlin Moncure wrote: >> On Thu, Aug 21, 2008 at 11:07 AM, Moritz Onken <[EMAIL PROTECTED]> wrote: >>> On 21.08.2008 at 16:39 Scott Carey wrote: >>>> It looks to me like the work_mem did have an effect. …

Re: [PERFORM] Slow query with a lot of data

2008-08-21 Thread Moritz Onken
On 21.08.2008 at 19:08 Merlin Moncure wrote: On Thu, Aug 21, 2008 at 11:07 AM, Moritz Onken <[EMAIL PROTECTED]> wrote: On 21.08.2008 at 16:39 Scott Carey wrote: It looks to me like the work_mem did have an effect. Your earlier queries had a sort followed by group aggregate at the top, and now it's a hash-aggregate. …

Re: [PERFORM] Slow query with a lot of data

2008-08-21 Thread Merlin Moncure
On Thu, Aug 21, 2008 at 11:07 AM, Moritz Onken <[EMAIL PROTECTED]> wrote: > On 21.08.2008 at 16:39 Scott Carey wrote: >> It looks to me like the work_mem did have an effect. Your earlier queries had a sort followed by group aggregate at the top, and now it's a hash-aggregate. So the query plan DID change. …

Re: [PERFORM] Slow query with a lot of data

2008-08-21 Thread Moritz Onken
On 21.08.2008 at 16:39 Scott Carey wrote: It looks to me like the work_mem did have an effect. Your earlier queries had a sort followed by group aggregate at the top, and now it's a hash-aggregate. So the query plan DID change. That is likely where the first 10x performance gain came from. …

Re: [PERFORM] Slow query with a lot of data

2008-08-21 Thread Scott Carey
It looks to me like the work_mem did have an effect. Your earlier queries had a sort followed by group aggregate at the top, and now it's a hash-aggregate. So the query plan DID change. That is likely where the first 10x performance gain came from. The top of the plan was: GroupAggregate (cost…
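A minimal sketch of the experiment Scott describes, using the query quoted further down in this digest; the work_mem value is an arbitrary example, not a recommendation:

  -- raise work_mem for this session only, then re-check the plan
  SET work_mem = '512MB';
  EXPLAIN
  SELECT a."user", b.category, sum(1.0/b.cat_count)::float
  FROM result a, domain_categories b
  WHERE a."domain" = b."domain"
  GROUP BY a."user", b.category;
  -- with enough work_mem the top node should be HashAggregate
  -- instead of Sort followed by GroupAggregate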

Re: [PERFORM] Slow query with a lot of data

2008-08-21 Thread Moritz Onken
On 21.08.2008 at 09:04 Moritz Onken wrote: On 20.08.2008 at 20:28 Tom Lane wrote: "Scott Carey" <[EMAIL PROTECTED]> writes: The planner actually thinks there will only be 28704 rows returned of width 12. But it chooses to sort 53 million rows before aggregating. That's either a bug …

Re: [PERFORM] Slow query with a lot of data

2008-08-21 Thread Moritz Onken
On 20.08.2008 at 20:28 Tom Lane wrote: "Scott Carey" <[EMAIL PROTECTED]> writes: The planner actually thinks there will only be 28704 rows returned of width 12. But it chooses to sort 53 million rows before aggregating. That's either a bug or there's something else wrong here. That is the wrong way to aggregate those results …

Re: [PERFORM] Slow query with a lot of data

2008-08-21 Thread Moritz Onken
On 20.08.2008 at 20:06 Scott Carey wrote: Ok, so the problem boils down to the sort at the end. The query up through the merge join on domain is as fast as it's going to get. The sort at the end, however, should not happen ideally. There are not that many rows returned, and it should hash_aggregate if it thinks there is enough space to do …

Re: [PERFORM] Slow query with a lot of data

2008-08-20 Thread Tom Lane
"Scott Carey" <[EMAIL PROTECTED]> writes: > The planner actually thinks there will only be 28704 rows returned of width > 12. But it chooses to sort 53 million rows before aggregating. Thats > either a bug or there's something else wrong here. That is the wrong way > to aggregate those results

Re: [PERFORM] Slow query with a lot of data

2008-08-20 Thread Scott Carey
Ok, so the problem boils down to the sort at the end. The query up through the merge join on domain is as fast as it's going to get. The sort at the end, however, should not happen ideally. There are not that many rows returned, and it should hash_aggregate if it thinks there is enough space to do …
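As a diagnostic sketch (not a production setting), one can check whether the planner is willing to hash-aggregate at all by discouraging sorts for a single session:

  SET enable_sort = off;  -- strongly discourages, but does not forbid, sorting
  EXPLAIN
  SELECT max(a."user"), b.category, count(1)
  FROM result a, domain_categories b
  WHERE a."domain" = b."domain"
  GROUP BY b.category;
  RESET enable_sort;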

Re: [PERFORM] Slow query with a lot of data

2008-08-20 Thread Scott Carey
More work_mem will make the sort fit more in memory and less on disk, even with the same query plan. On Wed, Aug 20, 2008 at 12:54 AM, Moritz Onken <[EMAIL PROTECTED]> wrote: > On 19.08.2008 at 17:23 Moritz Onken wrote: >> On 19.08.2008 at 16:49 Scott Carey wrote: >>> What is your work_mem set to? …
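A sketch of how to see whether the sort actually spills to disk; this assumes a PostgreSQL version (8.3 or later) whose EXPLAIN ANALYZE output reports the sort method:

  EXPLAIN ANALYZE
  SELECT max(a."user"), b.category, count(1)
  FROM result a, domain_categories b
  WHERE a."domain" = b."domain"
  GROUP BY b.category;
  -- "Sort Method: external merge  Disk: ...kB" means the sort overflowed
  -- work_mem; "Sort Method: quicksort  Memory: ...kB" means it fit in memory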

Re: [PERFORM] Slow query with a lot of data

2008-08-20 Thread Zoltan Boszormenyi
Moritz Onken wrote: > On 19.08.2008 at 17:23 Moritz Onken wrote: >> On 19.08.2008 at 16:49 Scott Carey wrote: >>> What is your work_mem set to? The default? Try increasing it significantly if you have the RAM and see if that affects the explain plan. You may even want to set it to a number larger than the RAM you have just to see what happens. …

Re: [PERFORM] Slow query with a lot of data

2008-08-20 Thread Moritz Onken
On 19.08.2008 at 17:23 Moritz Onken wrote: On 19.08.2008 at 16:49 Scott Carey wrote: What is your work_mem set to? The default? Try increasing it significantly if you have the RAM and see if that affects the explain plan. You may even want to set it to a number larger than the RAM you have just to see what happens. …

Re: [PERFORM] Slow query with a lot of data

2008-08-19 Thread Moritz Onken
On 19.08.2008 at 16:49 Scott Carey wrote: What is your work_mem set to? The default? Try increasing it significantly if you have the RAM and see if that affects the explain plan. You may even want to set it to a number larger than the RAM you have just to see what happens. In all honesty, it may be faster to overflow to OS swap space …

Re: [PERFORM] Slow query with a lot of data

2008-08-19 Thread Scott Carey
What is your work_mem set to? The default? Try increasing it significantly if you have the RAM and see if that affects the explain plan. You may even want to set it to a number larger than the RAM you have just to see what happens. In all honesty, it may be faster to overflow to OS swap space …
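The check Scott is asking for, as a minimal sketch (the 256MB figure is only an example):

  SHOW work_mem;           -- the default is 1MB
  SET work_mem = '256MB';  -- session-local; re-run EXPLAIN on the slow query
  RESET work_mem;          -- restore the configured value afterwards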

Re: [PERFORM] Slow query with a lot of data

2008-08-19 Thread Moritz Onken
On 19.08.2008 at 14:17 Matthew Wakeling wrote: On Tue, 19 Aug 2008, Moritz Onken wrote:
  tablename | attname | n_distinct | correlation
  result    | domain  |       1642 |           1
Well, the important thing is the correlation, which is 1, indicating that Postgres knows that the table is clustered. …

Re: [PERFORM] Slow query with a lot of data

2008-08-19 Thread Matthew Wakeling
On Tue, 19 Aug 2008, Moritz Onken wrote:
  tablename | attname | n_distinct | correlation
  result    | domain  |       1642 |           1
Well, the important thing is the correlation, which is 1, indicating that Postgres knows that the table is clustered. So I have no …

Re: [PERFORM] Slow query with a lot of data

2008-08-19 Thread Moritz Onken
As far as I can tell, it should. If it is clustered on an index on domain, and then analysed, it should no longer have to sort on domain. Could you post here the results of running: select * from pg_stats where attname = 'domain';
  schemaname | tablename | attname | null_frac …

Re: [PERFORM] Slow query with a lot of data

2008-08-19 Thread Matthew Wakeling
On Tue, 19 Aug 2008, Moritz Onken wrote:
  explain select a."user", b.category, sum(1.0/b.cat_count)::float
  from result a, domain_categories b
  where a."domain" = b."domain"
  group by a."user", b.category;
Both result and domain_categories are clustered on domain and analyzed. Why is it still sorting …

Re: [PERFORM] Slow query with a lot of data

2008-08-19 Thread Moritz Onken
On 18.08.2008 at 18:05 Matthew Wakeling wrote: On Mon, 18 Aug 2008, Moritz Onken wrote: Running the query for more than one user is indeed not much slower. That's what I need. I'm clustering the result table on domain right now. But why is this better than clustering it on "user"? The reason is the way that the merge join algorithm works. …

Re: [PERFORM] Slow query with a lot of data

2008-08-18 Thread Matthew Wakeling
On Mon, 18 Aug 2008, Moritz Onken wrote: Running the query for more than one user is indeed not much slower. That's what I need. I'm clustering the result table on domain right now. But why is this better than clustering it on "user"? The reason is the way that the merge join algorithm works. …
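A sketch of the re-clustering step this advice implies, using PostgreSQL 8.3 syntax; the index name is hypothetical and must match an existing index on the "domain" column:

  -- physically reorder the table by domain, then refresh planner statistics
  CLUSTER result USING result_domain_idx;
  ANALYZE result;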

Re: [PERFORM] Slow query with a lot of data

2008-08-18 Thread Moritz Onken
Well, you're getting the database to read the entire contents of the domain_categories table in order. That's 12 million rows - a fair amount of work. You may find that removing the "user = 1337" constraint doesn't make the query much slower - that's where you get a big win by clustering …

Re: [PERFORM] Slow query with a lot of data

2008-08-18 Thread Matthew Wakeling
On Mon, 18 Aug 2008, Moritz Onken wrote:
  "HashAggregate (cost=817397.78..817428.92 rows=2491 width=8) (actual time=42874.339..42878.419 rows=3361 loops=1)"
  "  -> Merge Join (cost=748.47..674365.50 rows=19070970 width=8) (actual time=15702.449..42829.388 rows=36308 loops=1)"
  "     Merge …"

Re: [PERFORM] Slow query with a lot of data

2008-08-18 Thread Moritz Onken
On 18.08.2008 at 16:30 Matthew Wakeling wrote: On Mon, 18 Aug 2008, Moritz Onken wrote: I have indexes on result.domain, domain_categories.domain, result.user, domain_categories.category. Clustered result on user and domain_categories on domain.
  "-> Materialize (cost=2118752.28..2270064.64 rows=12104989 width=8) (actual time=464…"

Re: [PERFORM] Slow query with a lot of data

2008-08-18 Thread Matthew Wakeling
On Mon, 18 Aug 2008, Moritz Onken wrote: I have indexes on result.domain, domain_categories.domain, result.user, domain_categories.category. Clustered result on user and domain_categories on domain.
  "-> Materialize (cost=2118752.28..2270064.64 rows=12104989 width=8) (actual time=464…"
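The indexes and clustering described in that message, written out as a sketch; the index names are invented for illustration:

  CREATE INDEX result_domain_idx ON result ("domain");
  CREATE INDEX result_user_idx ON result ("user");
  CREATE INDEX dc_domain_idx ON domain_categories ("domain");
  CREATE INDEX dc_category_idx ON domain_categories (category);
  -- physical ordering as described: result by "user", domain_categories by "domain"
  CLUSTER result USING result_user_idx;
  CLUSTER domain_categories USING dc_domain_idx;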

[PERFORM] Slow query with a lot of data

2008-08-18 Thread Moritz Onken
Hi, I run this query:
  select max(a."user"), b.category, count(1)
  from result a, domain_categories b
  where a."domain" = b."domain"
  group by b.category;
The table result contains all websites a user visited, and the table domain_categories contains all categories a domain is in. result has 2…
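A minimal sketch of the schema the post implies; only the table and column names come from the thread, the types are assumptions:

  CREATE TABLE result (
      "user"   integer,  -- the user who visited the site
      "domain" integer   -- the site's domain
  );
  CREATE TABLE domain_categories (
      "domain"   integer,
      category   integer,
      cat_count  integer  -- number of categories the domain is in (assumed)
  );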