Just a correction from my previous message regarding the throughput we get.


On that one table with 1.2B row, the plan through the index scan delivers 
actually 50K rows/s in read speed to the application, almost immediately. It 
would go through the entire table in under 7h vs the other approach which still 
didn't deliver any data after 10h.


We do additional joins and logic and out final throughput is about 12K/s (what 
i quoted previously), but this is a case where clearly the index_scan plan 
delivers vastly better performance than the table_seq_scan+sort plan.


Any insight here?


Thank you,

Laurent.

________________________________
From: l...@laurent-hasson.com <l...@laurent-hasson.com>
Sent: Friday, January 25, 2019 2:06:54 PM
To: Tom Lane
Cc: pgsql-performa...@postgresql.org
Subject: Re: Zero throughput on a query on a very large table.


OK... I think we may have cracked this.


First, do you think that 128MB work_mem is ok? We have a 64GB machine and 
expecting fewer than 100 connections. This is really an ETL workload 
environment at this time.


Second, here is what i found and what messed us up.

    select current_setting('random_page_cost'); --> 4

    alter database "CMS_TMP" set random_page_cost=0.00000001;
    select current_setting('random_page_cost'); --> 4 ????

I also tried:
    select current_setting('random_page_cost'); --> 4
    select set_config('random_page_cost', '0.000001', true);
    select current_setting('random_page_cost'); --> 4 ????


Is there something that is happening that is causing those settings to not 
stick? I then tried:


    select current_setting('random_page_cost'); --> 4
    select set_config('random_page_cost', '0.000001', false); -- false now, 
i.e., global
    select current_setting('random_page_cost'); --> 0.000001 !!!!


So i think we just spent 4 days on that issue. I then did

    select set_config('enable_seqscan', 'off', false);
And the plan is now using an index scan, and we are getting 12K rows/s in 
throughput immediately!!! 😊

So i guess my final question is that i really want to only affect that one 
query executing, and i seem to not be able to change the settings used by the 
planner just for that one transaction. I have to change it globally which i 
would prefer not to do. Any help here?


Thanks,

Laurent.

________________________________
From: l...@laurent-hasson.com <l...@laurent-hasson.com>
Sent: Friday, January 25, 2019 1:36:21 PM
To: Tom Lane
Cc: pgsql-performa...@postgresql.org
Subject: Re: Zero throughput on a query on a very large table.


Sorry :) When i look at the "SQL" tab in PGAdmin when i select the index in the 
schema browser. But you are right that /d doesn't show that.

________________________________
From: Tom Lane <t...@sss.pgh.pa.us>
Sent: Friday, January 25, 2019 1:34:01 PM
To: l...@laurent-hasson.com
Cc: pgsql-performa...@postgresql.org
Subject: Re: Zero throughput on a query on a very large table.

"l...@laurent-hasson.com" <l...@laurent-hasson.com> writes:
> Also, the original statement i implemented did not have all of that. This is 
> the normalized SQL that Postgres now gives when looking at the indices.

[ squint... ]  What do you mean exactly by "Postgres gives that"?
I don't see any redundant COLLATE clauses in e.g. psql \d.

                        regards, tom lane

Reply via email to