Here is a last follow-up on adjusting the planner's cost calculation.

1 - Francisco, you provided me with the clue I was looking for at this point!

I learned enough PL/pgSQL over the last week to reprogram all my processes (not 
always elegant, but it works). By processing individual records in a loop, I 
will be able to move forward with my work and save individual results as they 
are processed.
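
For what it's worth, the skeleton of the loop looks roughly like this (table 
and column names below are placeholders, not my actual schema):

    DO $$
    DECLARE
        rec record;
    BEGIN
        -- walk the driving table one record at a time
        FOR rec IN SELECT id FROM work_queue LOOP
            -- run the heavy part against the big table for this record only,
            -- and store its result right away
            INSERT INTO results (id, n_matches)
            SELECT rec.id, count(*)
            FROM big_table b
            WHERE b.record_id = rec.id;
        END LOOP;
    END;
    $$;

Each record then gets its own small, indexed query instead of one huge join.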

Furthermore, it is now easier to predict job completion times, since the DB 
uses similar plans for each record. I ran a test on a stratified sample of 
records and I expect the jobs will run for about two weeks (as opposed to 
months in previous estimates!-)

2 - PL/pgSQL helped me bypass the problem. As PT suggested, I will try to 
set up a test case that demonstrates the problem and post it to the developers' 
list so they might figure out what to do in such a situation.

Thanks to all,
Daniel

-----Original Message-----
From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Francisco Olarte
Sent: June-11-15 10:05
To: Daniel Begin
Cc: Bill Moran; pgsql-general@postgresql.org; Tomas Vondra; Melvin Davidson
Subject: Re: [GENERAL] Planner cost adjustments

Hi Daniel:

On Thu, Jun 11, 2015 at 2:38 PM, Daniel Begin <jfd...@hotmail.com> wrote:
.....
> The remaining problem seems related to the statistics of some large tables.
> On one hand, I might increase the statistics target for these tables to 
> 500, or even to 1000, and look at the results (but I have doubts it 
> will help). On the other hand, I could simply force enable_seqscan to 
> OFF for queries dealing with them.
>
> Still not sure about the best solution but the problem is definitely 
> narrower :-)
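
For concreteness, those two adjustments would look something like this ( the 
table and column names are just placeholders for yours ):

    -- option 1: raise the statistics target on the join column, then re-analyze
    ALTER TABLE changesets ALTER COLUMN user_id SET STATISTICS 500;
    ANALYZE changesets;

    -- option 2: discourage sequential scans only while the problem query runs
    SET enable_seqscan = off;
    -- ... run the query ...
    RESET enable_seqscan;

SET without LOCAL only affects the current session, so the second option does 
not touch the global configuration.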

One somewhat unrelated point. IIRC your problems were related to queries joining 
a selected set of indexed IDs from a smaller table, which then have to be looked 
up in some very large tables. ( I'm not able to follow too well which is which, 
users vs. changesets, I'm a bit lost ). Given your runtimes are always high ( in 
the seconds range, so it seems wire speed / latencies are not too much of an 
issue ) and that selectivity estimates on huge tables are always problematic and 
may be thwarting your plans, you may be able to get faster results by splitting 
your query.

If I read your plans correctly, that would mean selecting your 600 users in one 
query and then preparing the changeset query for a single user_id, which should 
be indexed, and looping it for every user. Given that current machines can easily 
send and receive 600 queries in a second, it may lead to a simpler solution. This 
means you're using the DB as a somewhat intelligent plain old indexed file, but 
sometimes this is the simpler approach ( heck, some of my code uses algorithms 
from the tape era because they were the fastest way I could do it ).
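
In SQL terms, the split would be something like the following ( I'm guessing at 
your table and column names ):

    -- 1) fetch the small driving set once ( ~600 rows, kept in the app ):
    SELECT user_id FROM selected_users;

    -- 2) prepare the per-user lookup once; the app then executes it for each
    --    user_id, and every execution should hit the index on user_id:
    PREPARE changesets_by_user(bigint) AS
        SELECT c.*
        FROM changesets c
        WHERE c.user_id = $1;

    EXECUTE changesets_by_user(12345);  -- repeated from the app for every id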

I needed to do this in one of my programs: the optimizer kept selecting bad 
plans, so I did the first query, held the results in memory, and then prepared 
and executed in a loop from the app. My query was selecting a couple thousand 
values from a sub-million-row table and joining with a multimillion-row one, 
getting a couple hundred matches per original value. Splitting it made the 
queries on the big table always run indexed and fast ( and as a side bonus it 
avoided duplicating parts of the first record on the wire a hundred times, which 
was nice since the short table was wide and I only needed 3 short fields from 
the second one; that made the first query run at wire speed and the second at 
disk speed ).

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
