Re: Query Performance / Planner estimate off

Mats Olsen Tue, 27 Oct 2020 23:46:24 -0700


On 10/21/20 5:35 PM, Sebastian Dressler wrote:

Hi Mats,
Happy to help.
On 21. Oct 2020, at 16:42, Mats Olsen <[email protected]> wrote:
On 10/21/20 2:38 PM, Sebastian Dressler wrote:
Hi Mats,
On 20. Oct 2020, at 11:37, Mats Julian Olsen <[email protected]> wrote:
[...]
1) Vanilla plan (16 min): https://explain.depesz.com/s/NvDR
2) enable_nestloop=off (4 min): https://explain.depesz.com/s/buKK
3) enable_nestloop=off; enable_seqscan=off (2 min): https://explain.depesz.com/s/0WXx
How can I get Postgres not to loop over 12M rows?
I looked at the plans and your config, and I have a few thoughts:
- The row estimate is off, as you possibly noticed. This can possibly be solved by raising `default_statistics_target` to e.g. 2500 (we typically use that) and running ANALYZE.
I've run `set default_statistics_target=2500` and ANALYZE on both tables involved; unfortunately the plan is the same. The columns we use for joining here are hashes, and we expect very few duplicates in the tables. Hence I think extended statistics (storing most common values and histogram bounds) aren't useful for this kind of data. Would you say the same?
Yes, that looks like a given in this case.
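For reference, a minimal sketch of the statistics steps discussed above, assuming the join column is `evt_tx_hash` on "Pair_evt_Mint" (schema qualifier and the second table omitted). Unlike a session-level `SET default_statistics_target`, a per-column target persists for future ANALYZE runs, and `pg_stats` shows what the planner actually sees: an `n_distinct` of -1 means the column is estimated to be fully unique, which is what we'd expect for hashes.

```sql
-- Per-column target: persists across sessions and autovacuum ANALYZE runs.
ALTER TABLE "Pair_evt_Mint" ALTER COLUMN evt_tx_hash SET STATISTICS 2500;
ANALYZE "Pair_evt_Mint" (evt_tx_hash);

-- Inspect the collected statistics; n_distinct = -1 means "all values distinct".
SELECT attname, n_distinct, null_frac, avg_width
FROM pg_stats
WHERE tablename = 'Pair_evt_Mint'
  AND attname = 'evt_tx_hash';
```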
- However, I think the misestimate might be caused by evt_tx_hash being of type bytea. I believe PG cannot estimate this very well for JOINs and tends to pick row counts that are too low. Hence the nested loop is chosen, and there might be no way around this. I have seen similar behavior when joining on VARCHAR columns with, e.g., more than 3 fields in the join condition.
This is very interesting, and I have never heard of issues with using `bytea` for joins. Our entire database is filled with them, as we deal with hashes of different lengths. In fact, I would estimate that 60% of our columns are bytea. My intuition says it's better to store the hashes as byte arrays rather than as `text`, since the raw bytes can be compared directly without encoding first. Do you have any references for this?
Unfortunately, I have not dealt with `bytea` that much yet. It just rang a bell when I saw this kind of misestimate in combination with nested loops. In the case I referenced, the tables had 3 VARCHAR columns to be joined on, and the estimate was far off. As a result, PG chose nested loops in the upper layers of the plan. Due to another JOIN, the estimate dropped to 1 row, whereas in reality it was 1 million rows. Yours is "only" off by a factor of 5, so this might have a totally different cause.
However, I looked into the plan once more and realized that the source of the problem could also be the scan on "Pair_evt_Mint" along the date dimension, even though you have a stats target of 10k there. If the timestamp is (roughly) sorted, you could try adding a BRIN index and thereby maybe get a better estimate and scan time.
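A sketch of how one might check whether BRIN is a good fit before building it. The timestamp column name `evt_block_time` and the index name are hypothetical; substitute the column actually used in the date filter. A `pg_stats.correlation` close to 1 (or -1) indicates that the physical row order follows the column's sort order, which is the situation BRIN is designed for.

```sql
-- Hypothetical column name: replace evt_block_time with the real timestamp column.
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'Pair_evt_Mint'
  AND attname = 'evt_block_time';

-- CONCURRENTLY avoids blocking writes during the build
-- (it cannot be run inside a transaction block).
CREATE INDEX CONCURRENTLY pair_evt_mint_block_time_brin
    ON "Pair_evt_Mint" USING brin (evt_block_time);
```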

Hi again, after around 48 hours a CREATE INDEX CONCURRENTLY ran successfully. The new plan still uses a nested loop, but the scan on "Pair_evt_Mint" is now a Parallel Index Scan. See https://explain.depesz.com/s/8ZzT

Alternatively, since I know the length of the hashes in advance, I could have used `varchar(n)`, but I don't think there's anything to be gained in Postgres by doing that? Something like `bytea(n)` would also have been interesting, had Postgres been able to exploit that information.
I think giving VARCHAR a shot makes sense, maybe on an experimental basis, to see whether the estimates get better. Maybe PG can then estimate that there are (almost) no dupes within the table but N-many across tables. Another option to explore might be UUID as the type. As said above, though, it looks more like the timestamp is causing the misestimate.
Maybe try querying this table by itself with that timestamp to seewhat kind of estimate you get?
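A minimal way to do that, again with a hypothetical column name and date range: comparing the `rows=` estimate on the scan node with its `actual rows` figure shows whether the timestamp predicate alone is already misestimated.

```sql
-- Hypothetical column and range; reuse the exact predicate from the slow query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM "Pair_evt_Mint"
WHERE evt_block_time >= '2020-01-01'
  AND evt_block_time <  '2020-10-01';
```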
- Other things to look into:
    - work_mem seems too low to me at 56MB; consider raising it into the GB range to avoid disk-based operations
    - min_parallel_table_scan_size - try 0
    - parallel_setup_cost (default 1000, maybe try 500)
    - parallel_tuple_cost (default 0.1, maybe try something lower)
    - random_page_cost (as mentioned, consider raising this, maybe much higher, by a factor of 10 or so) or, more typically, lowering seq_page_cost considerably (0.1, 0.01), depending on your storage
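These settings can be tried per session without touching postgresql.conf. A sketch: the `work_mem`, `min_parallel_table_scan_size` and `parallel_setup_cost` values follow the suggestions above, while the `parallel_tuple_cost`, `random_page_cost` and `seq_page_cost` values are illustrative assumptions only, to be tuned against your own EXPLAIN output.

```sql
-- Session-level only; RESET ALL restores the values from the server configuration.
SET work_mem = '1GB';                  -- applies per sort/hash node, not per query
SET min_parallel_table_scan_size = 0;  -- consider parallel scans even for small tables
SET parallel_setup_cost = 500;         -- default 1000
SET parallel_tuple_cost = 0.05;        -- default 0.1; illustrative value
SET random_page_cost = 40;             -- illustrative: 10x the default of 4.0
-- Alternatively, lower seq_page_cost instead of raising random_page_cost:
-- SET seq_page_cost = 0.1;            -- default 1.0

-- ... EXPLAIN (ANALYZE, BUFFERS) the query here and compare plans ...

RESET ALL;
```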
I've tried various settings of these parameters now, and unfortunately the only parameter that alters the query plan is the last one (random_page_cost), which, as far as I understand, also has the side effect of (almost) forcing sequential scans for most queries? Our storage is Google Cloud pd-ssd.
I think a combination of random_page_cost with parallel_tuple_cost and min_parallel_table_scan_size might make sense. That way you would possibly get at least parallel sequential scans. But I understand that this possibly has the same effect as using `enable_nestloop = off`.
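If `enable_nestloop = off` turns out to be the practical fix, one option (a sketch, not something tested in this thread) is to scope it to the one problematic query with `SET LOCAL`, which only lasts until the end of the current transaction, so all other queries keep their regular plans.

```sql
BEGIN;
-- SET LOCAL reverts automatically at COMMIT or ROLLBACK.
SET LOCAL enable_nestloop = off;
-- Optionally combine with cost tweaks (illustrative value):
SET LOCAL parallel_tuple_cost = 0.05;

-- ... run the slow query here ...

COMMIT;
```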


I'll have a closer look at these parameters.

Again, thank you.

Mats
