Re: Query plan prefers hash join when nested loop is much faster

2020-08-25 Thread iulian dragos
On Tue, Aug 25, 2020 at 12:36 PM David Rowley wrote: > On Tue, 25 Aug 2020 at 22:10, iulian dragos > wrote: > > Thanks for the tip! Indeed, `n_distinct` isn't right. I found it in > pg_stats set at 131736.0, but the actual number is much higher: 210104361. > I tried to set it manually, but the p

Re: Query plan prefers hash join when nested loop is much faster

2020-08-25 Thread David Rowley
On Tue, 25 Aug 2020 at 22:10, iulian dragos wrote: > Thanks for the tip! Indeed, `n_distinct` isn't right. I found it in pg_stats > set at 131736.0, but the actual number is much higher: 210104361. I tried to > set it manually, but the plan is still the same (both the actual number and a > perc

Re: Query plan prefers hash join when nested loop is much faster

2020-08-25 Thread iulian dragos
On Tue, Aug 25, 2020 at 12:27 AM David Rowley wrote: > On Sat, 22 Aug 2020 at 00:35, iulian dragos > wrote: > > I am trying to understand why the query planner insists on using a hash > join, and how to make it choose the better option, which in this case would > be a nested loop. > > > |

Re: Query plan prefers hash join when nested loop is much faster

2020-08-24 Thread David Rowley
On Sat, 22 Aug 2020 at 00:35, iulian dragos wrote: > I am trying to understand why the query planner insists on using a hash join, > and how to make it choose the better option, which in this case would be a > nested loop. > | -> Index Scan using > test_result_module

Re: Query plan prefers hash join when nested loop is much faster

2020-08-24 Thread iulian dragos
On Mon, Aug 24, 2020 at 4:21 PM iulian dragos wrote: > Hi Michael, > > Thanks for the answer. It's an RDS instance using SSD storage and the > default `random_page_cost` set to 4.0. I don't expect a lot of repetitive > queries here, so I think caching may not be extremely useful. I wonder if > th

Re: Query plan prefers hash join when nested loop is much faster

2020-08-24 Thread iulian dragos
Hi Michael, Thanks for the answer. It's an RDS instance using SSD storage and the default `random_page_cost` set to 4.0. I don't expect a lot of repetitive queries here, so I think caching may not be extremely useful. I wonder if the selectivity of the query is wrongly estimated (out of 500 millio

Re: Query plan prefers hash join when nested loop is much faster

2020-08-21 Thread Michael Lewis
Your system is preferring sequential scan to using test_result_module_result_id_idx in this case. What type of storage do you use, what type of cache hits do you expect, and what do you have random_page_cost set to? That comes to mind as a significant factor in choosing index scans based on costs.

Query plan prefers hash join when nested loop is much faster

2020-08-21 Thread iulian dragos
Hi, I am trying to understand why the query planner insists on using a hash join, and how to make it choose the better option, which in this case would be a nested loop. I have two tables: // about 200 million rows CREATE TABLE module_result( *id* bigserial PRIMARY KEY, name_id bigint NOT

Query plan prefers hash join when nested loop is much faster

2020-08-21 Thread iulian dragos
Hi, I am trying to understand why the query planner insists on using a hash join, and how to make it choose the better option, which in this case would be a nested loop. I have two tables: // about 200 million rows CREATE TABLE module_result( *id* bigserial PRIMARY KEY, name_id bigint NOT