On Wed, Oct 21, 2020, 8:42 AM Mats Olsen <m...@duneanalytics.com> wrote:
> On 10/21/20 2:38 PM, Sebastian Dressler wrote:
>> Hi Mats,
>>
>> On 20. Oct 2020, at 11:37, Mats Julian Olsen <m...@duneanalytics.com>
>> wrote:
>>
>> [...]
>>
>>> 1) Vanilla plan (16 min): https://explain.depesz.com/s/NvDR
>>> 2) enable_nestloop=off (4 min): https://explain.depesz.com/s/buKK
>>> 3) enable_nestloop=off; enable_seqscan=off (2 min):
>>> https://explain.depesz.com/s/0WXx
>>>
>>> How can I get Postgres not to loop over 12M rows?
>>
>> I looked at the plans and your config and there are some thoughts I'm
>> having:
>>
>> - The row estimate is off, as you possibly noticed. This can possibly be
>> solved by raising `default_statistics_target` to e.g. 2500 (we typically
>> use that) and running ANALYZE.
>
> I've `set default_statistics_target=2500` and run ANALYZE on both tables
> involved; unfortunately the plan is the same. The columns we use for
> joining here are hashes, and we expect very few duplicates in the tables.
> Hence I think extended statistics (storing most common values and
> histogram bounds) aren't useful for this kind of data. Would you say the
> same thing?

Have you checked if ndistinct is roughly accurate? It can be set manually
on a column, or set to some value less than one, with the calculation
depending on reltuples.
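
To make the statistics and ndistinct suggestions concrete, here is a minimal
SQL sketch of checking the estimate and overriding it. The table name
`transactions` and column name `hash` are placeholders, not names taken from
the thread:

    -- Compare the planner's estimate with the actual distinct count
    -- (placeholder table/column names).
    SELECT n_distinct FROM pg_stats
     WHERE tablename = 'transactions' AND attname = 'hash';
    SELECT count(DISTINCT hash) FROM transactions;

    -- Optionally raise the per-column statistics target instead of the
    -- global default_statistics_target.
    ALTER TABLE transactions ALTER COLUMN hash SET STATISTICS 2500;

    -- Override ndistinct if the estimate stays badly off. Positive values
    -- are absolute counts; negative values (>= -1) are fractions of
    -- reltuples, e.g. -1 means every non-null value is distinct.
    ALTER TABLE transactions ALTER COLUMN hash SET (n_distinct = -1);

    -- The override only takes effect at the next ANALYZE.
    ANALYZE transactions;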