On Wed, Oct 21, 2020, 8:42 AM Mats Olsen <m...@duneanalytics.com> wrote:

>
> On 10/21/20 2:38 PM, Sebastian Dressler wrote:
>
> Hi Mats,
>
> On 20. Oct 2020, at 11:37, Mats Julian Olsen <m...@duneanalytics.com>
> wrote:
>
> [...]
>
> 1) Vanilla plan (16 min) : https://explain.depesz.com/s/NvDR
> 2) enable_nestloop=off (4 min): https://explain.depesz.com/s/buKK
> 3) enable_nestloop=off; enable_seqscan=off (2 min):
> https://explain.depesz.com/s/0WXx
>
> How can I get Postgres not to loop over 12M rows?
>
>
> I looked at the plans and your config and there are some thoughts I'm
> having:
>
> - The row estimate is off, as you possibly noticed. This can possibly be
> solved by raising `default_statistics_target` to e.g. 2500 (we typically
> use that) and running ANALYZE.
>
> I've `set default_statistics_target=2500` and run ANALYZE on both tables
> involved; unfortunately the plan is the same. The columns we use for
> joining here are hashes, and we expect very few duplicates in the tables.
> Hence I think larger per-column statistics (more most-common values and
> histogram bounds) aren't useful for this kind of data. Would you say the
> same thing?
>
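
For reference, the quoted change can also be scoped to just the join columns
rather than applied globally, which keeps ANALYZE cheaper. A rough sketch,
with "transactions" and "hash" as placeholder names for the table and join
column:

  ALTER TABLE transactions ALTER COLUMN hash SET STATISTICS 2500;
  ANALYZE transactions (hash);  -- re-collect statistics for that column only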

Have you checked whether ndistinct is roughly accurate? It can be set manually
on a column, either as an absolute count or as a negative value, which is
interpreted as a fraction of reltuples (so -1 means the column is unique).
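
A minimal way to check, and to pin the value if the estimate is badly off
(again assuming the placeholder names "transactions" and "hash" for the join
key):

  -- what the planner currently believes, from the last ANALYZE
  SELECT n_distinct FROM pg_stats
  WHERE tablename = 'transactions' AND attname = 'hash';

  -- compare with the real distribution
  SELECT count(DISTINCT hash), count(*) FROM transactions;

  -- override if needed; negative values are fractions of reltuples,
  -- so -1 declares the column effectively unique
  ALTER TABLE transactions ALTER COLUMN hash SET (n_distinct = -1);
  ANALYZE transactions;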
