Looping in the main group ID. Regards Sushant
On Tue, Oct 20, 2020 at 6:49 PM Sushant Pawar <sushan...@gmail.com> wrote: > > On Tue, Oct 20, 2020 at 3:08 PM Mats Julian Olsen <m...@duneanalytics.com> > wrote: > >> Dear Postgres community, >> >> I'm looking for some help to manage queries against two large tables. >> >> Context: >> We run a relatively large postgresql instance (5TB, 32 vCPU, 120GB RAM) >> with a hybrid transactional/analytical workload. Data is written in batches >> every 15 seconds or so, and the all queryable tables are append-only (we >> never update or delete). Our users can run analytical queries on top of >> these tables. >> >> We recently came across a series of troublesome queries one of which I'll >> dive into here. >> >> Please see the following gist for both the query we run and the \d+ >> output: https://gist.github.com/mewwts/9f11ae5e6a5951593b8999559f5418cf. >> >> The tables in question are: >> - `ethereum.transactions`: 833M rows, partitioned, 171M rows after WHERE >> - `uniswap_v2."Pair_evt_Swap": 12M rows, not partitioned, 12M rows after >> WHERE >> >> The crux of our issue is that the query planner chooses a nested loop >> join for this query. Essentially making this query (and other queries) take >> a very long time to complete. In contrast, by toggling `enable_nestloop` >> and `enable_seqscan` off we can take the total runtime down from 16 minutes >> to 2 minutes. >> >> 1) Vanilla plan (16 min) : https://explain.depesz.com/s/NvDR >> 2) enable_nestloop=off (4 min): https://explain.depesz.com/s/buKK >> 3) enable_nestloop=off; enable_seqscan=off (2 min): >> https://explain.depesz.com/s/0WXx >> > > The cost of a query while using the default Vanila plan is very less > compared to the 3rd plan with nested loop and seqscan being set to off. > As the JIT is enabled, it seems the planner tries to select the plan with > the least cost and going for the plan which is taking more time of > execution. Can you try running this query with JIT=off in the session and > see if it selects the plan with the least time for execution? > >> >> How can I get Postgres not to loop over 12M rows? >> >> Let me know if there is anything I left out here that would be useful for >> further debugging. >> >> -- >> Regards >> > Sushant >