On Tue, Feb 16, 2021 at 6:16 PM David Rowley <dgrowle...@gmail.com> wrote:
> On Wed, 3 Feb 2021 at 19:51, David Rowley <dgrowle...@gmail.com> wrote:
> > I've attached a spreadsheet with the results of each of the tests.
> >
> > The attached file v13_costing_hacks.patch.txt is the quick and dirty
> > patch I put together to run test 5.
>
> I've attached an updated set of patches. I'd forgotten to run make
> check-world with the 0005 patch and that was making the CF bot
> complain. I'm not intending 0005 for commit in the state that it's in,
> so I've just dropped it.
>
> I've also done some further performance testing with the attached set
> of patches, this time focusing solely on planner performance using the
> Join Order Benchmark. Some of the queries in this benchmark do give
> the planner quite a bit of exercise. Queries such as 29b take my
> 1-year-old, fairly powerful AMD hardware about 78 ms to make a plan
> for.
>
> The attached spreadsheet shows the details of the results of these
> tests. Skip to the "Test6 no parallel 100 stats EXPLAIN only" sheet.
>
> To get these results I just ran pgbench for 10 seconds on each query
> prefixed with "EXPLAIN ".
>
> To summarise: the planner performance gets a fair bit worse with the
> patched code. With master, summing the average planning time over
> each of the queries resulted in a total planning time of 765.7 ms.
> After patching, that went up to 1097.5 ms. I was pretty disappointed
> about that.
>
> On looking into why the performance gets worse, there are a few
> factors. One factor is that I'm adding a new path to consider, and if
> that path sticks around then subsequent joins may consider that path.
> Changing things around so I only ever add the best path, the time
> went down to 1067.4 ms. add_path() does tend to ditch inferior paths
> anyway, so this may not really be a good thing to do. Another thing
> that I picked up on was the code that checks whether a Result Cache
> path is legal to use: it must check if the inner side of the join has
> any volatile functions.
> If I just comment out those checks, then the total planning time goes
> down to 985.6 ms. The estimate_num_groups() call that the costing for
> the ResultCache path must do to estimate the cache hit ratio is
> another factor. When replacing that call with a constant value, the
> total planning time goes down to 905.7 ms.
>
> I can see perhaps ways that the volatile function checks could be
> optimised a bit further, but the other stuff really is needed, so it
> appears that if we want this, then the planner is going to become
> slightly slower. That does not exactly fill me with joy. We currently
> have enable_partitionwise_aggregate and enable_partitionwise_join,
> which are both disabled by default because of the possibility of
> slowing down the planner. One option could be to make
> enable_resultcache off by default too. I'm not really liking that
> idea much, though, since anyone who leaves the setting that way won't
> ever get any gains from caching the inner side of parameterised
> nested loop results.
>
> The idea I had to speed up the volatile function checks was along
> similar lines to what parallel query does when it looks for
> parallel-unsafe functions in the parse. Right now those checks are
> only done under a few conditions where we think that parallel query
> might actually be used (see standard_planner()). However, with Result
> Cache, those checks could be needed in many other cases too, so we
> don't really have any means to short-circuit them. There might be
> gains to be had by checking the parse once rather than having to call
> contain_volatile_functions() in the various places we currently do. I
> think both the parallel safety and volatile checks could then be done
> in the same tree traversal. Anyway, I've not done any hacking on
> this. It's just an idea so far.
>
> Does anyone have any particular thoughts on the planner slowdown?
I used the same JOB test case and, testing with 19c.sql, I can reproduce a
similar result (there is a huge difference between master and v14). I think
the reason is that we are trying the result cache path at a very hot spot
(the nested loop inner path), so the cost of doing so adds up. I see
get_resultcache_path() has some fast paths that return without calling
create_resultcache_path(), but the conditions under which we still try the
path look too broad. Below is a small addition to it; with this, the
planning time for 19c.sql is reduced from 79 ms to 52 ms on my hardware.

+	/*
+	 * If the inner path is cheap enough, don't bother to try the result
+	 * cache path.  20 is just an arbitrary value.  This may reduce some
+	 * planning time.
+	 */
+	if (inner_path->total_cost < 20)
+		return NULL;

> I used imdbpy2sql.py to parse the imdb database files and load the
> data into PostgreSQL. This seemed to work okay apart from the
> movie_info_idx table appeared to be missing. Many of the 113 join
> order benchmark queries need this table.

I followed the steps in [1] and changed a few things with the attached
patch. In the end I got 2367725 rows. But you are probably running into a
different problem, since none of my changes touch the movie_info_idx
table.

[1] https://github.com/gregrahn/join-order-benchmark

--
Best Regards
Andy Fan (https://www.aliyun.com/)
0001-fix.patch
Description: Binary data