On Tue, Feb 16, 2021 at 6:16 PM David Rowley <dgrowle...@gmail.com> wrote:
> On Wed, 3 Feb 2021 at 19:51, David Rowley <dgrowle...@gmail.com> wrote:
> > I've attached a spreadsheet with the results of each of the tests.
> >
> > The attached file v13_costing_hacks.patch.txt is the quick and dirty
> > patch I put together to run test 5.
>
> I've attached an updated set of patches. I'd forgotten to run make
> check-world with the 0005 patch and that was making the CF bot
> complain. I'm not intending 0005 for commit in the state that it's in,
> so I've just dropped it.
>
> I've also done some further performance testing with the attached set
> of patches, this time focusing solely on planner performance using the
> Join Order Benchmark. Some of the queries in this benchmark do give
> the planner quite a bit of exercise. Queries such as 29b take my
> 1-year-old, fairly powerful AMD hardware about 78 ms to make a plan
> for.
>
> The attached spreadsheet shows the details of the results of these
> tests. Skip to the "Test6 no parallel 100 stats EXPLAIN only" sheet.
>
> To get these results I just ran pgbench for 10 seconds on each query
> prefixed with "EXPLAIN ".
>
> To summarise: the planner performance gets a fair bit worse with the
> patched code. With master, summing the average planning time over
> each of the queries resulted in a total planning time of 765.7 ms.
> After patching, that went up to 1097.5 ms. I was pretty disappointed
> about that.
>
> On looking into why the performance gets worse, there are a few
> factors. One factor is that I'm adding a new path to consider, and if
> that path sticks around then subsequent joins may consider that path.
> Changing things around so I only ever add the best path, the time
> went down to 1067.4 ms. add_path() does tend to ditch inferior paths
> anyway, so this may not really be a good thing to do. Another thing
> that I picked up on was the code that checks whether a Result Cache
> path is legal to use: it must check if the inner side of the join has
> any volatile functions.
> If I just comment out those checks, then the total planning time goes
> down to 985.6 ms. The estimate_num_groups() call that the costing for
> the ResultCache path must do to estimate the cache hit ratio is
> another factor. When replacing that call with a constant value, the
> total planning time goes down to 905.7 ms.
>
> I can see perhaps ways that the volatile function checks could be
> optimised a bit further, but the other stuff really is needed, so it
> appears that if we want this, then the planner is going to become
> slightly slower. That does not exactly fill me with joy. We currently
> have enable_partitionwise_aggregate and enable_partitionwise_join,
> which are both disabled by default because of the possibility of
> slowing down the planner. One option could be to make
> enable_resultcache off by default too. I'm not really liking that
> idea much, though, since anyone who leaves the setting that way won't
> ever get any gains from caching the inner side of parameterised
> nested loop results.
>
> The idea I had to speed up the volatile function checks was along
> similar lines to what parallel query does when it looks for
> parallel-unsafe functions in the parse. Right now those checks are
> only done under a few conditions where we think that parallel query
> might actually be used (see standard_planner()). However, with Result
> Cache, those checks could be needed in many other cases too, so we
> don't really have any means to short-circuit them. There might be
> gains to be had by checking the parse once rather than having to call
> contain_volatile_functions() in the various places we currently do. I
> think both the parallel safety and volatile checks could then be done
> in the same tree traversal. Anyway, I've not done any hacking on
> this. It's just an idea so far.
>
> Does anyone have any particular thoughts on the planner slowdown?
I used the same JOB test case and, testing with 19c.sql, I can reproduce a
similar result (there is a huge difference between master and v14). I think
the reason is that we are trying the result cache path at a very hot spot
(the nested loop inner path), so the cost of doing so adds up. I see
get_resultcache_path() has some fast paths that return without calling
create_resultcache_path(), but the conditions under which we still try the
path look too broad. Below is a small addition to it; with this, the
planning time for 19c.sql is reduced from 79 ms to 52 ms on my hardware.

+	/*
+	 * If the inner path is cheap enough, don't bother to try the result
+	 * cache path.  20 is just an arbitrary value.  This may reduce some
+	 * planning time.
+	 */
+	if (inner_path->total_cost < 20)
+		return NULL;

> I used imdbpy2sql.py to parse the imdb database files and load the
> data into PostgreSQL. This seemed to work okay apart from the
> movie_info_idx table appeared to be missing. Many of the 113 join
> order benchmark queries need this table.

I followed the steps in [1] and changed a few things with the attached
patch. In the end I got 2367725 rows. But you are probably running into a
different problem, since none of my changes touch the movie_info_idx
table.

[1] https://github.com/gregrahn/join-order-benchmark

--
Best Regards
Andy Fan (https://www.aliyun.com/)
0001-fix.patch
Description: Binary data