On Tue, 2 Jun 2020 at 21:05, Andy Fan <zhihui.fan1...@gmail.com> wrote: > Today I tested the correctness & performance of this patch based on TPC-H > workload, the environment is setup based on [1]. Correctness is tested by > storing the result into another table when this feature is not introduced and > then enable this feature and comparing the result with the original ones. No > issue is found at this stage.
Thank you for testing it out. > I also checked the performance gain for TPC-H workload, totally 4 out of the > 22 > queries uses this new path, 3 of them are subplan, 1 of them is nestloop. All > of > changes gets a better result. You can check the attachments for reference. > normal.log is the data without this feature, patched.log is the data with the > feature. The data doesn't show the 10x performance gain, I think that's mainly > data size related. Thanks for running those tests. I had a quick look at the results and I think to say that all 4 are better is not quite right. One is actually a tiny bit slower and one is only faster due to a plan change. Here's my full analysis. Q2 uses a result cache for the subplan and has about a 37.5% hit ratio which reduces the execution time of the query down to 67% of the original. Q17 uses a result cache for the subplan and has about a 96.5% hit ratio which reduces the execution time of the query down to 24% of the original time. Q18 uses a result cache for 2 x nested loop joins and has a 0% hit ratio. The execution time is reduced to 91% of the original time only because the planner uses a different plan, which just happens to be faster by chance. Q20 uses a result cache for the subplan and has a 0% hit ratio. The execution time is 100.27% of the original time. There are 8620 cache misses. All other queries use the same plan with and without the patch. > At the code level, I mainly checked nestloop path and > cost_resultcache_rescan, > everything looks good to me. I'd like to check the other parts in the > following days. Great. > [1] https://ankane.org/tpc-h