On Mon, 1 Jun 2020 at 12:27, Pavel Stehule <pavel.steh...@gmail.com> wrote: > po 1. 6. 2020 v 8:15 odesÃlatel Amit Khandekar <amitdkhan...@gmail.com> > napsal: >> >> On Sat, 30 May 2020 at 11:11, Pavel Stehule <pavel.steh...@gmail.com> wrote: >> > I think so the effect of these patches strongly depends on CPU and compile >> >> I quickly tried pi() with gcc 10 as well, and saw more or less the >> same benefit. I think, we are bound to see some differences in the >> benefits across architectures, kernels and compilers; but looks like >> some benefit is always there. >> >> > but it is micro optimization, and when I look to profiler, the bottle neck >> > is elsewhere. >> >> Please check the perf numbers in my reply to Michael. I suppose you >> meant CachedPlanIsSimplyValid() when you say the bottle neck is >> elsewhere ? Yeah, this function is always the hottest spot, which I >> recall is being discussed elsewhere. But I always see exec_stmt(), >> exec_assign_value as the next functions. > > > It is hard to read the profile result, because these functions are nested > together. For your example > > 18.22% postgres postgres [.] CachedPlanIsSimplyValid > > Is little bit strange, and probably this is real bottleneck in your simple > example, and maybe some work can be done there, because you assign just > constant.
I had earlier had a quick look on this one. CachedPlanIsSimplyValid() was, I recall, hitting a hotspot when it tries to access plansource->search_path (possibly cacheline miss). But didn't get a chance to further dig on that. For now, i am focusing on these other functions for which the patches were submitted. > > On second hand, your example is pretty unrealistic - and against any > developer's best practices for writing cycles. > > I think so we can look on PostGIS, where is some computing heavy routines in > PLpgSQL, and we can look on real profiles. > > Probably the most people will have benefit from these optimization. I understand it's not a real world example. For generating perf figures, I had to use an example which amplifies the benefits, so that the effect of the patches on the perf figures also becomes visible. Hence, used that example. I had shown the benefits up-thread using a practical function avg_space(). But the perf figures for that example were varying a lot. So below, what I did was : Run the avg_space() ~150 times, and took perf report. This stabilized the results a bit : HEAD : + 16.10% 17.29% 16.82% postgres postgres [.] ExecInterpExpr + 13.80% 13.56% 14.49% postgres plpgsql.so [.] exec_assign_value + 12.64% 12.10% 12.09% postgres plpgsql.so [.] plpgsql_param_eval_var + 12.15% 11.28% 11.05% postgres plpgsql.so [.] exec_stmt + 10.81% 10.24% 10.55% postgres plpgsql.so [.] exec_eval_expr + 9.50% 9.35% 9.37% postgres plpgsql.so [.] exec_cast_value ..... + 1.19% 1.06% 1.21% postgres plpgsql.so [.] exec_stmts 0001+0002 patches applied (i.e. inline exec_stmt) : + 16.90% 17.20% 16.54% postgres postgres [.] ExecInterpExpr + 16.42% 15.37% 15.28% postgres plpgsql.so [.] exec_assign_value + 11.34% 11.92% 11.93% postgres plpgsql.so [.] plpgsql_param_eval_var + 11.18% 11.86% 10.99% postgres plpgsql.so [.] exec_stmts.part.0 + 10.51% 9.52% 10.61% postgres plpgsql.so [.] exec_eval_expr + 9.39% 9.48% 9.30% postgres plpgsql.so [.] exec_cast_value HEAD : exec_stmts + exec_stmt = ~12.7 % Patched (0001+0002): exec_stmts = 11.3 % Just 0003 patch applied (i.e. inline exec_cast_value) : + 17.00% 16.77% 17.09% postgres postgres [.] ExecInterpExpr + 15.21% 15.64% 15.09% postgres plpgsql.so [.] exec_assign_value + 14.48% 14.06% 13.94% postgres plpgsql.so [.] exec_stmt + 13.26% 13.30% 13.14% postgres plpgsql.so [.] plpgsql_param_eval_var + 11.48% 11.64% 12.66% postgres plpgsql.so [.] exec_eval_expr .... + 1.03% 0.85% 0.87% postgres plpgsql.so [.] exec_stmts HEAD : exec_assign_value + exec_cast_value = ~23.4 % Patched (0001+0002): exec_assign_value = 15.3% Time in milliseconds after calling avg_space() 150 times : HEAD : 7210 Patch 0001+0002 : 6925 Patch 0003 : 6670 Patch 0001+0002+0003 : 6346