Back in [1] I experimented with a patch to coax compilers to build all elog/ereport calls that were >= ERROR into a cold path away from the function rasing the error. At the time, I really just wanted to test how much of a speedup we could get by doing this and ended up just writing up a patch that basically changed all elog(ERROR) calls from:
if (<error situation check>) { <do stuff>; elog(ERROR, "..."); } to add an unlikely() and become; if (unlikely(<error situation check>) { <do stuff>; elog(ERROR, "..."); } Per the report in [1] I did see some pretty good gains in performance from doing this. The problem was, that at the time I couldn't figure out a way to do this without an invasive patch that changed the code in the thousands of elog/ereport calls. In the attached, I've come up with a way that works. Basically I've just added a function named errstart_cold() that is attributed with __attribute__((cold)), which will hint to the compiler to keep branches which call this function in a cold path. To make the compiler properly mark just >= ERROR calls as cold, and only when the elevel is a constant, I modified the ereport_domain macro to do: if (__builtin_constant_p(elevel) && (elevel) >= ERROR ? \ errstart_cold(elevel, domain) : \ errstart(elevel, domain)) \ I see no reason why the compiler shouldn't always fold that constant expression at compile-time and correctly select the correct version of the function for the job. (I also tried using __builtin_choose_expr() but GCC complained when the elevel was not constant, even with the __builtin_constant_p() test in the condition) I sampled a few .s files to inspect what code had changed. Looking at mcxt.s, fmgr.s and xlog.s, the first two of these because I found in [1] that elogs were being done from those files quite often and xlog.s because it's pretty big. As far as I could see, GCC correctly moved all the error reporting stuff where the elevel was a constant and >= ERROR into the cold path and left the lower-level or non-consts elevel calls alone. For clang, I didn't see any changes in the .s files. I suspect that they might have a few smarts in there and see the __builtin_unreachable() call and assume the path is cold already based on that. That was with clang 10. Perhaps older versions are not as smart. Benchmarking: For benchmarking, I've not done a huge amount to test the impacts of this change. However, I can say that I am seeing some fairly good improvements. There seems to be some small improvements to execution speed using TPCH-Q1 and also some good results from a pgbench -S test. For TPCH-Q1: Master: $ pgbench -n -f pg-tpch/queries/q01.sql -T 120 tpch latency average = 5272.630 ms latency average = 5258.610 ms latency average = 5250.871 ms Master + elog_ereport_attribute_cold.patch $ pgbench -n -f pg-tpch/queries/q01.sql -T 120 tpch latency average = 5182.761 ms latency average = 5194.851 ms latency average = 5183.128 ms Which is about a 1.42% increase in performance. That's not exactly groundbreaking, but pretty useful to have if that happens to apply across the board for execution performance. For pgbench -S: My results were a bit noisier than the TPCH test, but the results I obtained did show about a 10% increase in performance: Master: drowley@amd3990x:~$ pgbench -S -T 120 postgres tps = 25245.903255 (excluding connections establishing) tps = 26144.454208 (excluding connections establishing) tps = 25931.850518 (excluding connections establishing) Master + elog_ereport_attribute_cold.patch drowley@amd3990x:~$ pgbench -S -T 120 postgres tps = 28351.480631 (excluding connections establishing) tps = 27763.656557 (excluding connections establishing) tps = 28896.427929 (excluding connections establishing) It would be useful if someone with some server-grade Intel hardware could run a few tests on this. The above results are all from AMD hardware. I've attached the patch for this. I'll add it to the July 'fest. David [1] https://www.postgresql.org/message-id/CAKJS1f8yqRW3qx2CO9r4bqqvA2Vx68=3awbh8cjwtp9zxeo...@mail.gmail.com
elog_ereport_attribute_cold.patch
Description: Binary data