Andrew Gierth <and...@tao11.riddles.org.uk> writes:
> To get a reliable measurement of timing changes less than around 3%,
> what you have to do is this: pick some irrelevant function and add
> something like an asm directive that inserts a variable number of NOPs,
> and do a series of test runs with different values.
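Concretely, that could look something like the sketch below, assuming gcc
or clang with the GNU assembler; the function name and the PAD_NOPS knob
are invented here purely for illustration, not anything from Andrew's mail:

    #ifndef PAD_NOPS
    #define PAD_NOPS 0			/* vary this across test builds */
    #endif

    #define NOP_STR_(x) #x
    #define NOP_STR(x) NOP_STR_(x)

    /*
     * Dummy function the benchmark never calls; kept non-static so the
     * compiler can't discard it.  The .rept/.endr block makes the
     * assembler emit PAD_NOPS one-byte nop instructions, shifting the
     * addresses of everything laid out after this function.
     */
    void
    pad_with_nops(void)
    {
    	__asm__ (".rept " NOP_STR(PAD_NOPS) "\n\tnop\n\t.endr");
    }

Rebuild with, say, -DPAD_NOPS=0, 16, 32, ... and rerun the benchmark at
each setting; the spread of timings across those builds gives a rough idea
how much swing is attributable purely to code placement.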
Good point.  If you're looking at a microbenchmark that only exercises a
small amount of code, it can be way worse than that.

I was reminded of this the other day while fooling with the problem
discussed in

https://www.postgresql.org/message-id/flat/6970.1545327...@sss.pgh.pa.us

in which we were spending huge amounts of time in a tight loop in
match_eclasses_to_foreign_key_col.  I normally run with --enable-cassert
unless I'm trying to collect performance data; so I rebuilt with
--disable-cassert, and was bemused to find out that that test case ran
circa 20% *slower* in the non-debug build.

This is silly on its face, and even more so when you notice that
match_eclasses_to_foreign_key_col itself contains no Asserts and so its
machine code is unchanged by the switch.  (I went to the extent of
comparing .s files to verify this.)  So that had to have been down to
alignment/cacheline issues triggered by moving said function around.
I doubt the case would be exactly reproducible on different hardware or
toolchain, but another platform would likely show similar issues on some
case or other.

tl;dr: even a 20% difference might be nothing more than an artifact.

			regards, tom lane