I wrote:
> However, I then tried a partitioned equivalent of the 6-column case
> (script also attached), and it looks like
> 6 columns     16551   19097   15637   18201
> which is really noticeably worse, 16% or so.

... and on the third hand, that might just be some weird compiler-
and platform-specific artifact.

Using the exact same compiler (RHEL8's gcc 8.3.1) on a different
x86_64 machine, I measure the same case as about a 7% slowdown, not
16%.  That's still not great, but it calls the original measurement
into question, for sure.

Using Apple's clang 12.0.0 on an M1 mini, the patch actually clocks
in a couple percent *faster* than HEAD, for both the partitioned and
unpartitioned 6-column test cases.

So I'm not sure what to make of these results, but my level of concern
is less than it was earlier today.  I might've just gotten trapped by
the usual bugaboo of micro-benchmarking, i.e., putting too much stock in
only one test case.

                        regards, tom lane

