On 21/04/2023 23:33, Nathan Bossart wrote:
On Fri, Apr 21, 2023 at 01:50:34PM +0700, John Naylor wrote:
On Wed, Mar 8, 2023 at 7:25 AM Nathan Bossart <nathandboss...@gmail.com> wrote:
was mostly a fun weekend project, and I don't presently have any concrete
examples of workloads where this might help.
It seems like that should be demonstrated before seriously considering
this, e.g. with a profile in which the relevant list functions show up.
Agreed.
Grepping for "tlist_member" and "list_delete_ptr", I don't see any
callers in hot codepaths where this could make a noticeable difference.
So I've marked this as Returned with Feedback in the commitfest.
I noticed that several of the List functions do simple linear searches that
can be optimized with SIMD intrinsics (as was done for XidInMVCCSnapshot in
37a6e5d). The following table shows the time spent iterating over a list
of n elements (via list_member_int) one billion times on my x86 laptop.
  n   | head (ms) | patched (ms)
------+-----------+--------------
    2 |      3884 |         3001
    4 |      5506 |         4092
    8 |      6209 |         3026
   16 |      8797 |         4458
   32 |     25051 |         7032
   64 |     37611 |        12763
  128 |     61886 |        22770
  256 |    111170 |        59885
  512 |    209612 |       103378
 1024 |    407462 |       189484
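Below is a minimal sketch of the kind of SIMD linear search described above. It is not the patch itself. It assumes x86-64 with SSE2 and values stored contiguously in a plain int32 array. Both the function name int_array_member_simd and the 8-element block size are hypothetical. PostgreSQL's List keeps its values in ListCell unions, and the helper used for XidInMVCCSnapshot is pg_lfind32(), so the real code differs in detail.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (x86 only) */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if key occurs in vals[0..n-1].
 * Assumes contiguous int32 storage (not PostgreSQL's ListCell layout).
 * Each vector iteration checks 8 elements (two 128-bit registers of
 * four int32 lanes each); the remainder is handled by a scalar loop.
 */
static bool
int_array_member_simd(const int32_t *vals, size_t n, int32_t key)
{
    const size_t per_iter = 8;
    size_t       tail = n & ~(per_iter - 1);  /* first index left to the scalar loop */
    __m128i      keys = _mm_set1_epi32(key);  /* broadcast key into all 4 lanes */
    size_t       i = 0;

    for (; i < tail; i += per_iter)
    {
        __m128i v1 = _mm_loadu_si128((const __m128i *) &vals[i]);
        __m128i v2 = _mm_loadu_si128((const __m128i *) &vals[i + 4]);

        /* lanes equal to key become all-ones; OR the two results together */
        __m128i cmp = _mm_or_si128(_mm_cmpeq_epi32(v1, keys),
                                   _mm_cmpeq_epi32(v2, keys));

        if (_mm_movemask_epi8(cmp) != 0)
            return true;
    }

    /* scalar tail: also the only path taken when n < 8 */
    for (; i < n; i++)
        if (vals[i] == key)
            return true;

    return false;
}
```

OR-ing the two comparison results means each 8-element block costs a single branch. Lists shorter than one block never enter the vector loop at all.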
I'm surprised to see an improvement with n=2 and n=4. AFAICS, the
vectorization only kicks in when n >= 8.
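A little arithmetic confirms the reasoning. Assume the patched search handles whole 8-element blocks with SIMD and leaves the remainder to a scalar loop. That block size comes from the "n >= 8" remark, not from reading the patch. The two helpers below, simd_elems and scalar_elems, are hypothetical names that show how each list length in the table would be split.

```c
#include <stddef.h>

/*
 * Hypothetical helper: number of elements a block-wise search would
 * cover with SIMD, i.e. the largest multiple of block that is <= n.
 */
static size_t
simd_elems(size_t n, size_t block)
{
    return n - n % block;
}

/* Hypothetical helper: elements left over for the scalar tail loop. */
static size_t
scalar_elems(size_t n, size_t block)
{
    return n % block;
}
```

With block = 8, the rows for n = 2 and n = 4 go entirely through the scalar loop. Every other row in the table is a multiple of 8 and would be handled entirely by the vector loop. Under this assumption, the gains at n = 2 and n = 4 cannot come from the vector loop.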
--
Heikki Linnakangas
Neon (https://neon.tech)