On 2022-07-25 12:04:19 -0700, Nathan Bossart wrote: > On Sat, Jul 16, 2022 at 08:59:57PM -0700, Nathan Bossart wrote: > > On Fri, Jul 15, 2022 at 01:08:57PM -0700, Andres Freund wrote: > >> I wonder if we additionally / alternatively could use a faster method of > >> searching the array linearly, e.g. using SIMD. > > > > I looked into using SIMD. The patch is attached, but it is only intended > > for benchmarking purposes and isn't anywhere close to being worth serious > > review. There may be a simpler/better way to implement the linear search, > > but this seemed to work well. Overall, SIMD provided a decent improvement. > > I had to increase the number of writers quite a bit in order to demonstrate > > where the hash tables began winning. Here are the numbers: > > > > writers head simd hash > > 256 663 632 694 > > 512 530 618 637 > > 768 489 544 573 > > 1024 364 508 562 > > 2048 185 306 485 > > 4096 146 197 441 > > > > While it is unsurprising that the hash tables perform the best, there are a > > couple of advantages to SIMD that might make that approach worth > > considering. For one, there's really no overhead (i.e., you don't need to > > sort the array or build a hash table), so we can avoid picking an arbitrary > > threshold and just have one code path. Also, a SIMD implementation for a > > linear search through an array of integers could likely be easily reused > > elsewhere. > > From the discussion thus far, it seems there is interest in optimizing > [sub]xip lookups, so I'd like to spend some time moving it forward. I > think the biggest open question is which approach to take. Both the SIMD > and hash table approaches seem viable, but I think I prefer the SIMD > approach at the moment (see the last paragraph of quoted text for the > reasons). What do folks think?
Agreed on all points.