On 8/7/23 18:56, Nathan Bossart wrote:
> On Mon, Aug 07, 2023 at 12:51:24PM +0200, Tomas Vondra wrote:
>> The bad news is this seems to have negative impact on cases with few
>> partitions, that'd fit into 16 slots. Which is not surprising, as the
>> code has to walk longer arrays, it probably affects caching etc. So this
>> would hurt the systems that don't use that many relations - not much,
>> but still.
>>
>> The regression appears to be consistently ~3%, and v2 aimed to improve
>> that - at least for the case with just 100 rows. It even gains ~5% in a
>> couple cases. It's however a bit strange v2 doesn't really help the two
>> larger cases.
>>
>> Overall, I think this seems interesting - it's hard to not like doubling
>> the throughput in some cases. Yes, it's 100 rows only, and the real
>> improvements are bound to be smaller, it would help short OLTP queries
>> that only process a couple rows.
>
> Indeed. I wonder whether we could mitigate the regressions by using SIMD
> intrinsics in the loops. Or auto-vectorization, if that is possible.
>

Maybe, but from what I know about SIMD it would require a lot of changes
to the design, so that the loops don't mix accesses to different PGPROC
fields (fpLockBits, fpRelId) and so on. But I think it'd be better to
just stop walking the whole array regularly.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
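
For illustration only, a minimal C sketch of one way to avoid walking the
whole array: hash the relation OID to a small group of slots and scan only
that group. Everything here is an assumption made for the example, not the
actual PGPROC layout or the patch under discussion: the struct
FastPathSketch, the sizes (4 groups of 16 slots), the hash, and the
function names fp_find / fp_insert are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, chosen only for illustration. */
#define SLOTS_PER_GROUP 16
#define NUM_GROUPS      4
#define NUM_SLOTS       (SLOTS_PER_GROUP * NUM_GROUPS)

typedef uint32_t Oid;           /* stand-in for PostgreSQL's Oid */

/* Simplified stand-in for the fast-path fields of PGPROC. */
typedef struct FastPathSketch
{
    uint8_t used[NUM_SLOTS];    /* 1 if the slot is taken (plays the role of fpLockBits) */
    Oid     relid[NUM_SLOTS];   /* relation OID per slot (plays the role of fpRelId) */
} FastPathSketch;

/* Map a relation OID to the first slot of its group (assumed hash). */
static size_t
group_start(Oid relid)
{
    uint32_t h = relid * 2654435761u;   /* multiplicative hash, wraps mod 2^32 */

    return (size_t) (h % NUM_GROUPS) * SLOTS_PER_GROUP;
}

/* Return the slot holding relid, or -1. Scans one group, not the whole array. */
static int
fp_find(const FastPathSketch *fp, Oid relid)
{
    size_t start = group_start(relid);

    assert(fp != NULL);
    for (size_t i = start; i < start + SLOTS_PER_GROUP; i++)
    {
        if (fp->used[i] && fp->relid[i] == relid)
            return (int) i;
    }
    return -1;
}

/*
 * Return the slot now holding relid, or -1 if its group is full. In the
 * -1 case a caller would have to fall back to some slower path.
 */
static int
fp_insert(FastPathSketch *fp, Oid relid)
{
    size_t start = group_start(relid);
    int    existing = fp_find(fp, relid);

    if (existing >= 0)
        return existing;

    for (size_t i = start; i < start + SLOTS_PER_GROUP; i++)
    {
        if (!fp->used[i])
        {
            fp->used[i] = 1;
            fp->relid[i] = relid;
            return (int) i;
        }
    }
    return -1;
}
```

With this layout a lookup inspects at most SLOTS_PER_GROUP entries no
matter how large NUM_SLOTS grows. The trade-off is that one group can fill
up while others still have free slots.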