Hello, Andres. Thanks for the feedback again.
> The problem is that we don't want to add a lot of work > KnownAssignedXidsAdd/Remove, because very often nobody will build a snapshot > for that moment and building a sorted, gap-free, linear array of xids isn't > cheap. In my experience it's more common to be bottlenecked by replay CPU > performance than on replica snapshot building. Yes, but my patch adds almost the smallest possible amount for KnownAssignedXidsAdd/Remove - a single write to the array by index. It differs from the first version in this thread which is based on linked lists. The "next valid offset" is just "optimistic optimization" - it means "you could safely skip KnownAssignedXidsNext[i] while finding the next valid". But KnownAssignedXidsNext is not updated by Add/Remove. > During recovery, where there's only one writer to the procarray / known xids, > it might not be hard to opportunistically build a dense version of the known > xids whenever a snapshot is built. AFAIU the patch does exactly the same. On the first snapshot building, offsets to the next valid entry are set. So, a dense version is created on-demand. And this version is reused (even partly if something was removed) on the next snapshot building. > I'm not entirely sure what datastructure would work best. I can see something > like a radix tree work well, or a skiplist style approach. Or a hashtable: We could try to use some other structure (for example - linked hash map) - but the additional cost (memory management, links, hash calculation) will probably significantly reduce performance. And it is a much harder step to perform. So, I think "next valid offset" optimization is a good trade-off for now: * KnownAssignedXidsAdd/Remove are almost not affected in their complexity * KnownAssignedXidsGetAndSetXmin is eliminated from the CPU top on typical read scenario - so, more TPS, less ProcArrayLock contention * it complements GetSnapshotData() scalability - now on standby * changes are small Thanks, Michail.