On Wed, Apr 16, 2025 at 9:14 PM Jakub Wartak <jakub.war...@enterprisedb.com> wrote:
> 2. Should we also interleave DSA/DSM for Parallel Query? (I'm not an
> expert on DSA/DSM at all)
I have no answers, but I have speculated for years about one very specific case (without any idea where to begin, due to lack of... I guess all of this sort of infrastructure): in ExecParallelHashJoinNewBatch(), workers split up and try to work on different batches on their own to minimise contention, and when that's not possible (more workers than batches, or workers finishing their existing work at different times and going to help others), they just proceed in round-robin order.

A beginner thought is: if you're going to help someone work on a hash table, it would surely be best to have the CPUs and all the data on the same NUMA node. During loading, cache line ping-pong would be cheaper, and during probing, it *might* be easier to tune explicit memory prefetch timing that way, since it would look more like a single-node system with a fixed latency, IDK. (I've shared patches for prefetching before that showed pretty decent speedups, and the lack of that feature is probably a bigger problem than any of this stuff, who knows...)

Another beginner thought is that the DSA allocator is a source of contention during loading: the dumbest problem is that the chunks are just too small, but it might also be interesting to look into per-node pools. Or something. IDK, just some thoughts...