On 3/2/24 23:11, Melanie Plageman wrote: > On Fri, Mar 1, 2024 at 2:31 PM Melanie Plageman > <melanieplage...@gmail.com> wrote: >> >> ... >> >> Hold the phone on this one. I realized why I moved >> BitmapAdjustPrefetchIterator after table_scan_bitmap_next_block() in >> the first place -- master calls BitmapAdjustPrefetchIterator after the >> tbm_iterate() for the current block -- otherwise with eic = 1, it >> considers the prefetch iterator behind the current block iterator. I'm >> going to go through and figure out what order this must be done in and >> fix it. > > So, I investigated this further, and, as far as I can tell, for > parallel bitmapheapscan the timing around when workers decrement > prefetch_pages causes the performance differences with patch 0010 > applied. It makes very little sense to me, but some of the queries I > borrowed from your regression examples are up to 30% slower when this > code from BitmapAdjustPrefetchIterator() is after > table_scan_bitmap_next_block() instead of before it. > > SpinLockAcquire(&pstate->mutex); > if (pstate->prefetch_pages > 0) > pstate->prefetch_pages--; > SpinLockRelease(&pstate->mutex); > > I did some stracing and did see much more time spent in futex/wait > with this code after the call to table_scan_bitmap_next_block() vs > before it. (table_scan_bitmap_next_block()) calls ReadBuffer()). > > In my branch, I've now moved only the parallel prefetch_pages-- code > to before table_scan_bitmap_next_block(). > https://github.com/melanieplageman/postgres/tree/bhs_pgsr > I'd be interested to know if you see the regressions go away with 0010 > applied (commit message "Make table_scan_bitmap_next_block() async > friendly" and sha bfdcbfee7be8e2c461). >
I'll give this a try once the runs with MAX_BUFFERS_PER_TRANSFER=1 complete. But it seems really bizarre that simply moving this code a little bit would cause such a regression ... regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company