On Wed, May 5, 2021 at 3:46 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> Admittedly, it seems unlikely that the difference could exceed
> MAX_PARALLEL_WORKER_LIMIT = 1024 in a regression test run where
> the limit on number of parallel workers is only 8.  What I think is
> more likely, given that these counters are unsigned, is that the
> difference was actually negative.  Which could be a bug, or it could
> be an expectable race condition, or it could just be some flakiness
> on lorikeet's part (that machine has had a lot of issues lately).
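
To make the wraparound concrete, here is a minimal sketch of how a "negative" difference between two unsigned counters shows up as a value far above the limit. The function and parameter names are hypothetical, chosen for illustration only; this is not the actual PostgreSQL code, and only the limit value of 1024 comes from the quoted text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Limit value taken from the quoted text; defined locally for this sketch. */
#define MAX_PARALLEL_WORKER_LIMIT 1024

/*
 * Hypothetical helper: returns true if the number of active workers implied
 * by two monotonically increasing unsigned counters is plausible.
 *
 * Unsigned subtraction wraps modulo 2^32, so the result stays correct even
 * after the counters themselves overflow, as long as the true difference is
 * small.  If terminate_count is observed to be ahead of register_count, the
 * "negative" difference wraps to a huge value and the check fails.
 */
bool
worker_counts_plausible(uint32_t register_count, uint32_t terminate_count)
{
    uint32_t active = register_count - terminate_count;

    return active <= MAX_PARALLEL_WORKER_LIMIT;
}
```

With these definitions, worker_counts_plausible(2, 3) is false because 2 - 3 wraps to 4294967295, while worker_counts_plausible(5, UINT32_MAX - 2) is true because the register counter has wrapped past zero and the difference is still 8.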

I think that assertion was added by me, and I think the thought
process was that the value shouldn't go negative and that if it does
it's probably a bug which we might want to fix. But since the values
are unsigned I could hardly check for < 0, so I did it this way
instead.

But since there's no memory barrier between the two loads, I guess
there's no guarantee that they have the expected relationship, even if
there is a memory barrier on the store side. I wonder if it's worth
trying to tighten that up so that the assertion is more meaningful, or
just give up and rip it out. I'm afraid that if we do have (or
develop) bugs in this area, someone will discover that the effective
max_parallel_workers value on their system slowly drifts up or down
from the configured value, and we'll have no clue where things are
going wrong. The assertion was intended to give us a chance of
noticing that sort of problem in the buildfarm or on a developer's
machine before the code gets out into the real world.

--
Robert Haas
EDB: http://www.enterprisedb.com