Hi,

While cleaning up some unnecessary bits of the code and slightly inaccurate comments, I ran into a failure when the parallel scan (used by the parallel build) happened to be a synchronized scan. When the scan did not start on page 0, the parallel build callback failed to correctly handle tuples after wrapping around to the start of the table.
AFAICS the extensive testing I did during development did not detect this because, strictly speaking, the index was "correct" (as in not returning incorrect query results), just less efficient (missing some ranges, and some ranges being "wider" than necessary). Or perhaps the tests happened not to trigger synchronized scans. Should be fixed by 1ccab5038eaf261f.

It took me ages to realize what the problem was, and I initially suspected there was some missing coordination between the workers/leader, or something like that. So I started comparing the code to btree, which is where it originated, and I realized there is indeed one difference - the BRIN code only does half the work with the workersdonecv condition variable. The workers do correctly update the count and notify the leader, but the leader never waits for the count to reach 0. That is, there's nothing like _bt_parallel_heapscan.

I wonder whether this actually is a problem, considering the differences between the flow in BRIN and btree. In particular, the "leader" does the work in _brin_end_parallel() after WaitForParallelWorkersToFinish(), so I don't think there can be a worker still processing data at that point.

But now that I think about it, maybe it's not such a great idea to do this kind of work in _brin_end_parallel(). Maybe it should do just the stuff related to terminating the workers etc., and the merging of results should happen elsewhere - earlier in brinbuild()? Then it'd make sense to have something like _bt_parallel_heapscan ...

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company