Mihail Nikalayeu <[email protected]> wrote:

>Antonin Houska <[email protected]>:
>> 
>> As the test runs pgbench with --client=30 and the default value of
>> max_worker_processes is 8, I'm not sure this is a leak. After I increased
>> this parameter, I couldn't see the error anymore.
> 
> Hm, as far as I remember, only a single REPACK may be executed in the test
> (because of locking in the test itself and also in REPACK).

The only problem is that the logical decoding system needs to wait during the
setup for all the running transactions to finish. So if REPACK (CONCURRENTLY)
is already running, the next execution will not start until the first is done.

However, that does not restrict the REPACK decoding workers from starting.

>> I agree that this is due to the missing MVCC safety feature. I commented out
>> that check in the script for now.
> 
> I don't think so. In the case of non-MVCC safety we should see 0 or the
> correct sum. But the script failed with 490588...
> But it should see 500500 (if I correctly calculated the sum of the numbers
> from 1 to 1000)...

I was referring to your statement "It may be 0 because non-MVCC
safe". Regarding the non-zero values, I think I finally understand the issue
and could even reproduce some weird behavior using a debugger. Since it also
affects logical replication, I'll provide more details (and hopefully propose
a patch) in a separate thread early next week.
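
As a quick check of the arithmetic quoted above, the following sketch
recomputes the expected sum and the size of the shortfall. The variable names
are only illustrative; the two figures (1..1000 and 490588) are the ones
mentioned in this thread.

```python
# Expected sum of the integers 1..1000, computed two ways.
expected = sum(range(1, 1001))
closed_form = 1000 * 1001 // 2  # n * (n + 1) / 2

# Value reported by the failing script, as quoted in this thread.
observed = 490588

# How much of the total was not seen by the check.
shortfall = expected - observed

print(expected, closed_form, shortfall)
```

So the quoted 500500 is correct, and the failing run is short by 9912, which
is neither 0 nor the full sum.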

In short, it looks like a (hopefully very) rare race condition, such that the
snapshot builder can build the initial snapshot before all the commits have
been recorded in CLOG. When that happens, visibility checks don't work
correctly.
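
To illustrate why such a race could produce a partial sum, here is a toy
model. It is not PostgreSQL code: the function, the XIDs and the row layout
are all invented for illustration, and the numbers are not meant to match the
failure above. It only assumes the general rule that a row counts as visible
if its inserting transaction is not in progress according to the snapshot and
is marked committed in the commit log.

```python
# Toy model only: hypothetical names, not PostgreSQL internals.

def is_visible(xmin, snapshot_in_progress, clog_committed):
    """Simplified visibility rule for a row inserted by transaction xmin."""
    if xmin in snapshot_in_progress:
        return False  # the snapshot still treats the inserter as running
    return xmin in clog_committed  # otherwise rely on the commit log


def visible_sum(rows, snapshot_in_progress, clog_committed):
    """Sum the values of all rows that pass the visibility rule."""
    return sum(value for value, xmin in rows
               if is_visible(xmin, snapshot_in_progress, clog_committed))


# Rows 1..990 were inserted by XID 10, rows 991..1000 by XID 11 (made up).
rows = [(v, 10) for v in range(1, 991)] + [(v, 11) for v in range(991, 1001)]

# The snapshot was built believing that no transaction is still running ...
snapshot_in_progress = set()

# ... but the commit of XID 11 has not reached the commit log yet.
clog_committed = {10}
total_before = visible_sum(rows, snapshot_in_progress, clog_committed)

# Once the commit is recorded, the same snapshot yields the full sum.
clog_committed.add(11)
total_after = visible_sum(rows, snapshot_in_progress, clog_committed)

print(total_before, total_after)
```

In this model the first check sees only part of the data (neither 0 nor the
full sum), and the second one sees everything, although the snapshot itself
did not change.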

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

