On 2017-05-01 11:09:44 +0200, Petr Jelinek wrote: > On 01/05/17 10:03, Andres Freund wrote: > > On 2017-05-01 03:54:49 +0200, Petr Jelinek wrote:
> >> But, I still think we need to restart the tracking after new > >> xl_running_xacts. Reason for that is afaics any of the catalog snapshots > >> that we assigned to transactions at the end of SnapBuildCommitTxn might > >> be corrupted otherwise as they were built before we knew one of the > >> supposedly running txes was actually already committed and that > >> transaction might have done catalog changes. > > > > I'm afraid you're right. But I think this is even more complicated: The > > argument in your version that this can only happen once, seems to also > > be holey: Just imagine a pg_usleep(3000 * 1000000) right before > > ProcArrayEndTransaction() and enjoy the picture. > Well yes, transaction can in theory have written commit/abort xlog > record and stayed in proc for more than single xl_running_xacts write. > But then the condition which we test that the new xl_running_xacts has > bigger xmin than the previously tracked one's xmax would not be > satisfied and we would not enter the relevant code path yet. So I think > we should not be able to get any xids we didn't see. But we have to > restart tracking from beginning (after first checking if we didn't > already see anything that the xl_running_xacts considers as running), > that's what my code did. But to get that correct, we'd have to not only track ->committed, but also somehow maintain ->aborted, and not just for the transactions in the original set of running transactions. That'd be fairly complicated and large. The reason I was trying - and it's definitely not correct as I had proposed - to use the original running_xacts record is that that only required tracking as many transaction statuses as in the first xl_running_xacts. Am I missing something? The probabilistic tests catch the issues here fairly quickly, btw, if you run with synchronous_commit=on, while pgbench is running, because the WAL flushes make this more likely. Runs this query: SELECT account_count, teller_count, account_sum - teller_sum s FROM ( SELECT count(*) account_count, SUM(abalance) account_sum FROM pgbench_accounts ) a, ( SELECT count(*) teller_count, SUM(tbalance) teller_sum FROM pgbench_tellers ) t which, for my scale, should always return: ┌─────────┬─────┬───┐ │ a │ t │ s │ ├─────────┼─────┼───┤ │ 2000000 │ 200 │ 0 │ └─────────┴─────┴───┘ but with my POC patch occasionally returns things like: ┌─────────┬─────┬───────┐ │ a │ t │ s │ ├─────────┼─────┼───────┤ │ 2000000 │ 212 │ 37358 │ └─────────┴─────┴───────┘ which obviously shouldn't be the case. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers