> On 9 May 2018, at 17:51, Robert Haas <robertmh...@gmail.com> wrote:
>
> Ouch. That's not so bad at READ COMMITTED, but at higher isolation
> levels failure becomes extremely likely. Any multi-statement
> transaction that lasts longer than global_snapshot_defer_time is
> pretty much doomed.
Ouch indeed. The current xmin holding scheme has two major drawbacks: it
introduces a timeout between export and import of a snapshot, and it holds
xmin in a pessimistic way, so old versions are preserved even when there
were no global transactions. On the positive side is simplicity: it is the
only way I can think of that doesn't require distributed calculation of the
global xmin, which, in turn, would probably require a permanent connection
to the remote postgres_fdw node. It is not hard to add some background
worker to postgres_fdw that would hold a permanent connection, but I'm
afraid that is a very discussion-prone topic, and that's why I tried to
avoid it.

> I don't think holding back xmin is a very good strategy. Maybe it
> won't be so bad if and when we get zheap, since only the undo log will
> bloat rather than the table. But as it stands, holding back xmin
> means everything bloats and you have to CLUSTER or VACUUM FULL the
> table in order to fix it.

Well, an open local transaction in postgres holds back globalXmin for the
whole postgres instance (with the exception of STO). Likewise, an active
global transaction has to hold back the globalXmin of the participating
nodes to be able to read the right versions (again, with the exception of
STO).

However, the xmin holding scheme itself can be different. For example, we
could periodically check (let's say every 1-2 seconds) the oldest GlobalCSN
on each node and delay globalXmin advancement only if some long-running
global transaction really exists. Then the period of bloat would be limited
by those 1-2 seconds, without imposing a timeout between export and import.

Also, I want to note that global_snapshot_defer_time values of tens of
seconds don't change much in terms of bloat compared to logical
replication. An active logical slot holds back globalXmin by setting
replication_slot_xmin, which is advanced on every RunningXacts record,
which in turn is logged every 15 seconds (hardcoded in
LOG_SNAPSHOT_INTERVAL_MS).
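To make the alternative scheme concrete, here is a minimal sketch in C of
the advancement rule I have in mind: globalXmin moves forward freely unless
some node reports an older GlobalCSN still in use, in which case it is held
back only to that value. The names (node_oldest_csn, advance_global_xmin,
CSN_INACTIVE) are illustrative, not from the actual patch, and the real
poller would of course run in a background worker over fdw connections
rather than over a local array.

```c
/* Illustrative sketch of periodic globalXmin advancement: hold xmin back
 * only while a global transaction actually exists, instead of
 * pessimistically for the whole global_snapshot_defer_time window.
 * All identifiers here are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t GlobalCSN;

/* Sentinel meaning "this node has no active global transaction". */
#define CSN_INACTIVE UINT64_MAX

/* Oldest active GlobalCSN reported by each participating node, as
 * refreshed by the hypothetical 1-2 second polling cycle. */
static GlobalCSN node_oldest_csn[3] = { CSN_INACTIVE, CSN_INACTIVE, CSN_INACTIVE };

/* Compute how far globalXmin may advance: the minimum of the current CSN
 * and the oldest CSN still in use on any node. With no active global
 * transactions this is simply the current CSN, i.e. no bloat is retained. */
static GlobalCSN
advance_global_xmin(GlobalCSN current_csn, const GlobalCSN *nodes, size_t n)
{
    GlobalCSN xmin = current_csn;

    for (size_t i = 0; i < n; i++)
        if (nodes[i] < xmin)
            xmin = nodes[i];
    return xmin;
}
```

With this rule, the window during which old tuple versions accumulate is
bounded by the polling interval plus the lifetime of the oldest genuine
global transaction, not by a fixed defer timeout.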
> If the behavior were really analogous to our existing "snapshot too
> old" feature, it wouldn't be so bad. Old snapshots continue to work
> without error so long as they only read unmodified data, and only
> error out if they hit modified pages.

That is actually a good idea that I missed, thanks. Since all the logic for
checking modified pages is already present, it is possible to reuse it and
not raise a "Global STO" error right when an old snapshot is imported, but
only when the global transaction actually reads a modified page. I will
implement that and update the patch set.

Summarising, I think that introducing permanent connections to the
postgres_fdw node would put too much burden on this patch set, and that it
will be possible to address this later (in the long run such a connection
will be needed anyway, at least for deadlock detection). However, if you
think that the current behavior plus an STO analog isn't good enough, then
I'm ready to pursue that track.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company