Tom Lane wrote:
> Josh Berkus <j...@agliodbs.com> writes:
>> On 2/26/10 10:53 AM, Tom Lane wrote:
>>> I think that what we are going to have to do before we can ship 9.0
>>> is rip all of that stuff out and replace it with the sort of closed-loop
>>> synchronization Greg Smith is pushing. It will probably be several
>>> months before everyone is forced to accept that, which is why 9.0 is
>>> not going to ship this year.
>
>> I don't think that publishing visibility info back to the master ... and
>> subsequently burdening the master substantially for each additional
>> slave ... are what most users want.
>
> I don't see a "substantial additional burden" there. What I would
> imagine is needed is that the slave transmits a single number back
> --- its current oldest xmin --- and the walsender process publishes
> that number as its transaction xmin in its PGPROC entry on the master.
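(For the sake of concreteness, if I understand the proposal right, on the
master side it would boil down to something like the sketch below. The
message struct and the function are invented here just to illustrate the
shape of it; this is not working code:)

    #include "postgres.h"
    #include "storage/proc.h"

    /*
     * Sketch only: the standby would periodically send the oldest xmin
     * among its read-only backends, and the walsender would publish it
     * as its own xmin, so that VACUUM on the master keeps any tuples
     * the standby's queries might still need to see.
     */
    typedef struct StandbyFeedbackMessage
    {
        TransactionId   oldestXmin;     /* oldest xmin on the standby */
    } StandbyFeedbackMessage;

    static void
    ProcessStandbyFeedback(StandbyFeedbackMessage *msg)
    {
        /*
         * Publishing the value in the walsender's PGPROC entry makes
         * GetOldestXmin() on the master honour the standby's snapshots.
         */
        MyProc->xmin = msg->oldestXmin;
    }
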
The additional burden comes from the old snapshot effect: it makes the
standby unusable for offloading long-running reporting queries, for
example. In general, it is a very good architectural property that the
master is not affected by what happens in a standby, and closed-loop
synchronization would break that.

I don't actually understand how tight synchronization on its own would
solve the problem, either. What if the connection to the master is lost?
Do you kill all queries in the standby before reconnecting?

One way to think about this is to first consider a simple stop-and-go
system. Clearly the database must be consistent at any point in the WAL
sequence at which recovery could be stopped and the database started up.
So it is always safe to pause recovery and run a read-only query against
the database as it is at that point in time (this assumes that the index
"cleanup" operations are not required for consistent query results, BTW).
After the read-only transaction has finished, you can continue recovery.

The next step up is to relax that, so that you allow replay of those WAL
records that are known not to cause trouble for the read-only queries.
For example, heap_insert records are very innocent: they only add rows
with an as-yet-uncommitted xmin. Things get more complex when you allow
the replay of commit records; all the known-assigned-xids tracking exists
so that a transaction that was not committed when a snapshot was taken in
the standby is still considered uncommitted by that snapshot, even after
its commit record is later replayed.

If that feels too fragile, there might be other ways to achieve the same
thing. One I once pondered is to not track all in-progress transactions
in shared memory like we do now, but only OldestXmin. When a backend
wants to take a snapshot in the slave, it memcpy()s clog from OldestXmin
to the latest committed XID, and includes that copy in the snapshot. The
visibility checks use the copy instead of the actual clog, so they see
the situation as it was when the snapshot was taken. To keep track of
OldestXmin in the slave, the master can emit it as a WAL record every now
and then; it's OK if it lags behind.

Then there are the WAL record types that remove data that might still be
required by the read-only transactions; vacuum and index deletion records
fall into that category.

If you really think the current approach is unworkable, I'd suggest that
we fall back to a stop-and-go system, where you either let recovery
progress or allow queries to run, but not both at the same time. But FWIW
I don't think the situation is that grave.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
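PS. Just to make the clog-copying idea above a bit more concrete, here is
roughly the shape I have in mind (all of the names are invented, and
locking, memory management, and clog page alignment are glossed over;
this is a sketch, not working code):

    #include "postgres.h"
    #include "access/clog.h"
    #include "access/transam.h"

    /*
     * Hypothetical standby snapshot: instead of an array of running
     * xids, it carries a private copy of clog covering [xmin, xmax].
     */
    typedef struct StandbySnapshot
    {
        TransactionId    xmin;       /* OldestXmin received from master */
        TransactionId    xmax;       /* latest committed XID at snap time */
        unsigned char   *clog_copy;  /* backend-local copy of clog bytes */
    } StandbySnapshot;

    static bool
    XidVisibleInStandbySnapshot(TransactionId xid, StandbySnapshot *snap)
    {
        if (TransactionIdPrecedes(xid, snap->xmin))
            return TransactionIdDidCommit(xid);  /* outcome is permanent,
                                                  * real clog is safe */
        if (TransactionIdFollows(xid, snap->xmax))
            return false;                        /* began after snapshot */

        /*
         * Two status bits per transaction, four transactions per byte,
         * as in clog.  Assume the copy starts at the byte holding xmin.
         */
        {
            TransactionId   base = snap->xmin - (snap->xmin % 4);
            uint32          idx = xid - base;
            int             status;

            status = (snap->clog_copy[idx / 4] >> ((idx % 4) * 2)) & 0x03;
            return status == TRANSACTION_STATUS_COMMITTED;
        }
    }

Commit records replayed after the memcpy() don't change the copy, so the
snapshot keeps seeing those transactions as in progress, which is the
same property the known-assigned-xids machinery provides today.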