On Mon, Oct 11, 2010 at 2:07 PM, Josh Berkus <j...@agliodbs.com> wrote:
>> I'll take a crack at answering these. I don't think that the
>> procedure for setting up a standby server is going to change much.
>> The idea is presumably that you set up an async standby more or less
>> as you do now and then make whatever configuration changes are
>> necessary to flip it to synchronous.
>
> What is the specific "flip" procedure, though? For one thing, I want to
> make sure that it's not necessary to restart the master or the standby
> to "flip" it, since that would be a catch-22.

Obviously. I presume it'll be something like "update postgresql.conf or
recovery.conf and run pg_ctl reload". I haven't (yet, anyway) verified
the actual behavior of the patches, but if the above isn't feasible
then we have a problem.

>> This is a completely separate issue from making replication
>> synchronous. And, really? Useless for running read queries?
>
> Absolutely. For a synch standby, you can't tolerate any standby delay
> at all. This means that anywhere from 1/4 to 3/4 of queries on the
> standby would be cancelled on any high-traffic OLTP server. Hence,
> "useless".

What is your source for those numbers? They could be right, but I
simply don't know.

At any rate, I don't disagree that we have a problem. In fact, I think
we have a whole series of problems. The whole architecture of
replication as it exists in PG is pretty fundamentally limited right
now. Right now, a pruning operation on the master (regardless of
whether it's a HOT prune or vacuum) can happen when there are still
snapshots on the slave that need that data. Our only options are to
either wait for those snapshots to go away, or kill off the
queries/transactions that took them. Adding an XID feedback from the
slave to the master "fixes" the problem by preventing the master from
pruning those tuples until the slave no longer needs them, but at the
expense of bloating the master and all other standbys. That may,
indeed, be better for some use cases, but it's not really all that
good. It would be far better if we could decouple master cleanup from
standby cleanup, so that only the machine that actually has the old
query gets bloated. However, no one seems excited about writing that
code.

A further grump about our current architecture is that it doesn't seem
at all clear how to make it work for partial replication. I have to
wonder whether we are going down the wrong path completely and need to
hit the reset button.
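
(As an aside, the XID-feedback trade-off described above can be put in a
toy model. This is only a sketch, not PostgreSQL code: the Node class,
the function names, and the XID values are all made up for illustration,
and real snapshot and pruning rules are considerably more involved.)

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical server: just a name and the xmin of each open snapshot."""
    name: str
    snapshot_xmins: list = field(default_factory=list)

    def oldest_xmin(self, next_xid):
        # With no open snapshots, nothing older than next_xid is needed.
        return min(self.snapshot_xmins, default=next_xid)


def master_prune_horizon(master, standbys, next_xid, xid_feedback):
    """Dead tuples from XIDs below the returned horizon may be pruned."""
    horizon = master.oldest_xmin(next_xid)
    if xid_feedback:
        # Feedback holds the master back to the oldest standby snapshot.
        for standby in standbys:
            horizon = min(horizon, standby.oldest_xmin(next_xid))
    return horizon


def conflicting_standbys(horizon, standbys, next_xid):
    """Standbys whose snapshots may still need data the master prunes."""
    return [s.name for s in standbys if s.oldest_xmin(next_xid) < horizon]


NEXT_XID = 1000
master = Node("master", [990])
standbys = [Node("standby_a", [400]),  # long-running report query
            Node("standby_b")]         # idle

no_feedback = master_prune_horizon(master, standbys, NEXT_XID, False)
with_feedback = master_prune_horizon(master, standbys, NEXT_XID, True)

# Without feedback: standby_a must wait or have its query cancelled.
print(no_feedback, conflicting_standbys(no_feedback, standbys, NEXT_XID))
# With feedback: no conflict, but everything from XID 400 onward is
# retained on the master, and therefore on standby_b as well.
print(with_feedback, conflicting_standbys(with_feedback, standbys, NEXT_XID))
```

In this model the single old snapshot on standby_a either gets cancelled
or forces every machine to keep the old tuples, which is the coupling
between master cleanup and standby cleanup complained about above.
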
But neither partial replication nor the pruning problem is something
that we can reasonably expect the sync rep patch to solve, if we want
it to get committed this release cycle.

>>> As such, any Synch Rep patch
>>> must work together with attempts to simplify administration. How does
>>> your design do this?
>>
>> This is also completely out of scope for sync rep.
>
> It is not, given that I've seen several proposals for synch rep which
> would make asynch rep even more complicated than it already is.

I'm not aware of any proposals on the table which would do that.

> I'm
> taking the stance that any sync rep design which *blocks* making asynch
> rep easier to use is fundamentally flawed and can't be accepted.

Do you have some ideas on how to simplify it? How will we know whether
a particular design for sync rep does this?

>> I don't think there's much hope of allowing administrators to take
>> action BEFORE the database becomes unavailable.
>
> I'd swear that you were working as a DBA less than a year ago, but I
> couldn't tell it from that statement.

Your comment sounded to me like you were asking for a schedule of all
future unplanned outages.

> There is every bit of value in allowing DBAs to view, and chart,
> response times on the standby for ACK. That way they can notice an
> increase in response times and take action to improve the standby
> *before* it locks up the system.

Sure, that would be nice to have, and it's a good idea. But I don't
think that's going to be a common failure mode. What I expect to
happen is for the standby to hum along with no problem for a long time
and then either kick a disk or suffer a power outage. There's very
little monitoring we can do within PG that will notice either of those
things coming. There might be some external-to-PG monitoring that can
be done, but if there's a massive blackout or a terrorist attack or
somebody trips over the power cord, you're just going to get surprised.

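(For the gradual-slowdown case only, the kind of ACK response-time
charting Josh describes could look roughly like the sketch below. It
assumes something hands us per-commit ACK latencies in milliseconds; no
patch on the table is known to expose such a number, and the class name,
window size, and threshold are all invented for illustration.)

```python
from collections import deque


class AckLatencyMonitor:
    """Hypothetical rolling monitor for standby ACK latency samples."""

    def __init__(self, window=100, warn_ms=50.0):
        self.samples = deque(maxlen=window)  # keeps only the newest samples
        self.warn_ms = warn_ms

    def record(self, latency_ms):
        self.samples.append(latency_ms)

    def p95(self):
        """Nearest-rank 95th percentile of the window, or None if empty."""
        if not self.samples:
            return None
        ordered = sorted(self.samples)
        rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integers
        return ordered[rank - 1]

    def should_warn(self):
        p95 = self.p95()
        return p95 is not None and p95 > self.warn_ms


monitor = AckLatencyMonitor(window=5, warn_ms=50.0)
for ms in (10, 12, 11, 13, 12):   # healthy standby
    monitor.record(ms)
healthy_p95, healthy_warn = monitor.p95(), monitor.should_warn()

for ms in (80, 90, 85):           # standby starts to fall behind
    monitor.record(ms)
degraded_p95, degraded_warn = monitor.p95(), monitor.should_warn()

print(healthy_p95, healthy_warn)
print(degraded_p95, degraded_warn)
```

A monitor like this only helps when latency creeps up before the
failure; it says nothing about a disk or a power supply that dies
without warning.
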
>> Presumably, if
>> synchronous replication is disabled via (1) or (2) above, then any
>> outstanding committed-but-unacknowledged-to-the-client transactions
>> should notify the client of the commit and continue on.
>
> That's what I was asking about. I'm not "presuming" that any pending
> patch covers any such eventuality until it's confirmed.

Yep, we need to confirm that.

>> If a client loses the connection after issuing a commit but before
>> receiving the acknowledgment, it can't know whether the commit
>> happened or not. This is true regardless of whether there is a
>> standby and regardless of whether that standby is synchronous.
>> Clients that care need to implement their own mechanisms for resolving
>> this difficulty.
>
> That's a handwavy way of saying "go away, don't bother us with such
> details". For the client to resolve the situation, then *it* needs to
> be able to tell whether or not the transaction was committed. How would
> it do this, exactly?

No, it isn't at all. What does your application do NOW if the master
goes down after you've sent a commit and before you get an
acknowledgment back? Does it assume that the transaction is committed,
or does it assume the transaction was aborted by a crash on the master?
Either is possible, right?

>> It's theoretically impossible for the transaction to become visible
>> everywhere simultaneously. It's already the case that transactions
>> become visible to other backends before the backend doing the commit
>> has received an acknowledgment. Any client relying on any other
>> behavior is already broken.
>
> So, your opinion is "it's out of scope to handle this issue" ?

What handling of it would you propose? Consider the case where you
just have one server and no standbys. A client connects, does some
work, and says COMMIT. There is some finite amount of time after the
COMMIT happens and before the client gets the acknowledgment back that
the commit has succeeded.
During that time, another transaction that starts up will see the
effects of the COMMIT - BEFORE the transaction itself knows that it is
committed. There's not much you can do about this. You have to do the
commit on the server before sending the response back to the client.

In the sync rep case, you're going to get the same behavior. After the
client has asked for commit and before the commit has been
acknowledged, there's no guarantee whether another transaction that
starts up during that in-between time sees the transaction or not. The
only further anomaly that can happen as a result of sync rep is that,
in apply mode, the transaction's effects will become visible on the
standby before they are visible on the master. So if you fire off a
COMMIT, then before receiving the acknowledgment start a transaction on
the standby, then just after that start a transaction on the master,
and only then get back the acknowledgment that the COMMIT completed,
you might have a snapshot on the master that was taken later
chronologically but shows the effects of fewer committed XIDs - i.e.
time has gone backwards. Unfortunately, short of a global transaction
manager, this is an unsolvable problem, and that's definitely more than
is going to happen for 9.1, I think.

>> Sync rep is going to be slow, period. Every implementation currently
>> on the table has to fsync on the master, and then send the commit xlog
>> record to the slave and wait for an acknowledgment from the slave.
>> Allowing those to happen in parallel is going to be Hard.
>
> Yes, but it's something we need to address.

I agree, but it's not something we can address in the first patch,
which is hard enough without adding things that make it even harder.
We need to get something simple committed first and then build on it.

> XA is widely distrusted and
> is seen as inadequate for high-traffic OLTP systems precisely because it
> is SO slow.
> If we want to create a synch rep system which people will
> want to use, then it has to be faster than XA. If it's not faster than
> XA, why bother creating it? We already have 2PC.

I don't know anything about XA so I can't comment on this.

>> Also, the
>> interaction with max_standby_delay is going to be a big problem, I
>> suspect.
>
> Interaction? My opinion is that the two are completely incompatible.
> You can't have synch rep and also have standby_delay > 0.

We seem to be in violent agreement on this point. I was saying the
same thing in a different way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers