Simon, Fujii,

What follows is what I see as the major issues with making two-server synch replication work well. I would like each of you to answer them, explaining how your patch and your design address each issue. I believe this will go a long way towards helping the majority of the community understand the options we have from your code, as well as where help is still needed.

Adding a Synch Standby
-----------------------

What is the procedure for adding a new synchronous standby in your implementation? That is, how do we go from having a standby server with an empty PGDATA to having a working synchronous standby?

Snapshot Publication
---------------------

During 9.0 development discussion, one of the things we realized we needed for synch standby was publication of snapshots back to the master in order to prevent query cancel on the standby. Without this, the synch standby is useless for running read queries. Does your patch implement this? Please describe.

Management
-----------

One of the serious flaws currently in HS/SR is complexity of administration. Setting up and configuring even a single master and single standby requires editing up to 6 configuration files in Postgres, as well as dealing with file permissions. As such, any Synch Rep patch must work together with attempts to simplify administration. How does your design do this?

Monitoring
-----------

Synch rep imposes severe penalties on availability if a synch standby gets behind or goes down. What replication-specific monitoring tools and hooks are available to allow administrators to take action before the database becomes unavailable?

Degradation
------------

In the event that the synch rep standby falls too far behind or becomes unavailable, or is deliberately taken offline, what are you envisioning as the process for the DBA to resolve the situation? Is there any ability to commit "stuck" transactions?

Client Consistency
---------------------

With a standby in "apply" mode, and a master failure at the wrong time, there is the possibility that the standby will apply a transaction at the same time that the master crashes, causing the client to never receive a commit message. Once the client reconnects to the standby, how will it know whether its transaction was committed or not?

As a lesser case, a standby in "apply" mode will show the results of committed transactions *before* they are visible on the master. Is there any need to handle this? If so, how?

Performance
------------

As with XA, synch rep has the potential to be so slow as to be unusable. What optimizations do you make in your approach to synch rep to make it faster than two-phase commit? What other performance optimizations have you added?

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com